andy
5a94c5ff1a
Affected files: STEM/AI/Kalman Filter.md STEM/Signal Proc/Convolution.md STEM/Signal Proc/Image/Tracking.md STEM/Signal Proc/Pole-Zero.md STEM/Signal Proc/Transfer Function.md STEM/Speech/Linguistics/Consonants.md STEM/Speech/Linguistics/Linguistics.md STEM/Speech/Linguistics/README.md STEM/Speech/Linguistics/Terms.md STEM/Speech/Linguistics/Vowels.md STEM/Speech/NLP/Jargon.md STEM/Speech/NLP/NLP.md STEM/Speech/NLP/README.md STEM/Speech/NLP/Recognition.md STEM/Speech/Perception/Perception.md STEM/Speech/Perception/README.md STEM/Speech/Speech Processing/Applications.md STEM/Speech/Speech Processing/README.md STEM/Speech/Speech Processing/Source-Filter.md STEM/Speech/Speech Processing/Vocal Tract.md STEM/img/english-phoneme-table.png STEM/img/formant.png STEM/img/pole-zero-attenuation.png STEM/img/pole-zero-feedback.png STEM/img/pole-zero-stable.png STEM/img/roc-right-left.png STEM/img/roc-two-sided.png STEM/img/spectrum-vocal-tract.png STEM/img/transfer-stable-unstable.png STEM/img/vowel-chart.png STEM/img/vowel-spaces.png
58 lines
1.3 KiB
Markdown
1. Automatic Speech Recognition
    - Spoken words to machine-readable form
2. Natural language understanding
    - High-level cognitive interpretation
        - Structure
        - Meaning
        - Intention

# Automatic Speech Recognition
## Applications
- Business/desktop apps
    - Dictation
    - Voice commands
- Voice-enabled services/apps
    - Siri
    - Home automation
- Game & Entertainment
- Education
- Speech therapy/Rehab
- Hearing assistance
    - Live closed captions (CC)

## Challenges
- Speaker dependency
    - Accent
    - Emotion
- Vocab size
    - Slang
- Isolated words vs Continuous speech
    - Hard to segment continuous speech
- Language constraints & Knowledge sources
    - Training source is critical
- Acoustic ambiguity
    - Similar-sounding speech
- Noise robustness
    - Background noise
    - Reverberation

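
One way to probe the background-noise challenge above is to mix noise into clean speech at a controlled signal-to-noise ratio (SNR) and see how recognition degrades. The sketch below is illustrative only: the function names `power`, `mix_at_snr` and `measured_snr_db` are made up for this example, and a sine tone stands in for real speech.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so speech power / noise power equals `snr_db`, then add it."""
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB, treating (noisy - speech) as the noise component."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


random.seed(0)
# Assumption: a 200 Hz tone sampled at 16 kHz stands in for one second of speech.
speech = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
# Assumption: white Gaussian noise stands in for background noise.
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]

noisy = mix_at_snr(speech, noise, snr_db=10)
print(round(measured_snr_db(speech, noisy), 3))
```

Lowering `snr_db` makes the noise louder relative to the speech. Reverberation is a different kind of distortion (the signal is convolved with a room response rather than added to noise), so this sketch does not cover it.
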
# Speech Diarisation
- Who speaks when?
- Split stream into homogeneous segments for identity
- Structure stream into speaker turns
- Provide speaker identity
- Combination of
    - Speaker segmentation
        - Speaker changes in stream
    - Speaker clustering
        - Grouping segments together on basis of characteristics
- Gaussian mixture model
- Hidden Markov model (HMM)
- Bottom-up
    - More popular
    - Succession of clusters
    - Merge redundant clusters
    - Remaining clusters belong to speakers
- Top-down
    - Single cluster
    - Iteratively split until clusters correspond to speakers
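
The bottom-up approach above can be sketched as a small agglomerative clustering loop. This is a minimal illustration under stated assumptions, not how any particular diarisation system does it: each segment is assumed to be already summarised by a fixed-length feature vector, clusters are compared by Euclidean distance between centroids (a simple stand-in for the GMM-based comparisons mentioned above), and `stop_distance` is a hypothetical stopping threshold.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def bottom_up_cluster(segments, stop_distance):
    """Start with one cluster per segment and repeatedly merge the closest
    pair until no pair is closer than `stop_distance`.

    Returns a list of clusters, each a sorted list of segment indices.
    """
    clusters = [[i] for i in range(len(segments))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([segments[i] for i in clusters[a]])
                cb = centroid([segments[i] for i in clusters[b]])
                d = math.dist(ca, cb)  # requires Python 3.8+
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > stop_distance:
            break  # remaining clusters are treated as distinct speakers
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Toy feature vectors for five consecutive segments (made-up values).
segments = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.1, 4.9], [0.1, 0.2]]

clusters = bottom_up_cluster(segments, stop_distance=1.0)
speaker_of = {i: f"speaker_{k}" for k, members in enumerate(clusters) for i in members}
turns = [speaker_of[i] for i in range(len(segments))]
print(clusters)  # [[0, 2, 4], [1, 3]]
print(turns)
```

The top-down variant would run the other way: begin with all segments in one cluster and split it repeatedly. In both cases the stopping rule decides how many speakers are reported, so the threshold here is the part most in need of tuning.
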