andy
5a94c5ff1a
Affected files: STEM/AI/Kalman Filter.md STEM/Signal Proc/Convolution.md STEM/Signal Proc/Image/Tracking.md STEM/Signal Proc/Pole-Zero.md STEM/Signal Proc/Transfer Function.md STEM/Speech/Linguistics/Consonants.md STEM/Speech/Linguistics/Linguistics.md STEM/Speech/Linguistics/README.md STEM/Speech/Linguistics/Terms.md STEM/Speech/Linguistics/Vowels.md STEM/Speech/NLP/Jargon.md STEM/Speech/NLP/NLP.md STEM/Speech/NLP/README.md STEM/Speech/NLP/Recognition.md STEM/Speech/Perception/Perception.md STEM/Speech/Perception/README.md STEM/Speech/Speech Processing/Applications.md STEM/Speech/Speech Processing/README.md STEM/Speech/Speech Processing/Source-Filter.md STEM/Speech/Speech Processing/Vocal Tract.md STEM/img/english-phoneme-table.png STEM/img/formant.png STEM/img/pole-zero-attenuation.png STEM/img/pole-zero-feedback.png STEM/img/pole-zero-stable.png STEM/img/roc-right-left.png STEM/img/roc-two-sided.png STEM/img/spectrum-vocal-tract.png STEM/img/transfer-stable-unstable.png STEM/img/vowel-chart.png STEM/img/vowel-spaces.png
- Automatic Speech Recognition
    - Spoken words to machine-readable form
- Natural language understanding
    - High level cognitive interpretation
        - Structure
        - Meaning
        - Intention
Automatic Speech Recognition
Applications
- Business/desktop apps
    - Dictation
    - Voice commands
- Voice-enabled services/apps
    - Siri
    - Home automation
- Games & entertainment
- Education
- Speech therapy/rehab
- Hearing assistance
    - Live closed captions (CC)
Challenges
- Speaker dependency
    - Accent
    - Emotion
- Vocab size
    - Slang
- Isolated words vs continuous speech
    - Hard to segment continuous speech
- Language constraints & knowledge sources
    - Training source is critical
- Acoustic ambiguity
    - Similar sounding speech
- Noise robustness
    - Background noise
    - Reverberation
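As an illustration of the noise robustness point, the following is a minimal sketch of how background noise could be mixed into a clean signal at a chosen signal-to-noise ratio, for example to check how a recogniser degrades as noise increases. The function name `mix_at_snr`, the sine-wave stand-in for speech and the 16 kHz sample rate are assumptions made for this example, not part of the notes.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(P_speech / P_noise), so solve for the required noise power.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = np.sqrt(target_noise_power / noise_power)
    return speech + gain * noise

# Hypothetical test signals: a 220 Hz tone standing in for speech, plus white noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 220 * t)
noise = rng.normal(size=t.shape)
noisy = mix_at_snr(speech, noise, snr_db=10)

# Measure the SNR actually achieved in the mixture.
achieved_snr_db = 10 * np.log10(np.mean(speech ** 2) / np.mean((noisy - speech) ** 2))
```

Lower `snr_db` values give a noisier mixture. Reverberation is not covered by this sketch, since it would need convolution with a room impulse response rather than additive noise.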
Speech Diarisation
- Who speaks when?
    - Split stream into homogeneous segments for identity
    - Structure stream into speaker turns
    - Provide speaker identity
- Combination of
    - Speaker segmentation
        - Speaker changes in stream
    - Speaker clustering
        - Grouping segments together on basis of characteristics
- Speaker segmentation
    - Gaussian mixture model
    - HMM
- Bottom-up
    - More popular
    - Succession of clusters
    - Merge redundant clusters
    - Remaining belong to speakers
- Top-down
    - Single cluster
    - Iteratively split until speaker clusters
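The bottom-up approach can be sketched roughly as follows. This is a simplified illustration, not a full diarisation system: it assumes each segment has already been reduced to a fixed-length feature vector, compares clusters by Euclidean distance between centroids, and stops when no pair is closer than a threshold. The function name `bottom_up_cluster`, the `stop_distance` parameter and the toy 2-D features are all hypothetical. Practical systems typically model clusters with GMMs or HMMs and use a statistical merge criterion instead.

```python
import numpy as np

def bottom_up_cluster(features, stop_distance):
    """Agglomerative (bottom-up) clustering of per-segment feature vectors.

    Starts with one cluster per segment and repeatedly merges the two
    closest clusters until no pair is closer than `stop_distance`.
    Returns one integer cluster label per segment.
    """
    features = np.asarray(features, dtype=float)
    # Each cluster is a list of segment indices; start with one per segment.
    clusters = [[i] for i in range(len(features))]

    while len(clusters) > 1:
        # Centroid of each current cluster.
        centroids = np.array([features[c].mean(axis=0) for c in clusters])
        # Pairwise Euclidean distances between centroids.
        diffs = centroids[:, None, :] - centroids[None, :, :]
        dists = np.linalg.norm(diffs, axis=-1)
        np.fill_diagonal(dists, np.inf)  # ignore self-distances

        a, b = np.unravel_index(np.argmin(dists), dists.shape)
        if dists[a, b] > stop_distance:
            break  # remaining clusters are treated as distinct speakers

        # Merge the closest (redundant) pair into one cluster.
        lo, hi = min(a, b), max(a, b)
        clusters[lo].extend(clusters[hi])
        del clusters[hi]

    labels = np.empty(len(features), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Toy example: five segments, with two well-separated "speakers" in feature space.
segments = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9], [0.05, 0.05]])
labels = bottom_up_cluster(segments, stop_distance=1.0)
```

A top-down variant would run in the opposite direction: start with all segments in a single cluster and split it repeatedly until each cluster is judged to contain one speaker.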