Andy Pack
f29c435494
Affected files: .obsidian/graph.json .obsidian/workspace.json Gaming/Steam controllers.md Gaming/Ubisoft.md STEM/Signal Proc/Convolution.md STEM/Signal Proc/Fourier Transform.md STEM/Signal Proc/Pole-Zero.md STEM/Signal Proc/System Classes.md STEM/Signal Proc/Transfer Function.md STEM/Speech/Linguistics/Consonants.md STEM/Speech/Linguistics/Linguistics.md STEM/Speech/Linguistics/Terms.md STEM/Speech/Linguistics/Vowels.md STEM/Speech/Literature.md STEM/Speech/NLP/Jargon.md STEM/Speech/NLP/NLP.md STEM/Speech/NLP/Recognition.md STEM/Speech/Perception/Perception.md STEM/Speech/Speech Processing/Applications.md STEM/Speech/Speech Processing/Source-Filter.md STEM/Speech/Speech Processing/Vocal Tract.md Work/Applications/Anthropic/Cover letter.md Work/Applications/Anthropic/In line with values.md Work/Applications/Anthropic/Why Work.md Work/Companies.md Work/Freelancing.md Work/Products.md Work/Tech.md
63 lines
1.3 KiB
Markdown
---
tags:
- ai
- speech
---
1. Automatic Speech Recognition
    - Spoken words to machine-readable form
2. Natural language understanding
    - High level cognitive interpretation
        - Structure
        - Meaning
        - Intention

# Automatic Speech Recognition
## Applications
- Business/desktop apps
    - Dictation
    - Voice commands
- Voice enabled services/apps
    - Siri
    - Home automation
- Game & Entertainment
- Education
- Speech therapy/Rehab
- Hearing assistance
    - Live CC

## Challenges
- Speaker dependency
    - Accent
    - Emotion
- Vocab size
    - Slang
- Isolated words vs Continuous speech
    - Hard to segment continuous speech
- Language constraints & Knowledge sources
    - Training source is critical
- Acoustic ambiguity
    - Similar sounding speech
- Noise robustness
    - Background noise
    - Reverberation

# Speech Diarisation
- Who speaks when?
- Split stream into homogeneous segments for identity
- Structure stream into speaker turns
- Provide speaker identity
- Combination of
    - Speaker segmentation
        - Speaker changes in stream
    - Speaker clustering
        - Grouping segments together on basis of characteristics
- Gaussian mixture model
- HMM
- Bottom-up
    - More popular
    - Succession of clusters
    - Merge redundant clusters
    - Remaining belong to speakers
- Top-down
    - Single cluster
    - Iteratively split until speaker clusters
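
The bottom-up approach can be sketched roughly as below. This is a minimal illustration, not a real diarisation system: each segment is assumed to be already summarised by a small made-up feature vector, clusters are compared by the Euclidean distance between their centroids, and merging stops at a hypothetical `merge_threshold`. Practical systems would more likely model each cluster with a GMM and use a statistical merge criterion; the function names here are illustrative only.

```python
import math


def centroid(cluster):
    """Mean vector of the segment features in one cluster."""
    dims = len(cluster[0])
    return [sum(vec[d] for vec in cluster) / len(cluster) for d in range(dims)]


def bottom_up_cluster(segments, merge_threshold):
    """Greedy bottom-up clustering; returns lists of segment indices.

    `segments` is a list of feature vectors (assumed input format) and
    `merge_threshold` is an assumed stopping criterion on centroid distance.
    """
    # Start with one cluster per segment, i.e. deliberately too many clusters.
    clusters = [[i] for i in range(len(segments))]

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([segments[i] for i in clusters[a]])
                cb = centroid([segments[i] for i in clusters[b]])
                dist = math.dist(ca, cb)
                if best is None or dist < best[0]:
                    best = (dist, a, b)

        dist, a, b = best
        if dist > merge_threshold:
            # No redundant clusters left: the remainder are taken as speakers.
            break

        # Merge the redundant pair into a single cluster.
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]

    return clusters


# Toy features for five segments from two imagined speakers.
segments = [
    [1.0, 1.1],
    [5.0, 5.2],
    [0.9, 1.0],
    [5.1, 4.9],
    [1.1, 0.9],
]
speaker_clusters = bottom_up_cluster(segments, merge_threshold=1.0)
print(speaker_clusters)  # prints [[0, 2, 4], [1, 3]]
```

The top-down approach would run the other way: start from one cluster holding every segment and split it repeatedly until each cluster looks like a single speaker.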