andy 5a94c5ff1a vault backup: 2023-06-06 17:01:49



Automatic Speech Recognition
- Spoken words to machine-readable form
Natural language understanding
- High level cognitive interpretation
  - Structure
  - Meaning
  - Intention

Automatic Speech Recognition

Applications

Business/desktop apps
- Dictation
- Voice commands
Voice enabled services/apps
- Siri
Home automation
Game & Entertainment
Education
Speech therapy/Rehab
Hearing assistance
- Live closed captions (CC)

Challenges

Speaker dependency
- Accent
- Emotion
Vocab size
- Slang
Isolated words vs Continuous speech
- Hard to segment continuous speech
Language constraints & Knowledge sources
- Training source is critical
Acoustic ambiguity
- Similar sounding speech
Noise robustness
- Background noise
- Reverberation

Speaker Diarisation

Who speaks when?
Split stream into homogeneous segments according to speaker identity
Structure stream into speaker turns
Provide speaker identity
Combination of
- Speaker segmentation
  - Speaker changes in stream
- Speaker clustering
  - Grouping segments together on basis of characteristics
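The segmentation step above can be sketched as a sliding-window comparison. This is a minimal illustration only: it assumes each frame is already a small feature vector (real systems typically use acoustic features such as MFCCs), and the function names, `window` and `threshold` are hypothetical choices, not part of the notes.

```python
import math

def window_mean(frames):
    """Mean feature vector of a list of equal-length frames."""
    return [sum(col) / len(frames) for col in zip(*frames)]

def detect_speaker_changes(frames, window, threshold):
    """Return frame indices where the windows before and after differ most.

    A change is reported at t when the distance between the mean of the
    `window` frames before t and the `window` frames from t onwards is a
    local peak above `threshold` (both parameters are assumptions).
    """
    scores = {}
    for t in range(window, len(frames) - window + 1):
        before = window_mean(frames[t - window:t])
        after = window_mean(frames[t:t + window])
        scores[t] = math.dist(before, after)
    changes = []
    for t, score in scores.items():
        is_peak = score >= scores.get(t - 1, 0.0) and score > scores.get(t + 1, 0.0)
        if score > threshold and is_peak:
            changes.append(t)
    return changes

# Toy stream: six frames from one speaker, then six from another.
frames = [[0.0, 0.0]] * 6 + [[4.0, 4.0]] * 6
changes = detect_speaker_changes(frames, window=3, threshold=2.0)
print(changes)
```

The segments between detected change points are then what the clustering step groups by speaker.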
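Gaussian mixture model (GMM)
- Typically combined with an HMM: GMMs model each speaker, HMM states model the transitions between speakers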
Bottom-up
- More popular
- Start with a succession of many small clusters
- Iteratively merge redundant clusters
  - Remaining clusters belong to speakers
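A minimal sketch of the bottom-up idea, assuming each segment has already been reduced to one feature vector. The Euclidean centroid distance and the `stop_distance` threshold are illustrative assumptions; practical systems use statistical merge criteria instead.

```python
import math

def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def bottom_up_cluster(segments, stop_distance):
    """Start with one cluster per segment and repeatedly merge the closest
    pair until no pair is closer than stop_distance (hypothetical criterion).
    Returns clusters as sorted lists of segment indices."""
    clusters = [[i] for i in range(len(segments))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([segments[i] for i in clusters[a]])
                cb = centroid([segments[i] for i in clusters[b]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > stop_distance:
            break  # remaining clusters are treated as distinct speakers
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy 2-D segment features from two speakers (made-up values).
segments = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.1, 4.9], [0.1, 0.2]]
clusters = bottom_up_cluster(segments, stop_distance=1.0)
print(clusters)  # [[0, 2, 4], [1, 3]]
```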
Top-down
- Start with a single cluster
- Iteratively split until each cluster corresponds to a speaker
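The top-down direction can be sketched the same way. This is an illustration under stated assumptions: segments are plain feature vectors, a cluster is split while its spread exceeds a hypothetical `max_spread`, and the split uses the two farthest-apart members as seeds (a simple deterministic stand-in for a proper model-based split).

```python
import math

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def spread(vectors):
    """Largest distance from any member to the cluster mean."""
    m = mean(vectors)
    return max(math.dist(v, m) for v in vectors)

def split_in_two(vectors):
    """Assign each member to the nearer of the two farthest-apart members."""
    seed_a, seed_b = max(
        ((u, v) for i, u in enumerate(vectors) for v in vectors[i + 1:]),
        key=lambda pair: math.dist(pair[0], pair[1]),
    )
    left = [v for v in vectors if math.dist(v, seed_a) <= math.dist(v, seed_b)]
    right = [v for v in vectors if math.dist(v, seed_a) > math.dist(v, seed_b)]
    return left, right

def top_down_cluster(segments, max_spread):
    """Start with a single cluster and split the widest one until every
    cluster is tighter than max_spread (hypothetical stopping rule)."""
    clusters = [list(segments)]
    while True:
        widest = max(clusters, key=spread)
        if len(widest) < 2 or spread(widest) <= max_spread:
            return clusters
        clusters.remove(widest)
        clusters.extend(split_in_two(widest))

# Toy 2-D segment features from three speakers (made-up values).
segments = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.1, 4.9], [9.0, 0.0], [9.1, 0.2]]
speakers = top_down_cluster(segments, max_spread=1.0)
print(len(speakers))  # 3
```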
