stem/Speech/NLP/Recognition.md

1. Automatic Speech Recognition
	- Spoken words to machine-readable form
2. Natural language understanding
	- High-level cognitive interpretation
		- Structure
		- Meaning
		- Intention

# Automatic Speech Recognition
## Applications
- Business/desktop apps
	- Dictation
	- Voice commands
- Voice-enabled services/apps
	- Siri
- Home automation
- Games & entertainment
- Education
- Speech therapy/Rehab
- Hearing assistance
	- Live closed captions (CC)

## Challenges
- Speaker dependency
	- Accent
	- Emotion
- Vocabulary size
	- Slang
- Isolated words vs Continuous speech
	- Hard to segment continuous speech
- Language constraints & Knowledge sources
	- Training source is critical
- Acoustic ambiguity
	- Similar sounding speech
- Noise robustness
	- Background noise
	- Reverberation
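
To illustrate why continuous speech is harder to segment than isolated words, here is a minimal sketch of silence-based segmentation. It is only an illustration: `segment_by_energy`, the per-frame energy values and the threshold are all hypothetical, not taken from any particular ASR system.

```python
def segment_by_energy(frame_energies, threshold):
    """Return (start, end) frame index pairs where energy exceeds threshold.

    The end index is exclusive. This is a naive endpointing scheme that
    relies on silent gaps between words.
    """
    segments = []
    start = None
    for i, energy in enumerate(frame_energies):
        if energy > threshold and start is None:
            start = i  # a loud region begins
        elif energy <= threshold and start is not None:
            segments.append((start, i))  # a loud region ends at a silent frame
            start = None
    if start is not None:
        # the signal ended while still inside a loud region
        segments.append((start, len(frame_energies)))
    return segments


# Made-up frame energies: two words separated by a pause...
isolated = [0.0, 0.9, 0.8, 0.0, 0.0, 0.7, 0.9, 0.0]
# ...and the same two words run together without a pause.
continuous = [0.0, 0.9, 0.8, 0.6, 0.5, 0.7, 0.9, 0.0]

isolated_words = segment_by_energy(isolated, threshold=0.2)
continuous_words = segment_by_energy(continuous, threshold=0.2)
print(isolated_words)
print(continuous_words)
```

With a pause between the words the sketch finds two segments. Without the pause the same two words collapse into a single segment, so silence alone cannot locate the word boundary.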

# Speech Diarisation
- Who speaks when?
- Split stream into homogeneous segments by speaker identity
- Structure stream into speaker turns
- Provide speaker identity
- Combination of
	- Speaker segmentation
		- Speaker changes in stream
	- Speaker clustering
		- Grouping segments together on basis of characteristics
- Gaussian mixture model (GMM)
	- Used with a hidden Markov model (HMM)
- Bottom-up
	- More popular
	- Start with a succession of many small clusters
	- Iteratively merge redundant clusters
		- Those remaining each belong to one speaker
- Top-down
	- Start with a single cluster
	- Iteratively split until each cluster corresponds to one speaker
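
The bottom-up approach can be sketched roughly as follows. This is a simplified illustration rather than a full diarisation system: it assumes each segment has already been reduced to a small feature vector, and it swaps the GMM/HMM modelling for plain Euclidean distance between cluster centroids. The names `bottom_up_cluster` and `merge_threshold` and the example vectors are hypothetical.

```python
import math


def centroid(cluster):
    """Mean feature vector of a cluster of segment vectors."""
    dims = len(cluster[0])
    return [sum(vec[d] for vec in cluster) / len(cluster) for d in range(dims)]


def bottom_up_cluster(segments, merge_threshold):
    """Start with one cluster per segment, then repeatedly merge the
    closest pair until no pair is closer than merge_threshold."""
    clusters = [[seg] for seg in segments]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        if dist > merge_threshold:
            break  # remaining clusters are assumed to be distinct speakers
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up 2-D features for five segments from two speakers.
segments = [(0.1, 0.2), (0.0, 0.1), (5.0, 5.1), (5.2, 4.9), (0.2, 0.0)]
speakers = bottom_up_cluster(segments, merge_threshold=1.0)
print(len(speakers))
```

A top-down variant would run the other way: begin with all segments in one cluster and split it until a stopping criterion is met. In both cases the stopping criterion (here a fixed distance threshold, chosen arbitrarily) determines how many speakers are found.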