stem/Speech/NLP/Recognition.md
andy 5a94c5ff1a vault backup: 2023-06-06 17:01:49
2023-06-06 17:01:49 +01:00

  1. Automatic Speech Recognition
    • Spoken words to machine-readable form
  2. Natural Language Understanding
    • High-level cognitive interpretation
      • Structure
      • Meaning
      • Intention

Automatic Speech Recognition

Applications

  • Business/desktop apps
    • Dictation
    • Voice commands
  • Voice-enabled services/apps
    • Siri
  • Home automation
  • Games & entertainment
  • Education
  • Speech therapy/Rehab
  • Hearing assistance
    • Live closed captions (CC)

Challenges

  • Speaker dependency
    • Accent
    • Emotion
  • Vocabulary size
    • Slang
  • Isolated words vs Continuous speech
    • Hard to segment continuous speech
  • Language constraints & Knowledge sources
    • Training source is critical
  • Acoustic ambiguity
    • Similar sounding speech
  • Noise robustness
    • Background noise
    • Reverberation

Speaker Diarisation

  • Who speaks when?
  • Split stream into speaker-homogeneous segments
  • Structure stream into speaker turns
  • Provide speaker identity
  • Combination of
    • Speaker segmentation
      • Speaker changes in stream
    • Speaker clustering
      • Grouping segments together on basis of characteristics
  • Commonly modelled with Gaussian mixture models (GMMs)
    • Used as the states of an HMM, one state per speaker
  • Bottom-up
    • More popular
    • Start with a succession of many small clusters
    • Iteratively merge redundant clusters
      • Remaining clusters each belong to one speaker
  • Top-down
    • Start with a single cluster
    • Iteratively split until there is one cluster per speaker
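
The bottom-up approach can be illustrated with a minimal sketch. It assumes each segment has already been reduced to a small feature vector, and it merges the two closest clusters by centroid distance until no pair is closer than a threshold. The names `bottom_up_cluster`, `speaker_turns` and `merge_threshold`, the toy 2-D features, and the Euclidean distance are all assumptions made for illustration; real systems typically compare clusters with GMM likelihoods or a criterion such as BIC rather than a fixed distance threshold.

```python
import math


def centroid(vectors):
    """Mean feature vector of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def bottom_up_cluster(segments, merge_threshold):
    """Start with one cluster per segment and repeatedly merge the two
    closest clusters until no pair is closer than merge_threshold.
    Returns a list of clusters, each a sorted list of segment indices."""
    clusters = [[i] for i in range(len(segments))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([segments[i] for i in clusters[a]])
                cb = centroid([segments[i] for i in clusters[b]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > merge_threshold:
            break  # remaining clusters are treated as distinct speakers
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def speaker_turns(clusters, n_segments):
    """Convert clusters into (speaker, first_segment, last_segment) turns,
    joining consecutive segments that share a speaker."""
    label = {i: spk for spk, members in enumerate(clusters) for i in members}
    turns = []
    for i in range(n_segments):
        if turns and turns[-1][0] == label[i]:
            turns[-1] = (label[i], turns[-1][1], i)
        else:
            turns.append((label[i], i, i))
    return turns


# Toy stream: segments 0, 2, 4 resemble one voice; 1 and 3 another.
segments = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.1, 4.9], [0.1, 0.2]]
clusters = bottom_up_cluster(segments, merge_threshold=1.0)
turns = speaker_turns(clusters, len(segments))
print(clusters)
print(turns)
```

A top-down variant would run the loop in reverse: begin with every segment in one cluster and split the least homogeneous cluster until each one is judged to contain a single speaker.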