stem/Speech/NLP/Recognition.md
Andy Pack f29c435494 vault backup: 2023-12-27 22:38:55
2023-12-27 22:38:56 +00:00

tags: ai, speech
  1. Automatic Speech Recognition
    • Spoken words to machine-readable form
  2. Natural language understanding
    • High-level cognitive interpretation
      • Structure
      • Meaning
      • Intention

Automatic Speech Recognition

Applications

  • Business/desktop apps
    • Dictation
    • Voice commands
  • Voice-enabled services/apps
    • Siri
  • Home automation
  • Games & entertainment
  • Education
  • Speech therapy/rehabilitation
  • Hearing assistance
    • Live closed captions (CC)

Challenges

  • Speaker dependency
    • Accent
    • Emotion
  • Vocabulary size
    • Slang
  • Isolated words vs continuous speech
    • Continuous speech is hard to segment into words
  • Language constraints & knowledge sources
    • Choice of training data is critical
  • Acoustic ambiguity
    • Similar-sounding words and phrases
  • Noise robustness
    • Background noise
    • Reverberation
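
To make the "isolated words vs continuous speech" challenge concrete, here is a minimal sketch of energy-based segmentation. It assumes a toy signal held in a plain Python list; the function names, frame length and threshold are all hypothetical choices for illustration, not part of any real ASR toolkit.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment_by_energy(samples, frame_len=4, threshold=0.01):
    """Return (start_frame, end_frame) pairs, end exclusive,
    for each run of frames whose energy exceeds the threshold."""
    energies = frame_energies(samples, frame_len)
    segments, start = [], None
    for idx, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = idx                      # a loud run begins
        elif energy <= threshold and start is not None:
            segments.append((start, idx))    # a loud run ends at a quiet frame
            start = None
    if start is not None:                    # signal ended while still loud
        segments.append((start, len(energies)))
    return segments


# Toy data (assumption): a "word" is 8 loud samples, a pause is 8 silent ones.
word = [0.5, -0.5] * 4
silence = [0.0] * 8

isolated = word + silence + word + silence + word
continuous = word * 3

print(segment_by_energy(isolated))    # three separate segments
print(segment_by_energy(continuous))  # one long segment, no word boundaries
```

With pauses between words, the silent frames give the boundaries away. In the continuous case there are no quiet frames, so this simple method returns a single segment and the word boundaries would have to come from other knowledge sources.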

Speaker Diarisation

  • Who speaks when?
  • Split the stream into speaker-homogeneous segments
  • Structure stream into speaker turns
  • Provide speaker identity
  • Combination of
    • Speaker segmentation
      • Speaker changes in stream
    • Speaker clustering
      • Grouping segments together on basis of characteristics
  • Commonly modelled with Gaussian mixture models (GMMs)
    • Often combined with a hidden Markov model (HMM)
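  • Bottom-up (agglomerative)
    • More popular
    • Start from a succession of many small clusters
    • Iteratively merge redundant clusters
      • Remaining clusters each belong to one speaker
  • Top-down (divisive)
    • Start from a single cluster
    • Iteratively split until each cluster corresponds to one speaker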
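
The bottom-up strategy can be sketched as plain agglomerative clustering. This is a minimal illustration under stated assumptions, not a full diarisation system: each segment is assumed to be summarised already by a small feature vector (the numbers below are made up), the Euclidean distance between cluster centroids stands in for the GMM-based comparison a real system might use, and `bottom_up_clusters` and `stop_distance` are hypothetical names.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def bottom_up_clusters(features, stop_distance):
    """Start with one cluster per segment and repeatedly merge the two
    closest clusters until the closest pair is farther apart than
    stop_distance. Returns lists of segment indices, one list per cluster."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([features[i] for i in clusters[a]])
                cb = centroid([features[i] for i in clusters[b]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > stop_distance:
            break                       # remaining clusters are treated as speakers
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                 # b > a, so index a stays valid
    return [sorted(c) for c in clusters]


# Made-up 2-D features for five segments (assumption): segments 0, 1 and 4
# sit near one point, segments 2 and 3 near another.
features = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [0.05, 0.05]]
print(bottom_up_clusters(features, stop_distance=1.0))
```

Each surviving cluster is read as one speaker, so the segment indices in a cluster give that speaker's turns. A top-down variant would run the other way, starting from one cluster holding every segment and splitting it until a similar stopping rule is met.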