vault backup: 2023-06-06 17:01:49

Affected files:
STEM/AI/Kalman Filter.md
STEM/Signal Proc/Convolution.md
STEM/Signal Proc/Image/Tracking.md
STEM/Signal Proc/Pole-Zero.md
STEM/Signal Proc/Transfer Function.md
STEM/Speech/Linguistics/Consonants.md
STEM/Speech/Linguistics/Linguistics.md
STEM/Speech/Linguistics/README.md
STEM/Speech/Linguistics/Terms.md
STEM/Speech/Linguistics/Vowels.md
STEM/Speech/NLP/Jargon.md
STEM/Speech/NLP/NLP.md
STEM/Speech/NLP/README.md
STEM/Speech/NLP/Recognition.md
STEM/Speech/Perception/Perception.md
STEM/Speech/Perception/README.md
STEM/Speech/Speech Processing/Applications.md
STEM/Speech/Speech Processing/README.md
STEM/Speech/Speech Processing/Source-Filter.md
STEM/Speech/Speech Processing/Vocal Tract.md
STEM/img/english-phoneme-table.png
STEM/img/formant.png
STEM/img/pole-zero-attenuation.png
STEM/img/pole-zero-feedback.png
STEM/img/pole-zero-stable.png
STEM/img/roc-right-left.png
STEM/img/roc-two-sided.png
STEM/img/spectrum-vocal-tract.png
STEM/img/transfer-stable-unstable.png
STEM/img/vowel-chart.png
STEM/img/vowel-spaces.png
andy 2023-06-06 17:01:49 +01:00
parent 7bc4dffd8b
commit 5a94c5ff1a
31 changed files with 523 additions and 2 deletions

9
AI/Kalman Filter.md Normal file
View File

@ -0,0 +1,9 @@
- Measure
- Predict
- Update
- Positions and confidences modelled as gaussians
- Mean
- Position
- Variance
- Confidence
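
The measure/predict/update cycle above can be sketched in one dimension. This is a minimal illustration, assuming each belief is a `(mean, variance)` pair; the names `update` and `predict` and the sample numbers are hypothetical and not taken from the notes.

```python
from typing import Tuple

Gaussian = Tuple[float, float]  # (mean = position, variance = inverse confidence)


def update(prior: Gaussian, measurement: Gaussian) -> Gaussian:
    """Fuse the prior belief with a new measurement (product of two Gaussians)."""
    mean_p, var_p = prior
    mean_m, var_m = measurement
    gain = var_p / (var_p + var_m)            # Kalman gain, between 0 and 1
    mean = mean_p + gain * (mean_m - mean_p)  # move towards the measurement
    var = (1.0 - gain) * var_p                # fused estimate is more confident
    return mean, var


def predict(belief: Gaussian, motion: Gaussian) -> Gaussian:
    """Shift the belief by the expected motion; the uncertainties add."""
    return belief[0] + motion[0], belief[1] + motion[1]


# Assumed example values: a vague prior, then three noisy position measurements.
belief = (0.0, 1000.0)
for z in [5.0, 6.0, 7.0]:
    belief = update(belief, (z, 4.0))      # measure + update
    belief = predict(belief, (1.0, 2.0))   # predict the next position
```

With equal variances the update lands halfway between the two means and halves the variance: `update((10.0, 4.0), (12.0, 4.0))` gives `(11.0, 2.0)`.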

View File

@ -14,13 +14,22 @@ $$x(t)=x_1(t)\circledast x_2(t)=\int_{-\infty}^\infty x_1(t-\tau)\cdot x_2(\tau)
4. $Ax_1(t)\circledast Bx_2(t)=AB[x_1(t)\circledast x_2(t)]$
- Associativity with Scalar
5. Symmetrical graph about origin
6. $y(t)=x_1(t-a)\circledast x_2(t-b)$
- $x(t)=x_1(t)\circledast x_2(t)$
- $y(t)=x(t-a-b)$
7. $x(t)=x_1(t)\circledast x_2(t)$
- $x_1$ between $a_1$ and $b_1$
- $x_2$ between $a_2$ and $b_2$
- Starting point of $x(t)=a_1+a_2$
- Ending point of $x(t)=b_1+b_2$
8. $\overline{x \circledast y}=\bar x \circledast \bar y$
9. $(x \circledast y)'=x'\circledast y=x\circledast y'$
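
Properties 6 and 7 can be checked on a discrete analogue. This is a rough sketch, assuming finite sequences stored as a start index plus a list of samples; `convolve` and `convolve_with_support` are hypothetical helper names.

```python
def convolve(x1, x2):
    """Direct discrete convolution of two finite sequences."""
    out = [0.0] * (len(x1) + len(x2) - 1)
    for i, a in enumerate(x1):
        for j, b in enumerate(x2):
            out[i + j] += a * b
    return out


def convolve_with_support(start1, x1, start2, x2):
    """Return (start index, samples): the output starts at start1 + start2."""
    return start1 + start2, convolve(x1, x2)


# x1 lives on n = 2..4, x2 lives on n = -1..0
start, y = convolve_with_support(2, [1.0, 2.0, 3.0], -1, [1.0, 1.0])
end = start + len(y) - 1   # 4 + 0 = 4, the sum of the two end points

# Property 6: delaying the inputs by a and b delays the output by a + b
a, b = 3, 5
shifted_start, shifted_y = convolve_with_support(2 + a, [1.0, 2.0, 3.0], -1 + b, [1.0, 1.0])
```

Here `start` is `1`, `end` is `4`, and the shifted result has the same samples starting at `1 + a + b`.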
# Applications
1. Communications systems
- Shift signal in frequency domain (Frequency modulation)
2. System analysis
- Find system output given input and [transfer function](Transfer%20Function.md)
# Polynomial Multiplication
- Convolving coefficients of two polynomials gives coefficients of product
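
A small sketch of the polynomial case, assuming coefficients are listed from the constant term upwards; `poly_multiply` is a hypothetical name.

```python
def poly_multiply(p, q):
    """Multiply two polynomials by convolving their coefficient lists.

    Coefficients are ordered from the constant term upwards,
    so [1, 2] stands for 1 + 2x.
    """
    product = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b   # x^i * x^j contributes to x^(i+j)
    return product


# (1 + 2x)(1 + 3x) = 1 + 5x + 6x^2
coefficients = poly_multiply([1, 2], [1, 3])
```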

View File

@ -0,0 +1,42 @@
# Challenges
- Clutter
- Distractors
- Occlusion
- Hidden by other objects
- Object's appearance can evolve
- Rotation, scale, camera viewpoint
- Take new template every n frames
- Take new template when confidence falls below threshold
# Background Tracking
- Static camera
- Capture clean shots of background
- Object present
- Average enough footage
- Background image - current video frame = difference image
- Threshold for binary mask
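
The background-subtraction steps above might look like the following on tiny greyscale frames stored as nested lists. This is a sketch under assumptions: the absolute difference is used so that darker and brighter objects both register, and the function names and threshold value are illustrative only.

```python
def average_background(frames):
    """Pixel-wise mean over a list of equally sized greyscale frames."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def foreground_mask(background, frame, threshold):
    """Binary mask: 1 where the frame differs from the background by more than threshold."""
    return [[1 if abs(b - p) > threshold else 0
             for b, p in zip(background_row, frame_row)]
            for background_row, frame_row in zip(background, frame)]


# Two object-free frames give the background model
background = average_background([[[10, 10], [10, 10]],
                                 [[12, 12], [12, 12]]])
# A bright object appears in the top-right pixel
mask = foreground_mask(background, [[11, 200], [11, 11]], threshold=30)
```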
# Nearest Neighbour Tracking
- Pick the connected component whose centroid is closest to the previous centroid
- Not good for occlusion
- Will snap to next candidate
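
A minimal sketch of the nearest-neighbour rule, assuming centroids are `(x, y)` tuples; `nearest_centroid` is a hypothetical name. The second call illustrates the occlusion problem: when the true object is missing, the tracker still returns whichever candidate is closest.

```python
import math


def nearest_centroid(previous, candidates):
    """Return the candidate centroid closest (Euclidean) to the previous one."""
    return min(candidates, key=lambda c: math.dist(previous, c))


previous = (0.0, 0.0)
tracked = nearest_centroid(previous, [(5.0, 5.0), (1.0, 1.0), (3.0, 0.0)])

# Object occluded: its component is gone, so the tracker snaps to a distractor
snapped = nearest_centroid(previous, [(5.0, 5.0), (3.0, 0.0)])
```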
# Blob Tracking
- Build colour model of object
- Eigenmodel
- Mask of pixels that match object
- Use centroid as location over time
- Pick connected component with centroid closest to previous location
- Good for distinctive colours
- Not for practical situations though
# Template Tracking
- Sample distinctive patch from image
- Search all positions in video for patch
- Use cross-correlation
- Illumination changes
- Brightness is uniform shift of greyscale values up or down
- Correlated to the mean pixel value
- Subtract means in template and frame to give invariance
- Normalised cross-correlation
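
The mean-subtracted, normalised cross-correlation described above can be sketched as an exhaustive search over one frame. This assumes small greyscale images stored as nested lists; `zncc` and `best_match` are hypothetical names, and a practical tracker would use an optimised library routine instead.

```python
import math


def zncc(template, patch):
    """Zero-mean normalised cross-correlation of two equally sized patches."""
    t = [v for row in template for v in row]
    p = [v for row in patch for v in row]
    mean_t, mean_p = sum(t) / len(t), sum(p) / len(p)
    num = sum((a - mean_t) * (b - mean_p) for a, b in zip(t, p))
    den = math.sqrt(sum((a - mean_t) ** 2 for a in t) *
                    sum((b - mean_p) ** 2 for b in p))
    return num / den if den else 0.0   # flat patches carry no pattern


def best_match(image, template):
    """Slide the template over every position and keep the highest score."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, (0, 0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = zncc(template, patch)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


template = [[1, 2], [3, 4]]
# The same pattern, uniformly brightened by 50, sits at row 1, column 1
image = [[10, 10, 10],
         [10, 51, 52],
         [10, 53, 54]]
position, score = best_match(image, template)
```

Because the means are subtracted, the brightened copy still scores approximately 1 at position `(1, 1)`.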

62
Signal Proc/Pole-Zero.md Normal file
View File

@ -0,0 +1,62 @@
- Poles
- **X**
- Let $X(z) = \infty$
- Let $1/X(z) = 0$
- Roots of denominator
- Zeros
- **O**
- Let $X(z) = 0$
- Roots of numerator
- In complex (Z for speech) domain
[Magnitude Response From Pole/Zeros](https://www.youtube.com/watch?v=8jNjVkoZQCU)
[MIT Pole Zero](https://web.mit.edu/2.14/www/Handouts/PoleZero.pdf)
Representation of rational transfer function, identifies
- Stability
- Causal/Anti-causal system
- ROC
- Minimum phase/Non minimum phase
![](../img/pole-zero-attenuation.png)
![](../img/pole-zero-stable.png)
![](../img/pole-zero-feedback.png)
# BIBO Stable
- All poles of H must lie within the unit circle of the plot
- If we give an input less than a constant
- Will get an output less than some constant
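
The unit-circle test can be sketched for a second-order section. This assumes a causal system whose denominator is $z^2 + a_1 z + a_2$; the function names and coefficient values are illustrative only.

```python
import cmath


def poles_of_biquad(a1, a2):
    """Roots of the denominator z^2 + a1*z + a2 via the quadratic formula."""
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    return ((-a1 + disc) / 2, (-a1 - disc) / 2)


def is_bibo_stable(poles):
    """A causal discrete system is BIBO stable if every pole has magnitude < 1."""
    return all(abs(p) < 1.0 for p in poles)


stable_poles = poles_of_biquad(-1.0, 0.5)     # complex pair 0.5 +/- 0.5j
unstable_poles = poles_of_biquad(-2.5, 1.0)   # real poles at 2 and 0.5
```

The first pair has magnitude about 0.707, so it passes the test; the second set contains a pole at 2, outside the unit circle.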
# Region of Convergence
- Depends on whether causal or anti-causal
- Cannot contain poles
- Goes to infinity
## Continuous
1. If includes imaginary axis
- BIBO stable
- All poles must be left of i axis
2. Rightwards from pole with largest real-part (not infinity)
- Causal
3. Leftward from pole with smallest real-part (not -infinity)
- Anti-causal
## Discrete
1. If includes unit circle
- BIBO stable
2. Outward from pole with largest (not infinite) magnitude
- Right-sided impulse response
- Causal (if no pole at infinity)
3. Inward from pole with smallest (nonzero) magnitude
- Anti-causal
![](../img/roc-right-left.png)
![](../img/roc-two-sided.png)
Sinusoidal when complex pair
- $e^{-j\omega}$
- Euler's for oscillating
Exponential when on the axis
- Decays, no $i$ in the exponent
![](../img/transfer-stable-unstable.png)

View File

@ -0,0 +1,25 @@
$$Y(s)=H(s)\cdot X(s)$$
- $H(s)=\frac{Y(s)}{X(s)}=\frac{\mathcal L\{y(t)\}}{\mathcal L\{x(t)\}}$
$$Y(z)=H(z)\cdot X(z)$$
- $H(z)=\frac{Y(z)}{X(z)}=\frac{\mathcal Z\{y[n]\}}{\mathcal Z\{x[n]\}}$
$$G(\omega)=\frac{|Y|}{|X|}=|H(j\omega)|$$
- $H(j\omega)$, Frequency response
$$\phi(\omega)=\arg(Y)-\arg(X)=\arg\left(H\left(j\omega\right)\right)$$
- $\phi(\omega)$, Phase shift
$$\tau_\phi(\omega)=-\frac{\phi(\omega)}{\omega}$$
- $\tau_\phi$, Phase delay
- Frequency-dependent amount of delay introduced to the sinusoid by $H$
$$\tau_g(\omega)=-\frac{d\phi(\omega)}{d\omega}$$
- $\tau_g$, Group delay
- Frequency-dependent amount of delay introduced to the envelope of the sinusoid by $H$
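A worked example may help here, assuming a pure delay of $T$ seconds (a hypothetical system, not one from these notes), so that $y(t)=x(t-T)$:
$$H(j\omega)=e^{-j\omega T}\quad\Rightarrow\quad G(\omega)=1,\quad \phi(\omega)=-\omega T$$
$$\tau_\phi(\omega)=-\frac{-\omega T}{\omega}=T,\qquad \tau_g(\omega)=-\frac{d(-\omega T)}{d\omega}=T$$
- Both delays equal $T$ at every frequency, as expected when the phase is linear in $\omega$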
[Partial Fractions](https://lpsa.swarthmore.edu/BackGround/PartialFraction/PartialFraction.html#Order_of_numerator_polynomial_is_not_less_than_that_of_the_denominator)
[Partial Fractions for Laplace](https://lpsa.swarthmore.edu/LaplaceXform/InvLaplace/InvLaplaceXformPFE.html)
[Inverse Z Transform](https://lpsa.swarthmore.edu/ZXform/InvZXform/InvZXform.html)
[Discrete Time Systems:Impulse responses and convolution; An introduction to the Z-transform](https://homes.esat.kuleuven.be/~maapc/static/files/SYSTHEORY/Slides/Lecture5/Lecture5-Impulse%20responses%20and%20convolution%20layout.pdf)

View File

@ -0,0 +1,61 @@
- Complete or partial closure of vocal tract
- Voiced/Unvoiced
# Nasal
- Mouth closed and velum lowered
- Vowel-like structure, weaker energy
- Sound through nose
- English has only voiced nasals
- Can be identified by direction of formant movement
- m, n, ng
- Locations
- Labial
- Lips
- m
- Alveolar
- Alveolar ridge
- n
- Velar
- Soft palate
- ng
# Plosive
- Closure which is released
- Voiced or unvoiced
- Based on when voicing starts
- t, d
- Trill
- Tap or flap
- Locations
- Labial
- Lips
- p, b
- Alveolar
- Alveolar ridge
- t, d
- Velar
- Soft palate
- k, g
# Fricative
- Air forced through constriction
- Jet of turbulence
- Noisy
- Voiced
- v, z
- Unvoiced
- f, s
- Frequency cut-off
- Inversely proportional to length of cavity in front of constriction
- Sibilants
- Air forced over teeth
- s, z
- Affricate
- Abrupt start to frication
- Ch, J
# Approximant
- Articulators move close, some perturbations
- Vowel-like structure, weaker energy
- Upward trajectory of formants
- r, y, l, w

View File

@ -0,0 +1,32 @@
- Phonetics
- Sound of language
- Acoustic result of speech articulation
- Phonology
- How languages or dialects organise sounds
- Within and across languages
- Function of sound units in language
- How phonemes are used
- Morphology
- Word structure
- Syntax
- Sentence structure
- Semantics
- Meaning of words or sentences
- Pragmatics
- How context contributes to meaning
- Speech act theory
- Discourse analysis
- How sentences form text
# Phonetics vs Phonology
- [Phoneme](Terms.md#Phoneme)
- Unit of sound structure
- With linguistic content
- Abstraction of set of sounds
- Allophones
- Phonology
- Phone
- Segment of speech recording
- Single sound
- Single phoneme
- Phonetics

View File

@ -0,0 +1 @@
Linguistics.md

View File

@ -0,0 +1,31 @@
# Phoneme
- Smallest unit of speech
- Distinguish words
- Continuous speech is stream of different phonemes
- English has 44
- Consonants
- Voiced or unvoiced
- Vowels
- All voiced
# Voiced
- Quasi-periodic vibration of vocal cords
- Sound transmitted along vocal tract unimpeded
- Air forced through glottis
- Quasi-periodic oscillation
# Unvoiced
- Air flow through vocal apparatus is either cut-off or impeded
- Constriction using tongue or lips
- Turbulence or noise
- Fricatives
- /s/, /f/
- Plosives
- /p/, /t/
- Modelled as random noise
# Formant
- Vocal tract can be modelled as resonant system
- Modes or peaks in spectral response of resonant system
- Lowest 3 most important in speech
![](../../img/formant.png)

File diff suppressed because one or more lines are too long

26
Speech/NLP/Jargon.md Normal file
View File

@ -0,0 +1,26 @@
- Types
- Distinct words
- |V|
- Tokens
- All words
- N
- Related quantities
- Have equations that estimate
- Disfluencies
- Fragments
- Broken/half spoken words
- Fillers
- Oo
- Uh
- May strip
- May be helpful
- Speaker may start clause again
- Clitic
- Part of a word that can't stand on its own
- What're
- Contractions
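
Counting tokens ($N$) and types ($|V|$) can be sketched as below. This assumes naive whitespace tokenisation and lower-casing, which a real pipeline would replace with a proper tokeniser; `type_token_counts` is a hypothetical name.

```python
def type_token_counts(text):
    """Return (N, |V|): the number of tokens and the number of distinct types."""
    tokens = text.lower().split()   # naive whitespace tokenisation (assumption)
    return len(tokens), len(set(tokens))


n_tokens, n_types = type_token_counts("the cat sat on the mat")
```

Here "the" appears twice, so there are 6 tokens but only 5 types.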
# Edit Distance
- Distance between words
- Number of insertions, deletions and substitutions
- Similarity
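
A dynamic-programming sketch of edit distance follows. It assumes the Levenshtein convention, where insertions, deletions and substitutions each cost 1; other cost schemes exist, and the name `edit_distance` is illustrative.

```python
def edit_distance(source, target):
    """Minimum number of single-character edits turning source into target."""
    # previous[j] holds the distance between source[:i-1] and target[:j]
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]                              # delete all i characters so far
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,                   # deletion
                current[j - 1] + 1,                # insertion
                previous[j - 1] + (s != t),        # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


distance = edit_distance("kitten", "sitting")
```

For this pair the distance is 3: substitute k with s, substitute e with i, and insert g.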

29
Speech/NLP/NLP.md Normal file
View File

@ -0,0 +1,29 @@
# Text Normalisation
- Tokenisation
- Labelling parts of sentence
- Usually words
- Can be multiple
- Proper nouns
- New York
- Emoticons
- Hashtags
- May need some named entity recognition
- Penn Treebank standard
- Byte-pair encoding
- Standard can't understand unseen words
- Encode as subwords
- -est, -er
- Lemmatisation
- Determining roots of words
- Verb infinitives
- Find lemma
- Derived forms are inflections or inflected
- Word-forms
- Critical for morphologically complex languages
- Arabic
- Stemming
- Simpler than lemmatisation
- Just removing suffixes
- Normalising word formats
- Segmenting sentences
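
The byte-pair encoding idea above, learning subwords such as "-er" from frequent symbol pairs, can be sketched as a small merge-learning loop. This is a simplified illustration with an assumed toy corpus; it omits end-of-word markers and the other details of production tokenisers, and `learn_bpe` is a hypothetical name.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Repeatedly merge the most frequent adjacent symbol pair."""
    vocab = Counter(tuple(word) for word in words)   # each word as a tuple of symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break                                    # every word is a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab


merges, vocab = learn_bpe(["newer", "newer", "lower", "wider"], num_merges=2)
```

On this toy corpus the first merge is `('e', 'r')`, giving the subword "er", and the second is `('w', 'er')`.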

1
Speech/NLP/README.md Symbolic link
View File

@ -0,0 +1 @@
NLP.md

58
Speech/NLP/Recognition.md Normal file
View File

@ -0,0 +1,58 @@
1. Automatic Speech Recognition
- Spoken words to machine-readable form
2. Natural language understanding
- High level cognitive interpretation
- Structure
- Meaning
- Intention
# Automatic Speech Recognition
## Applications
- Business/desktop apps
- Dictation
- Voice commands
- Voice enabled services/apps
- Siri
- Home automation
- Game & Entertainment
- Education
- Speech therapy/Rehab
- Hearing assistance
- Live CC
## Challenges
- Speaker dependency
- Accent
- Emotion
- Vocab size
- Slang
- Isolated words vs Continuous speech
- Hard to segment continuous speech
- Language constraints & Knowledge sources
- Training source is critical
- Acoustic ambiguity
- Similar sounding speech
- Noise robustness
- Background noise
- Reverberation
# Speech Diarisation
- Who speaks when?
- Split stream into homogeneous segments for identity
- Structure stream into speaker turns
- Provide speaker identity
- Combination of
- Speaker segmentation
- Speaker changes in stream
- Speaker clustering
- Grouping segments together on basis of characteristics
- Gaussian mixture model
- HMM
- Bottom-up
- More popular
- Succession of clusters
- Merge redundant clusters
- Remaining belong to speakers
- Top-down
- Single cluster
- Iteratively split until speaker clusters
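
The bottom-up strategy can be sketched as agglomerative clustering. This is a heavily simplified illustration, assuming each speech segment is summarised by a single number and the speaker count is known in advance. Real systems compare GMM or HMM models rather than scalar features, and `bottom_up_cluster` is a hypothetical name.

```python
def bottom_up_cluster(features, num_speakers):
    """Start with one cluster per segment and merge the closest pair each round."""
    clusters = [[i] for i in range(len(features))]

    def centroid(cluster):
        return sum(features[i] for i in cluster) / len(cluster)

    while len(clusters) > num_speakers:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = abs(centroid(clusters[a]) - centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the redundant cluster
        del clusters[b]
    return clusters


# Five segments; segments 0, 1 and 4 come from one assumed speaker, 2 and 3 from another
clusters = bottom_up_cluster([1.0, 1.2, 5.0, 5.3, 0.9], num_speakers=2)
```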

View File

@ -0,0 +1,8 @@
# Physiological
- Physical/mechanical processing of sound
- Ear stuff
# Psychological
- Brain function and processing
***Psychoacoustics incorporates both***

1
Speech/Perception/README.md Symbolic link
View File

@ -0,0 +1 @@
Perception.md

View File

@ -0,0 +1,25 @@
- Speech telecommunications & Encoding
- Preserving perceptibility and quality over the wire
- Minimising bandwidth
- Speech enhancement
- Restoration of degraded speech
- Additive noise
- Reverberation
- Echoes
- Background sounds
- Blind source separation
- Adaptive filtering
- Spectral subtraction
- Speech & Speaker recognition
- Automatic conversion of speech to written text
- Identifying speaker based on speech
- Dictation, speaker recognition for security
- Speaker diarisation
- "Who speaks when"
- Speaker segmentation
- Speaker change point in stream
- Speaker clustering
- Grouping segments based on speaker identity
- Speech synthesis
- Speech analysis
- Waveform & spectrum

View File

@ -0,0 +1 @@
Applications.md

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,16 @@
- Input and output signals are real
- Filter coefficients are real for rational $H(z)$
- Poles/zeros either real or complex conjugate pairs
- BIBO stability important
# Frequency Response
- Sample along unit circle
- $|z|=\left|e^{i\omega}\right|=1$
# Magnitude Response
$$\left|H(e^{i\omega})\right|=\frac{b_0|e^{i\omega}-\beta_1|\cdots|e^{i\omega}-\beta_q|}{|e^{i\omega}-\alpha_1|\cdots|e^{i\omega}-\alpha_p|}$$
![](../../img/spectrum-vocal-tract.png)
- LPC & Cepstral Analysis to separate
- Residual allows more accurate estimation of pitch period
