vault backup: 2023-06-06 17:01:49

Affected files: STEM/AI/Kalman Filter.md STEM/Signal Proc/Convolution.md STEM/Signal Proc/Image/Tracking.md STEM/Signal Proc/Pole-Zero.md STEM/Signal Proc/Transfer Function.md STEM/Speech/Linguistics/Consonants.md STEM/Speech/Linguistics/Linguistics.md STEM/Speech/Linguistics/README.md STEM/Speech/Linguistics/Terms.md STEM/Speech/Linguistics/Vowels.md STEM/Speech/NLP/Jargon.md STEM/Speech/NLP/NLP.md STEM/Speech/NLP/README.md STEM/Speech/NLP/Recognition.md STEM/Speech/Perception/Perception.md STEM/Speech/Perception/README.md STEM/Speech/Speech Processing/Applications.md STEM/Speech/Speech Processing/README.md STEM/Speech/Speech Processing/Source-Filter.md STEM/Speech/Speech Processing/Vocal Tract.md STEM/img/english-phoneme-table.png STEM/img/formant.png STEM/img/pole-zero-attenuation.png STEM/img/pole-zero-feedback.png STEM/img/pole-zero-stable.png STEM/img/roc-right-left.png STEM/img/roc-two-sided.png STEM/img/spectrum-vocal-tract.png STEM/img/transfer-stable-unstable.png STEM/img/vowel-chart.png STEM/img/vowel-spaces.png
2023-06-06 17:01:49 +01:00 · 2023-06-06 17:01:49 +01:00 · 5a94c5ff1a
commit 5a94c5ff1a
parent 7bc4dffd8b
31 changed files with 523 additions and 2 deletions
--- a/Filter.md
+++ b/Filter.md
@ -0,0 +1,9 @@
 - Measure
 - Predict
 - Update
 - Positions and confidences modelled as gaussians
 - Mean
 	- Position
 - Variance
 	- Confidence
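The measure/predict/update cycle above can be sketched in one dimension. This is a minimal illustration, not taken from the notes: the function names, the noise variances and the sample measurements are all assumptions, and a lower variance is read as higher confidence.

```python
def update(mean, var, meas_mean, meas_var):
    """Measurement step: fuse the prior Gaussian with the measurement Gaussian."""
    new_mean = (meas_var * mean + var * meas_mean) / (var + meas_var)
    new_var = 1.0 / (1.0 / var + 1.0 / meas_var)
    return new_mean, new_var


def predict(mean, var, motion, motion_var):
    """Prediction step: shift the mean by the motion and add its uncertainty."""
    return mean + motion, var + motion_var


# Hypothetical data: start with a very uncertain position estimate.
mean, var = 0.0, 1000.0
measurements = [5.0, 6.0, 7.0]  # assumed noisy position readings
motions = [1.0, 1.0, 1.0]       # assumed movement between readings

for z, u in zip(measurements, motions):
    mean, var = update(mean, var, z, meas_var=4.0)
    mean, var = predict(mean, var, u, motion_var=2.0)
    print(f"position ~ {mean:.2f}, variance {var:.2f}")
```

Each update shrinks the variance (more confidence) and each prediction grows it again.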
--- a/Proc/Convolution.md
+++ b/Proc/Convolution.md
@ -14,13 +14,22 @@ $$x(t)=x_1(t)\circledast x_2(t)=\int_{-\infty}^\infty x_1(t-\tau)\cdot x_2(\tau)
 4. $Ax_1(t)\circledast Bx_2(t)=AB[x_1(t)\circledast x_2(t)]$
 	- Associativity with Scalar
 5. Symmetrical graph about origin
 6. $y(t)=x_1(t-a)\circledast x_2(t-b)$
 	- $x(t)=x_1(t)\circledast x_2(t)$
 	- $y(t)=x(t-a-b)$
 7. $x(t)=x_1(t)\circledast x_2(t)$
 	- $x_1$ between $a_1$ and $b_1$
 	- $x_2$ between $a_2$ and $b_2$
 	- Starting point of $x(t)=a_1+a_2$
 	- Ending point of $x(t)=b_1+b_2$
 8. $\overline{x \circledast y}=\bar x \circledast \bar y$
 9. $(x \circledast y)'=x'\circledast y=x\circledast y'$
 # Applications
 1. Communications systems
 	- Shift signal in frequency domain (Frequency modulation)
 2. System analysis
-	- Find system output given input and transfer function
+	- Find system output given input and [transfer function](Transfer%20Function.md)
 # Polynomial Multiplication
 -   Convolving coefficients of two poly gives coefficients of product
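A small sketch of the polynomial claim above, using plain lists of coefficients in ascending power order. The helper name `convolve` is an assumption for illustration.

```python
def convolve(a, b):
    """Discrete convolution of two coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(convolve([1, 2], [3, 4]))
```

The output length `len(a) + len(b) - 1` mirrors property 7: the result starts at the sum of the start points and ends at the sum of the end points.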
--- a/Proc/Image/Tracking.md
+++ b/Proc/Image/Tracking.md
@ -0,0 +1,42 @@
 # Challenges
 - Clutter
 	- Distractors
 - Occlusion
 	- Hidden by other objects
 - Object appearance can evolve
 	- Rotation, scale, camera viewpoint
 - Take new template every n frames
 - Take new template when confidence falls below threshold
 # Background Tracking
 - Static camera
 	- Capture clean shots of background
 - Object present
 	- Average enough footage
 - Background image - current video frame = difference image
 	- Threshold for binary mask
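A minimal sketch of the difference-and-threshold step, assuming greyscale frames stored as nested lists. Taking the absolute difference and the name `difference_mask` are assumptions, not from the notes.

```python
def difference_mask(background, frame, threshold):
    """Binary mask: 1 where the frame differs from the background by more than threshold."""
    return [
        [1 if abs(b - f) > threshold else 0 for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


background = [[10, 10], [10, 10]]
frame = [[10, 200], [12, 10]]  # one bright foreground pixel, one small noise change
print(difference_mask(background, frame, threshold=20))
```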
 # Nearest Neighbour Tracking
 - Decide component with closest centroid using previous centroid
 - Not good for occlusion
 	- Will snap to next candidate
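A sketch of the nearest-neighbour choice, assuming each connected component has already been reduced to an `(x, y)` centroid. The function name and sample points are hypothetical.

```python
import math


def nearest_centroid(previous, candidates):
    """Pick the candidate centroid closest to the previous centroid."""
    return min(candidates, key=lambda c: math.dist(previous, c))


previous = (10.0, 10.0)
candidates = [(50.0, 50.0), (12.0, 9.0), (0.0, 0.0)]
print(nearest_centroid(previous, candidates))
```

If the true object is occluded, its centroid is simply missing from `candidates`, so the function returns the next closest one, which is the snapping behaviour noted above.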
 # Blob Tracking
 - Build colour model of object
 	- Eigenmodel
 - Mask of pixels that match object
 - Use centroid as location over time
 - Pick connected component with centroid closest to previous location
 - Good for distinctive colours
 	- Not for practical situations though
 # Template Tracking
 - Sample distinctive patch from image
 - Search all positions in video for patch
 - Use cross-correlation
 - Illumination changes
 	- Brightness is uniform shift of greyscale values up or down
 	- Correlated to the mean pixel value
 	- Subtract means in template and frame to give invariance
 	- Normalised cross-correlation
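A rough sketch of zero-mean normalised cross-correlation, simplified to a 1-D row of greyscale values to keep it short (a real tracker would slide a 2-D patch). The function names and sample values are assumptions.

```python
import math


def ncc(patch, template):
    """Zero-mean normalised cross-correlation of two equal-length pixel lists."""
    mean_p = sum(patch) / len(patch)
    mean_t = sum(template) / len(template)
    dp = [p - mean_p for p in patch]      # subtracting the means removes brightness shifts
    dt = [t - mean_t for t in template]
    num = sum(a * b for a, b in zip(dp, dt))
    den = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dt))
    return num / den if den else 0.0      # flat patches carry no pattern


def best_match(row, template):
    """Slide the template along the row and return the offset with the highest score."""
    n = len(template)
    scores = [ncc(row[i:i + n], template) for i in range(len(row) - n + 1)]
    return max(range(len(scores)), key=scores.__getitem__)


template = [10, 50, 10]
row = [60, 60, 60, 100, 60, 60]  # same pattern, uniformly brightened by 50
print(best_match(row, template))
```

The match is found despite the brightness shift because both means are subtracted before correlating.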
--- a/Proc/Pole-Zero.md
+++ b/Proc/Pole-Zero.md
@ -0,0 +1,62 @@
 - Poles
 	- **X**
 	- Let $X(z) = \infty$
 		- Let $1/X(z) = 0$
 	- Roots of denominator
 - Zeros
 	- **O**
 	- Let $X(z) = 0$
 	- Roots of numerator
 - In complex (Z for speech) domain
 [Magnitude Response From Pole/Zeros](https://www.youtube.com/watch?v=8jNjVkoZQCU)
 [MIT Pole Zero](https://web.mit.edu/2.14/www/Handouts/PoleZero.pdf)
 Representation of rational transfer function, identifies
 - Stability
 - Causal/Anti-causal system
 - ROC
 - Minimum phase/Non minimum phase
 ![](../img/pole-zero-attenuation.png)
 ![](../img/pole-zero-stable.png)
 ![](../img/pole-zero-feedback.png)
 # BIBO Stable
 - All poles of H must lie strictly inside the unit circle (for a causal system)
 - If the input is bounded by some constant
 - The output will also be bounded by some constant
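A quick sketch of the pole test for a causal discrete-time system with a second-order denominator. The two example denominators and the helper names are assumptions chosen for illustration.

```python
import cmath


def quadratic_roots(a, b, c):
    """Roots of a*z^2 + b*z + c, assuming a != 0."""
    d = cmath.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)


def is_bibo_stable(poles):
    """Causal discrete-time system: stable if every pole has magnitude below 1."""
    return all(abs(p) < 1 for p in poles)


# H(z) = 1 / (z^2 - z + 0.5): complex pair at 0.5 +/- 0.5j
stable_poles = quadratic_roots(1, -1, 0.5)
# H(z) = 1 / (z^2 - 2.5z + 1): real poles at 2 and 0.5
unstable_poles = quadratic_roots(1, -2.5, 1)

print(is_bibo_stable(stable_poles), is_bibo_stable(unstable_poles))
```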
 # Region of Convergence
 - Depends on whether causal or anti-causal
 - Cannot contain poles
 	- Goes to infinity
 ## Continuous
 1. If includes imaginary axis
 	- BIBO stable
 	- All poles must be left of i axis
 2. Rightwards from pole with largest real-part (not infinity)
 	- Causal
 3. Leftward from pole with smallest real-part (not -infinity)
 	- Anti-causal
 ## Discrete
 1. If includes unit circle
 	- BIBO stable
 2. Outward from pole with largest (not infinite) magnitude
 	- Right-sided impulse response
 	- Causal (if no pole at infinity)
 3. Inward from pole with smallest (nonzero) magnitude
 	- Anti-causal
 ![](../img/roc-right-left.png)
 ![](../img/roc-two-sided.png)
 Sinusoidal when complex pair
 - $e^{-j\omega}$
 - Euler's for oscillating
 Exponential when on the axis
 - Decays, no $i$ in the exponent
 ![](../img/transfer-stable-unstable.png)
--- a/Proc/Transfer
+++ b/Proc/Transfer
@ -0,0 +1,25 @@
 $$Y(s)=H(s)\cdot X(s)$$
 - $H(s)=\frac{Y(s)}{X(s)}=\frac{\mathcal L\{y(t)\}}{\mathcal L\{x(t)\}}$
 $$Y(z)=H(z)\cdot X(z)$$
 - $H(z)=\frac{Y(z)}{X(z)}=\frac{\mathcal Z\{y[n]\}}{\mathcal Z\{x[n]\}}$
 $$G(\omega)=\frac{|Y|}{|X|}=|H(j\omega)|$$
 - $H(j\omega)$, Frequency response
 $$\phi(\omega)=\arg(Y)-\arg(X)=\arg\left(H\left(j\omega\right)\right)$$
 - $\phi(\omega)$, Phase shift
 $$\tau_\phi(\omega)=-\frac{\phi(\omega)}{\omega}$$
 - $\tau_\phi$, Phase delay
 - Frequency-dependent amount of delay introduced to the sinusoid by $H$
 $$\tau_g(\omega)=-\frac{d\phi(\omega)}{d\omega}$$
 - $\tau_g$, Group delay
 - Frequency-dependent amount of delay introduced to the envelope of the sinusoid by $H$
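The gain, phase, phase delay and group delay definitions above can be checked numerically. This sketch assumes a first-order low-pass $H(s)=1/(1+s\tau)$ with $\tau=1$, which is not from the notes, and approximates the derivative with a central difference.

```python
import cmath

TAU = 1.0  # assumed time constant of the example filter


def H(w):
    """Frequency response H(jw) of the assumed first-order low-pass."""
    return 1 / (1 + 1j * w * TAU)


def gain(w):
    return abs(H(w))


def phase(w):
    return cmath.phase(H(w))


def phase_delay(w):
    return -phase(w) / w


def group_delay(w, dw=1e-6):
    """Negative derivative of phase, estimated by a central difference."""
    return -(phase(w + dw) - phase(w - dw)) / (2 * dw)


w = 1.0
print(gain(w), phase(w), phase_delay(w), group_delay(w))
```

For this filter the closed forms are $\phi=-\arctan(\omega\tau)$ and $\tau_g=\tau/(1+\omega^2\tau^2)$, so at $\omega=1$ the phase delay is $\pi/4$ and the group delay is $0.5$.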
 [Partial Fractions](https://lpsa.swarthmore.edu/BackGround/PartialFraction/PartialFraction.html#Order_of_numerator_polynomial_is_not_less_than_that_of_the_denominator)
 [Partial Fractions for Laplace](https://lpsa.swarthmore.edu/LaplaceXform/InvLaplace/InvLaplaceXformPFE.html)
 [Inverse Z Transform](https://lpsa.swarthmore.edu/ZXform/InvZXform/InvZXform.html)
 [Discrete Time Systems:Impulse responses and convolution; An introduction to the Z-transform](https://homes.esat.kuleuven.be/~maapc/static/files/SYSTHEORY/Slides/Lecture5/Lecture5-Impulse%20responses%20and%20convolution%20layout.pdf)
--- a/Speech/Linguistics/Consonants.md
+++ b/Speech/Linguistics/Consonants.md
@ -0,0 +1,61 @@
 - Complete or partial closure of vocal tract
 - Voiced/Unvoiced
 # Nasal
 - Mouth closed and velum lowered
 - Vowel-like structure, weaker energy
 - Sound through nose
 - English has only voiced nasals
 - Can be identified by direction of formant movement
 - m, n, ng
 - Locations
 	- Labial
 		- Lips
 		- m
 	- Alveolar
 		- Alveolar ridge
 		- n
 	- Velar
 		- Soft palate
 		- ng
 # Plosive
 - Closure which is released
 - Voiced or unvoiced
 	- Based on when voicing starts
 	- t, d
 - Trill
 - Tap or flap
 - Locations
 	- Labial
 		- Lips
 		- p, b
 	- Alveolar
 		- Alveolar ridge
 		- t, d
 	- Velar
 		- Soft palate
 		- k, g
 # Fricative
 - Air forced through constriction
 - Jet of turbulence
 - Noisy
 - Voiced
 	- v, z
 - Unvoiced
 	- f, s
 - Frequency cut-off
 	- Inversely proportional to length of cavity in front of constriction
 - Sibilants
 	- Air forced over teeth
 	- s, z
 - Affricate
 	- Abrupt start to frication
 	- Ch, J
 # Approximant
 - Articulators move close, some perturbations
 - Vowel-like structure, weaker energy
 - Upward trajectory of formants
 - r, y, l, w
--- a/Speech/Linguistics/Linguistics.md
+++ b/Speech/Linguistics/Linguistics.md
@ -0,0 +1,32 @@
 - Phonetics
 	- Sound of language
 	- Acoustic result of speech articulation
 - Phonology
 	- How languages or dialects organise sounds
 	- Within and across languages
 	- Function of sound units in language
 		- How phonemes are used
 - Morphology
 	- Word structure
 - Syntax
 	- Sentence structure
 - Semantics
 	- Meaning of words or sentences
 - Pragmatics
 	- How context contributes to meaning
 	- Speech act theory
 - Discourse analysis
 	- How sentences form text
 # Phonetics vs Phonology
 - [Phoneme](Terms.md#Phoneme)
 	- Unit of sound structure
 		- With linguistic content
 	- Abstraction of set of sounds
 		- Allophones
 	- Phonology
 - Phone
 	- Segment of speech recording
 	- Single sound
 		- Single phoneme
 	- Phonetics
--- a/Speech/Linguistics/README.md
+++ b/Speech/Linguistics/README.md
@ -0,0 +1 @@
 Linguistics.md
--- a/Speech/Linguistics/Terms.md
+++ b/Speech/Linguistics/Terms.md
@ -0,0 +1,31 @@
 # Phoneme
 - Smallest unit of speech
 	- Distinguish words
 - Continuous speech is stream of different phonemes
 - English has 44
 - Consonants
 	- Voiced or unvoiced
 - Vowels
 	- All voiced
 # Voiced
 - Quasi-periodic vibration of vocal cords
 - Sound transmitted along vocal tract unimpeded
 - Air forced through glottis
 	- Quasi-periodic oscillation
 # Unvoiced
 - Air flow through vocal apparatus is either cut-off or impeded
 	- Constriction using tongue or lips
 - Turbulence or noise
 	- Fricatives
 		- /s/, /f/
 	- Plosives
 		- /p/, /t/
 - Modelled as random noise
 # Formant
 - Vocal tract can be modelled as resonant system
 - Modes or peaks in spectral response of resonant system
 - Lowest 3 most important in speech
 ![](../../img/formant.png)
--- a/Speech/Linguistics/Vowels.md
+++ b/Speech/Linguistics/Vowels.md
--- a/Speech/NLP/Jargon.md
+++ b/Speech/NLP/Jargon.md
@ -0,0 +1,26 @@
 - Types
 	- Distinct words
 	- |V|
 - Tokens
 	- All words
 	- N
 	- Related quantities
 		- Have equations that estimate
 - Disfluencies
 	- Fragments
 		- Broken/half spoken words
 	- Fillers
 		- Oo
 		- Uh
 		- May strip
 		- May be helpful
 			- Speaker may start clause again
 - Clitic
 	- Part of a word that can’t stand on its own
 	- What’re
 	- Contractions
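A tiny sketch of the types versus tokens distinction from the top of this list, assuming whitespace tokenisation and lower-casing (both simplifications). The function name and sample sentence are hypothetical.

```python
def count_types_tokens(text):
    """Return (|V|, N): number of distinct words and total number of words."""
    tokens = text.lower().split()
    return len(set(tokens)), len(tokens)


V, N = count_types_tokens("the cat sat on the mat and the dog sat too")
print(V, N)
```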
 # Edit Distance
 - Distance between words
 - Number of insertions, deletions and substitutions needed to turn one word into the other
 - Similarity
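A minimal sketch of edit distance by dynamic programming (the Levenshtein form). The unit substitution cost is an assumption; some texts charge 2 for a substitution, which the `sub_cost` parameter allows.

```python
def edit_distance(a, b, sub_cost=1):
    """Minimum cost of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distance from empty prefix of a to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]                   # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                                # deletion
                cur[j - 1] + 1,                             # insertion
                prev[j - 1] + (0 if ca == cb else sub_cost),  # match or substitution
            ))
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```

A smaller distance means the two words are more similar.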
--- a/Speech/NLP/NLP.md
+++ b/Speech/NLP/NLP.md
@ -0,0 +1,29 @@
 # Text Normalisation
 - Tokenisation
 	- Labelling parts of sentence
 	- Usually words
 	- Can be multiple
 		- Proper nouns
 		- New York
 		- Emoticons
 		- Hashtags
 	- May need some named entity recognition
 	- Penn Treebank standard
 	- Byte-pair encoding
 		- Standard can’t understand unseen words
 		- Encode as subwords
 			- -est, -er
 - Lemmatisation
 	- Determining roots of words
 	- Verb infinitives
 	- Find lemma
 		- Derived forms are inflections or inflected
 			- Word-forms
 	- Critical for morphologically complex languages
 		- Arabic
 - Stemming
 	- Simpler than lemmatisation
 	- Just removing suffixes
 - Normalising word formats
 - Segmenting sentences
--- a/Speech/NLP/README.md
+++ b/Speech/NLP/README.md
@ -0,0 +1 @@
 NLP.md
--- a/Speech/NLP/Recognition.md
+++ b/Speech/NLP/Recognition.md
@ -0,0 +1,58 @@
 1. Automatic Speech Recognition
 	- Spoken words to machine-readable form
 2. Natural language understanding
 	- High level cognitive interpretation
 		- Structure
 		- Meaning
 		- Intention
 # Automatic Speech Recognition
 ## Applications
 - Business/desktop apps
 	- Dictation
 	- Voice commands
 - Voice enabled services/apps
 	- Siri
 - Home automation
 - Game & Entertainment
 - Education
 - Speech therapy/Rehab
 - Hearing assistance
 	- Live CC
 ## Challenges
 - Speaker dependency
 	- Accent
 	- Emotion
 - Vocab size
 	- Slang
 - Isolated words vs Continuous speech
 	- Hard to segment continuous speech
 - Language constraints & Knowledge sources
 	- Training source is critical
 - Acoustic ambiguity
 	- Similar sounding speech
 - Noise robustness
 	- Background noise
 	- Reverberation
 # Speech Diarisation
 - Who speaks when?
 - Split stream into homogeneous segments for identity
 - Structure stream into speaker turns
 - Provide speaker identity
 - Combination of
 	- Speaker segmentation
 		- Speaker changes in stream
 	- Speaker clustering
 		- Grouping segments together on basis of characteristics
 - Gaussian mixture model
 	- HMM
 - Bottom-up
 	- More popular
 	- Succession of clusters
 	- Merge redundant clusters
 		- Remaining belong to speakers
 - Top-down
 	- Single cluster
 	- Iteratively split until speaker clusters
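A toy sketch of the bottom-up approach: start with one cluster per segment and keep merging the closest pair until the remaining clusters are far apart. Representing each segment by a single number and stopping on a distance threshold are assumptions for illustration; the notes mention Gaussian mixture models and HMMs for the real thing.

```python
def bottom_up(segments, threshold):
    """Agglomerative clustering of 1-D segment features (hypothetical speaker features)."""
    clusters = [[s] for s in segments]  # one cluster per segment to begin with

    def mean(cluster):
        return sum(cluster) / len(cluster)

    while len(clusters) > 1:
        # Find the pair of clusters whose means are closest.
        dist, i, j = min(
            (abs(mean(a) - mean(b)), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        if dist > threshold:
            break                       # remaining clusters are treated as distinct speakers
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


print(bottom_up([1.0, 1.2, 5.0, 0.9, 5.3], threshold=1.0))
```

The top-down variant would run the other way: begin with every segment in one cluster and split until each cluster looks like a single speaker.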
--- a/Speech/Perception/Perception.md
+++ b/Speech/Perception/Perception.md
@ -0,0 +1,8 @@
 # Physiological
 - Physical/mechanical processing of sound
 - Ear stuff
 # Psychological
 - Brain function and processing
 ***Psychoacoustics incorporates both***
--- a/Speech/Perception/README.md
+++ b/Speech/Perception/README.md
@ -0,0 +1 @@
 Perception.md
--- a/Processing/Applications.md
+++ b/Processing/Applications.md
@ -0,0 +1,25 @@
 - Speech telecommunications & Encoding
 	- Preserving perceptibility and quality over the wire
 	- Minimising bandwidth
 - Speech enhancement
 	- Restoration of degraded speech
 		- Additive noise
 		- Reverberation
 		- Echoes
 		- Background sounds
 	- Blind source separation
 	- Adaptive filtering
 	- Spectral subtraction
 - Speech & Speaker recognition
 	- Auto conversion of speech to written
 	- Identifying speaker based on speech
 	- Dictation, speaker recognition for security
 - Speaker diarisation
 	- "Who speaks when"
 	- Speaker segmentation
 		- Speaker change point in stream
 	- Speaker clustering
 		- Grouping segments based on speaker identity
 - Speech synthesis
 - Speech analysis
 	- Waveform & spectrum
--- a/Processing/README.md
+++ b/Processing/README.md
@ -0,0 +1 @@
 Applications.md
--- a/Processing/Source-Filter.md
+++ b/Processing/Source-Filter.md
--- a/Processing/Vocal
+++ b/Processing/Vocal
@ -0,0 +1,16 @@
 - Input and output signals are real
 	- Filter coefficients are real for rational $H(z)$
 	- Poles/zeros either real or complex conjugate pairs
 - BIBO stability important
 # Frequency Response
 - Sample along unit circle
 	- $|z|=\left|e^{i\omega}\right|=1$
 # Magnitude Response
 $$\left|H(e^{i\omega})\right|=\frac{|b_0|\,|e^{i\omega}-\beta_1|\cdots|e^{i\omega}-\beta_q|}{|e^{i\omega}-\alpha_1|\cdots|e^{i\omega}-\alpha_p|}$$
 ![](../../img/spectrum-vocal-tract.png)
 - LPC & Cepstral Analysis to separate
 - Residual allows more accurate estimation of pitch period
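The magnitude response formula above is a product of distances from the point $e^{i\omega}$ on the unit circle to each zero, divided by the distances to each pole. A sketch, with an assumed conjugate pole pair at radius 0.9 and angle $\pi/4$ standing in for a single formant:

```python
import cmath
import math


def magnitude(omega, zeros, poles, b0=1.0):
    """|H(e^{i*omega})| from zero and pole locations."""
    z = cmath.exp(1j * omega)      # sample point on the unit circle
    num = abs(b0)
    for beta in zeros:
        num *= abs(z - beta)       # distance to each zero
    den = 1.0
    for alpha in poles:
        den *= abs(z - alpha)      # distance to each pole
    return num / den


# Assumed example: conjugate poles near the unit circle give a resonance peak.
pole = 0.9 * cmath.exp(1j * math.pi / 4)
poles = [pole, pole.conjugate()]

for omega in (0.0, math.pi / 4, math.pi / 2):
    print(f"omega={omega:.3f}  |H|={magnitude(omega, [], poles):.3f}")
```

The response peaks near $\omega=\pi/4$, where the sample point passes closest to the pole, which is how poles of the vocal tract filter show up as formants in the spectrum.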
--- a/img/english-phoneme-table.png
+++ b/img/english-phoneme-table.png
--- a/img/formant.png
+++ b/img/formant.png
--- a/img/pole-zero-attenuation.png
+++ b/img/pole-zero-attenuation.png
--- a/img/pole-zero-feedback.png
+++ b/img/pole-zero-feedback.png
--- a/img/pole-zero-stable.png
+++ b/img/pole-zero-stable.png
--- a/img/roc-right-left.png
+++ b/img/roc-right-left.png
--- a/img/roc-two-sided.png
+++ b/img/roc-two-sided.png
--- a/img/spectrum-vocal-tract.png
+++ b/img/spectrum-vocal-tract.png
--- a/img/transfer-stable-unstable.png
+++ b/img/transfer-stable-unstable.png
--- a/img/vowel-chart.png
+++ b/img/vowel-chart.png
--- a/img/vowel-spaces.png
+++ b/img/vowel-spaces.png