vault backup: 2023-06-06 17:01:49

Affected files: STEM/AI/Kalman Filter.md STEM/Signal Proc/Convolution.md STEM/Signal Proc/Image/Tracking.md STEM/Signal Proc/Pole-Zero.md STEM/Signal Proc/Transfer Function.md STEM/Speech/Linguistics/Consonants.md STEM/Speech/Linguistics/Linguistics.md STEM/Speech/Linguistics/README.md STEM/Speech/Linguistics/Terms.md STEM/Speech/Linguistics/Vowels.md STEM/Speech/NLP/Jargon.md STEM/Speech/NLP/NLP.md STEM/Speech/NLP/README.md STEM/Speech/NLP/Recognition.md STEM/Speech/Perception/Perception.md STEM/Speech/Perception/README.md STEM/Speech/Speech Processing/Applications.md STEM/Speech/Speech Processing/README.md STEM/Speech/Speech Processing/Source-Filter.md STEM/Speech/Speech Processing/Vocal Tract.md STEM/img/english-phoneme-table.png STEM/img/formant.png STEM/img/pole-zero-attenuation.png STEM/img/pole-zero-feedback.png STEM/img/pole-zero-stable.png STEM/img/roc-right-left.png STEM/img/roc-two-sided.png STEM/img/spectrum-vocal-tract.png STEM/img/transfer-stable-unstable.png STEM/img/vowel-chart.png STEM/img/vowel-spaces.png
2023-06-06 17:01:49 +01:00 · 2023-06-06 17:01:49 +01:00 · 5a94c5ff1a
commit 5a94c5ff1a
parent 7bc4dffd8b
31 changed files with 523 additions and 2 deletions
--- a/Filter.md
+++ b/Filter.md
@ -0,0 +1,9 @@
+- Measure
+- Predict
+- Update
+
+- Positions and confidences modelled as gaussians
+- Mean
+	- Position
+- Variance
+	- Confidence
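The predict/update cycle above can be sketched in one dimension, where the belief about position is a single Gaussian (mean = position, variance = confidence). This is a minimal sketch rather than the note's own implementation: the function names, the constant-motion model and all numeric values are assumptions chosen for illustration.

```python
def predict(mean, var, motion, motion_var):
    """Predict: shift the belief by the expected motion; uncertainty grows."""
    return mean + motion, var + motion_var


def update(mean, var, measurement, measurement_var):
    """Update: fuse belief and measurement (product of two Gaussians)."""
    gain = var / (var + measurement_var)  # Kalman gain, between 0 and 1
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var  # never larger than the prior variance
    return new_mean, new_var


# Hypothetical data: object moving about +1 per step, noisy position readings.
mean, var = 0.0, 1000.0  # large initial variance = low confidence
for z in [1.1, 2.0, 2.9]:  # measure
    mean, var = predict(mean, var, motion=1.0, motion_var=0.5)
    mean, var = update(mean, var, z, measurement_var=1.0)
```

Each update shrinks the variance, while each prediction inflates it again by the assumed motion noise.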
--- a/Proc/Convolution.md
+++ b/Proc/Convolution.md
@ -14,13 +14,22 @@ $$x(t)=x_1(t)\circledast x_2(t)=\int_{-\infty}^\infty x_1(t-\tau)\cdot x_2(\tau)
 4. $Ax_1(t)\circledast Bx_2(t)=AB[x_1(t)\circledast x_2(t)]$
 	- Associativity with Scalar
 5. Symmetrical graph about origin
+6. $y(t)=x_1(t-a)\circledast x_2(t-b)$
+	- $x(t)=x_1(t)\circledast x_2(t)$
+	- $y(t)=x(t-a-b)$
+7. $x(t)=x_1(t)\circledast x_2(t)$
+	- $x_1$ non-zero only between $a_1$ and $b_1$
+	- $x_2$ non-zero only between $a_2$ and $b_2$
+	- $x(t)$ starts at $t=a_1+a_2$
+	- $x(t)$ ends at $t=b_1+b_2$
+8. $\overline{x \circledast y}=\bar x \circledast \bar y$
+9. $(x \circledast y)'=x'\circledast y=x\circledast y'$

 # Applications
-
 1. Communications systems
 	- Shift signal in frequency domain (Frequency modulation)
 2. System analysis
-	- Find system output given input and transfer function
+	- Find system output given input and [transfer function](Transfer%20Function.md)

 # Polynomial Multiplication
 -   Convolving coefficients of two poly gives coefficients of product
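As a small illustration of the polynomial-multiplication view, the sketch below convolves two coefficient lists directly. The function name `convolve` and the example polynomials are assumptions for illustration; coefficients are stored lowest power first.

```python
def convolve(a, b):
    """Discrete convolution of two coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj  # x^i * x^j contributes to x^(i+j)
    return out


# (1 + 2x) * (3 + x + x^2) = 3 + 7x + 3x^2 + 2x^3
product = convolve([1, 2], [3, 1, 1])
```

The output has `len(a) + len(b) - 1` entries, which is the discrete counterpart of property 7: the result starts at the sum of the start indices and ends at the sum of the end indices.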
--- a/Proc/Image/Tracking.md
+++ b/Proc/Image/Tracking.md
@ -0,0 +1,42 @@
+# Challenges
+- Clutter
+	- Distractors
+- Occlusion
+	- Hidden by other objects
+
+- Object's appearance can evolve
+	- Rotation, scale, camera viewpoint
+- Take new template every n frames
+- Take new template when confidence falls below threshold
+
+# Background Tracking
+- Static camera
+	- Capture clean shots of background
+- Object present
+	- Average enough footage
+- Background image - current video frame = difference image
+	- Threshold for binary mask
+
+# Nearest Neighbour Tracking
+- Pick the component whose centroid is closest to the previous centroid
+- Not good for occlusion
+	- Will snap to next candidate
+
+# Blob Tracking
+- Build colour model of object
+	- Eigenmodel
+- Mask of pixels that match object
+- Use centroid as location over time
+- Pick connected component with centroid closest to previous location
+- Good for distinctive colours
+	- Not for practical situations though
+
+# Template Tracking
+- Sample distinctive patch from image
+- Search all positions in video for patch
+- Use cross-correlation
+- Illumination changes
+	- Brightness is uniform shift of greyscale values up or down
+	- Correlated to the mean pixel value
+	- Subtract means in template and frame to give invariance
+	- Normalised cross-correlation
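The mean subtraction and normalisation described above can be sketched as follows. To keep it short this is a simplification under stated assumptions: patches are flat lists of greyscale values and the search slides over a 1-D signal rather than a 2-D frame; `ncc` and `best_match` are hypothetical names.

```python
import math


def ncc(template, patch):
    """Zero-mean normalised cross-correlation of two equal-length patches."""
    n = len(template)
    mean_t = sum(template) / n
    mean_p = sum(patch) / n
    dt = [t - mean_t for t in template]  # subtracting the mean removes
    dp = [p - mean_p for p in patch]     # a uniform brightness shift
    num = sum(a * b for a, b in zip(dt, dp))
    den = math.sqrt(sum(a * a for a in dt) * sum(b * b for b in dp))
    return num / den if den else 0.0  # flat patch: no structure to match


def best_match(template, signal):
    """Slide the template over the signal; return (offset, score)."""
    w = len(template)
    scores = [ncc(template, signal[i:i + w])
              for i in range(len(signal) - w + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical data: the template appears at offset 2, brightened by 10.
offset, score = best_match([1, 5, 2], [10, 10, 11, 15, 12, 10])
```

Because both means are subtracted, the brightened copy of the template still scores as a near-perfect match.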
--- a/Proc/Pole-Zero.md
+++ b/Proc/Pole-Zero.md
@ -0,0 +1,62 @@
+- Poles
+	- **X**
+	- Let $X(z) = \infty$
+		- Let $1/X(z) = 0$
+	- Roots of denominator
+- Zeros
+	- **O**
+	- Let $X(z) = 0$
+	- Roots of numerator
+- In complex (Z for speech) domain
+
+[Magnitude Response From Pole/Zeros](https://www.youtube.com/watch?v=8jNjVkoZQCU)
+[MIT Pole Zero](https://web.mit.edu/2.14/www/Handouts/PoleZero.pdf)
+
+Representation of rational transfer function, identifies
+- Stability
+- Causal/Anti-causal system
+- ROC
+- Minimum phase/Non minimum phase
+
+![](../img/pole-zero-attenuation.png)
+![](../img/pole-zero-stable.png)
+![](../img/pole-zero-feedback.png)
+
+# BIBO Stable
+- For a causal discrete-time system, all poles of $H$ must lie inside the unit circle
+- A bounded input (magnitude below some constant)
+- Gives a bounded output (magnitude below some constant)
+
+# Region of Convergence
+- Depends on whether causal or anti-causal
+- Cannot contain poles
+	- Goes to infinity
+
+## Continuous
+1. If includes imaginary axis
+	- BIBO stable
+	- For a causal system, all poles must lie left of the imaginary axis
+2. Rightwards from pole with largest real-part (not infinity)
+	- Causal
+3. Leftward from pole with smallest real-part (not -infinity)
+	- Anti-causal
+
+## Discrete
+1. If includes unit circle
+	- BIBO stable
+2. Outward from pole with largest (not infinite) magnitude
+	- Right-sided impulse response
+	- Causal (if no pole at infinity)
+3. Inward from pole with smallest (nonzero) magnitude
+	- Anti-causal
+
+![](../img/roc-right-left.png)
+![](../img/roc-two-sided.png)
+
+Sinusoidal when complex pair
+- $e^{-j\omega}$
+- Euler's formula gives the oscillation
+Exponential when on the real axis
+- Decays, no $j$ in the exponent
+
+![](../img/transfer-stable-unstable.png)
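The link between pole position and time-domain behaviour can be sketched for the discrete case. This assumes a causal system whose poles are already known; the helper names and pole values are hypothetical. A pole $p$ contributes a mode $p^n$ to the impulse response, so a real pole gives a pure exponential and a conjugate pair gives an exponentially weighted cosine.

```python
import cmath


def is_bibo_stable(poles):
    """Causal discrete-time system: every pole strictly inside the unit circle."""
    return all(abs(p) < 1.0 for p in poles)


def real_mode(pole, n_samples):
    """Mode pole**n from a single real pole: exponential, no oscillation."""
    return [pole ** n for n in range(n_samples)]


def pair_mode(radius, angle, n_samples):
    """Mode from the pair radius*e^{+/-j*angle}: 2 * r^n * cos(n * angle)."""
    p = cmath.rect(radius, angle)
    return [(p ** n + p.conjugate() ** n).real for n in range(n_samples)]


decaying = real_mode(0.5, 4)                    # shrinks every sample
oscillating = pair_mode(0.9, cmath.pi / 2, 4)   # changes sign as it decays
```

A radius above 1 in either helper gives a growing mode, which is the unstable case.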
--- a/Proc/Transfer
+++ b/Proc/Transfer
@ -0,0 +1,25 @@
+$$Y(s)=H(s)\cdot X(s)$$
+- $H(s)=\frac{Y(s)}{X(s)}=\frac{\mathcal L\{y(t)\}}{\mathcal L\{x(t)\}}$
+
+$$Y(z)=H(z)\cdot X(z)$$
+- $H(z)=\frac{Y(z)}{X(z)}=\frac{\mathcal Z\{y[n]\}}{\mathcal Z\{x[n]\}}$
+
+$$G(\omega)=\frac{|Y|}{|X|}=|H(j\omega)|$$
+- $H(j\omega)$, Frequency response
+
+$$\phi(\omega)=\arg(Y)-\arg(X)=\arg\left(H\left(j\omega\right)\right)$$
+- $\phi(\omega)$, Phase shift
+
+$$\tau_\phi(\omega)=-\frac{\phi(\omega)}{\omega}$$
+- $\tau_\phi$, Phase delay
+- Frequency-dependent amount of delay introduced to the sinusoid by $H$
+
+$$\tau_g(\omega)=-\frac{d\phi(\omega)}{d\omega}$$
+- $\tau_g$, Group delay
+- Frequency-dependent amount of delay introduced to the envelope of the sinusoid by $H$
+
+[Partial Fractions](https://lpsa.swarthmore.edu/BackGround/PartialFraction/PartialFraction.html#Order_of_numerator_polynomial_is_not_less_than_that_of_the_denominator)
+[Partial Fractions for Laplace](https://lpsa.swarthmore.edu/LaplaceXform/InvLaplace/InvLaplaceXformPFE.html)
+[Inverse Z Transform](https://lpsa.swarthmore.edu/ZXform/InvZXform/InvZXform.html)
+
+[Discrete Time Systems:Impulse responses and convolution; An introduction to the Z-transform](https://homes.esat.kuleuven.be/~maapc/static/files/SYSTHEORY/Slides/Lecture5/Lecture5-Impulse%20responses%20and%20convolution%20layout.pdf)
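The gain, phase, phase delay and group delay definitions above can be evaluated numerically. The sketch assumes a hypothetical first-order low-pass $H(s)=1/(s+1)$, which is not taken from the note, and estimates the derivative in the group delay with a central difference.

```python
import cmath


def H(s):
    """Hypothetical first-order low-pass, H(s) = 1 / (s + 1)."""
    return 1 / (s + 1)


def gain(omega):
    """G(omega) = |H(j*omega)|."""
    return abs(H(1j * omega))


def phase(omega):
    """phi(omega) = arg(H(j*omega)), in radians."""
    return cmath.phase(H(1j * omega))


def phase_delay(omega):
    """tau_phi(omega) = -phi(omega) / omega."""
    return -phase(omega) / omega


def group_delay(omega, d=1e-6):
    """tau_g(omega) = -d(phi)/d(omega), by central difference."""
    return -(phase(omega + d) - phase(omega - d)) / (2 * d)
```

For this filter the phase is $-\arctan\omega$, so the closed forms $\tau_\phi=\arctan(\omega)/\omega$ and $\tau_g=1/(1+\omega^2)$ are available to check the numerical values against. The finite difference assumes the phase does not wrap between the two sample points.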
--- a/Speech/Linguistics/Consonants.md
+++ b/Speech/Linguistics/Consonants.md
@ -0,0 +1,61 @@
+- Complete or partial closure of vocal tract
+- Voiced/Unvoiced
+
+# Nasal
+- Mouth closed and velum lowered
+- Vowel-like structure, weaker energy
+- Sound through nose
+- English has only voiced nasals
+- Can be identified by direction of formant movement
+- m, n, ng
+- Locations
+	- Labial
+		- Lips
+		- m
+	- Alveolar
+		- Alveolar ridge
+		- n
+	- Velar
+		- Soft palate
+		- ng
+
+# Plosive
+- Closure which is released
+- Voiced or unvoiced
+	- Based on when voicing starts
+	- t, d
+- Trill
+- Tap or flap
+- Locations
+	- Labial
+		- Lips
+		- p, b
+	- Alveolar
+		- Alveolar ridge
+		- t, d
+	- Velar
+		- Soft palate
+		- k, g
+
+# Fricative
+- Air forced through constriction
+- Jet of turbulence
+- Noisy
+- Voiced
+	- v, z
+- Unvoiced
+	- f, s
+- Frequency cut-off
+	- Inversely proportional to length of cavity in front of constriction
+- Sibilants
+	- Air forced over teeth
+	- s, z
+- Affricate
+	- Abrupt start to frication
+	- Ch, J
+
+# Approximant
+- Articulators move close, some perturbations
+- Vowel-like structure, weaker energy
+- Upward trajectory of formants
+- r, y, l, w
--- a/Speech/Linguistics/Linguistics.md
+++ b/Speech/Linguistics/Linguistics.md
@ -0,0 +1,32 @@
+- Phonetics
+	- Sound of language
+	- Acoustic result of speech articulation
+- Phonology
+	- How languages or dialects organise sounds
+	- Within and across languages
+	- Function of sound units in language
+		- How phonemes are used
+- Morphology
+	- Word structure
+- Syntax
+	- Sentence structure
+- Semantics
+	- Meaning of words or sentences
+- Pragmatics
+	- How context contributes to meaning
+	- Speech act theory
+- Discourse analysis
+	- How sentences form text
+
+# Phonetics vs Phonology
+- [Phoneme](Terms.md#Phoneme)
+	- Unit of sound structure
+		- With linguistic content
+	- Abstraction of set of sounds
+		- Allophones
+	- Phonology
+- Phone
+	- Segment of speech recording
+	- Single sound
+		- Single phoneme
+	- Phonetics
--- a/Speech/Linguistics/README.md
+++ b/Speech/Linguistics/README.md
@ -0,0 +1 @@
+Linguistics.md
--- a/Speech/Linguistics/Terms.md
+++ b/Speech/Linguistics/Terms.md
@ -0,0 +1,31 @@
+# Phoneme
+- Smallest unit of speech
+	- Distinguish words
+- Continuous speech is stream of different phonemes
+- English has 44
+- Consonants
+	- Voiced or unvoiced
+- Vowels
+	- All voiced
+
+# Voiced
+- Quasi-periodic vibration of vocal cords
+- Sound transmitted along vocal tract unimpeded
+- Air forced through glottis
+	- Quasi-periodic oscillation
+
+# Unvoiced
+- Air flow through vocal apparatus is either cut-off or impeded
+	- Constriction using tongue or lips
+- Turbulence or noise
+	- Fricatives
+		- /s/, /f/
+	- Plosives
+		- /p/, /t/
+- Modelled as random noise
+
+# Formant
+- Vocal tract can be modelled as resonant system
+- Modes or peaks in spectral response of resonant system
+- Lowest 3 most important in speech
+![](../../img/formant.png)
--- a/Speech/Linguistics/Vowels.md
+++ b/Speech/Linguistics/Vowels.md
--- a/Speech/NLP/Jargon.md
+++ b/Speech/NLP/Jargon.md
@ -0,0 +1,26 @@
+- Types
+	- Distinct words
+	- |V|
+- Tokens
+	- All words
+	- N
+	- Related quantities
+		- Have equations that estimate
+- Disfluencies
+	- Fragments
+		- Broken/half spoken words
+	- Fillers
+		- Oo
+		- Uh
+		- May strip
+		- May be helpful
+			- Speaker may start clause again
+- Clitic
+	- Part of a word that can’t stand on its own
+	- What’re
+	- Contractions
+
+# Edit Distance
+- Distance between words
+- Number of insertions, deletions (and substitutions) needed to turn one word into the other
+- Similarity
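One common way to compute an edit distance is the Levenshtein dynamic programme, sketched below. Counting a substitution as a single edit is an assumption here, since the note only lists insertions and deletions; other costings (for example substitution = 2) are also used.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


distance = edit_distance("kitten", "sitting")
```

Only two rows of the table are kept at a time, so memory grows with the length of the second word rather than with the product of both lengths.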
--- a/Speech/NLP/NLP.md
+++ b/Speech/NLP/NLP.md
@ -0,0 +1,29 @@
+
+# Text Normalisation
+- Tokenisation
+	- Labelling parts of sentence
+	- Usually words
+	- Can be multiple
+		- Proper nouns
+		- New York
+		- Emoticons
+		- Hashtags
+	- May need some named entity recognition
+	- Penn Treebank standard
+	- Byte-pair encoding
+		- Standard (word-level) tokenisation can’t handle unseen words
+		- Encode as subwords
+			- -est, -er
+- Lemmatisation
+	- Determining roots of words
+	- Verb infinitives
+	- Find lemma
+		- Derived forms are inflections or inflected
+			- Word-forms
+	- Critical for morphologically complex languages
+		- Arabic
+- Stemming
+	- Simpler than lemmatisation
+	- Just removing suffixes
+- Normalising word formats
+- Segmenting sentences
--- a/Speech/NLP/README.md
+++ b/Speech/NLP/README.md
@ -0,0 +1 @@
+NLP.md
--- a/Speech/NLP/Recognition.md
+++ b/Speech/NLP/Recognition.md
@ -0,0 +1,58 @@
+1. Automatic Speech Recognition
+	- Spoken words to machine-readable form
+2. Natural language understanding
+	- High level cognitive interpretation
+		- Structure
+		- Meaning
+		- Intention
+
+# Automatic Speech Recognition
+## Applications
+- Business/desktop apps
+	- Dictation
+	- Voice commands
+- Voice enabled services/apps
+	- Siri
+- Home automation
+- Game & Entertainment
+- Education
+- Speech therapy/Rehab
+- Hearing assistance
+	- Live CC
+
+## Challenges
+- Speaker dependency
+	- Accent
+	- Emotion
+- Vocab size
+	- Slang
+- Isolated words vs Continuous speech
+	- Hard to segment continuous speech
+- Language constraints & Knowledge sources
+	- Training source is critical
+- Acoustic ambiguity
+	- Similar sounding speech
+- Noise robustness
+	- Background noise
+	- Reverberation
+
+# Speech Diarisation
+- Who speaks when?
+- Split stream into homogeneous segments for identity
+- Structure stream into speaker turns
+- Provide speaker identity
+- Combination of
+	- Speaker segmentation
+		- Speaker changes in stream
+	- Speaker clustering
+		- Grouping segments together on basis of characteristics
+- Gaussian mixture model
+	- HMM
+- Bottom-up
+	- More popular
+	- Succession of clusters
+	- Merge redundant clusters
+		- Remaining belong to speakers
+- Top-down
+	- Single cluster
+	- Iteratively split until speaker clusters
--- a/Speech/Perception/Perception.md
+++ b/Speech/Perception/Perception.md
@ -0,0 +1,8 @@
+# Physiological
+- Physical/mechanical processing of sound
+- Ear stuff
+
+# Psychological
+- Brain function and processing
+
+***Psychoacoustics incorporates both***
--- a/Speech/Perception/README.md
+++ b/Speech/Perception/README.md
@ -0,0 +1 @@
+Perception.md
--- a/Processing/Applications.md
+++ b/Processing/Applications.md
@ -0,0 +1,25 @@
+- Speech telecommunications & Encoding
+	- Preserving perceptibility and quality over the wire
+	- Minimising bandwidth
+- Speech enhancement
+	- Restoration of degraded speech
+		- Additive noise
+		- Reverberation
+		- Echoes
+		- Background sounds
+	- Blind source separation
+	- Adaptive filtering
+	- Spectral subtraction
+- Speech & Speaker recognition
+	- Automatic conversion of speech to written text
+	- Identifying speaker based on speech
+	- Dictation, speaker recognition for security
+- Speaker diarisation
+	- "Who speaks when"
+	- Speaker segmentation
+		- Speaker change point in stream
+	- Speaker clustering
+		- Grouping segments based on speaker identity
+- Speech synthesis
+- Speech analysis
+	- Waveform & spectrum
--- a/Processing/README.md
+++ b/Processing/README.md
@ -0,0 +1 @@
+Applications.md
--- a/Processing/Source-Filter.md
+++ b/Processing/Source-Filter.md
--- a/Processing/Vocal
+++ b/Processing/Vocal
@ -0,0 +1,16 @@
+- Input and output signals are real
+	- Filter coefficients are real for rational $H(z)$
+	- Poles/zeros either real or complex conjugate pairs
+- BIBO stability important
+
+# Frequency Response
+
+- Sample along unit circle
+	- $|z|=\left|e^{i\omega}\right|=1$
+
+# Magnitude Response
+$$\left|H(e^{i\omega})\right|=\frac{|b_0|\,|e^{i\omega}-\beta_1|\cdots|e^{i\omega}-\beta_q|}{|e^{i\omega}-\alpha_1|\cdots|e^{i\omega}-\alpha_p|}$$
+
+![](../../img/spectrum-vocal-tract.png)
+- LPC & Cepstral Analysis to separate
+- Residual allows more accurate estimation of pitch period
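The magnitude response formula above is a product of distances from the point $e^{i\omega}$ on the unit circle to each zero, divided by the distances to each pole. A minimal sketch follows; the function name and the example pole pair are assumptions, not values from the note.

```python
import cmath


def magnitude_response(b0, zeros, poles, omega):
    """|H(e^{i*omega})| from the gain b0, zeros (beta) and poles (alpha)."""
    z = cmath.exp(1j * omega)  # sample point on the unit circle
    num = abs(b0)
    for beta in zeros:
        num *= abs(z - beta)   # distance from z to each zero
    den = 1.0
    for alpha in poles:
        den *= abs(z - alpha)  # distance from z to each pole
    return num / den


# Hypothetical resonance: conjugate pole pair at radius 0.9, angle pi/4.
pair = [cmath.rect(0.9, cmath.pi / 4), cmath.rect(0.9, -cmath.pi / 4)]
peak = magnitude_response(1.0, [], pair, cmath.pi / 4)
off_peak = magnitude_response(1.0, [], pair, cmath.pi / 2)
```

The response is largest where the unit circle passes closest to a pole, which is how a pole pair near the circle produces a formant-like spectral peak.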
--- a/img/english-phoneme-table.png
+++ b/img/english-phoneme-table.png
--- a/img/formant.png
+++ b/img/formant.png
--- a/img/pole-zero-attenuation.png
+++ b/img/pole-zero-attenuation.png
--- a/img/pole-zero-feedback.png
+++ b/img/pole-zero-feedback.png
--- a/img/pole-zero-stable.png
+++ b/img/pole-zero-stable.png
--- a/img/roc-right-left.png
+++ b/img/roc-right-left.png
--- a/img/roc-two-sided.png
+++ b/img/roc-two-sided.png
--- a/img/spectrum-vocal-tract.png
+++ b/img/spectrum-vocal-tract.png
--- a/img/transfer-stable-unstable.png
+++ b/img/transfer-stable-unstable.png
--- a/img/vowel-chart.png
+++ b/img/vowel-chart.png
--- a/img/vowel-spaces.png
+++ b/img/vowel-spaces.png