vault backup: 2023-06-07 09:02:27

Affected files:
STEM/AI/Classification/Classification.md
STEM/AI/Classification/Decision Trees.md
STEM/AI/Classification/Gradient Boosting Machine.md
STEM/AI/Classification/Logistic Regression.md
STEM/AI/Classification/Random Forest.md
STEM/AI/Classification/Supervised.md
STEM/AI/Classification/Supervised/README.md
STEM/AI/Classification/Supervised/SVM.md
STEM/AI/Classification/Supervised/Supervised.md
STEM/AI/Learning.md
STEM/AI/Neural Networks/Learning/Boltzmann.md
STEM/AI/Neural Networks/Learning/Competitive Learning.md
STEM/AI/Neural Networks/Learning/Credit-Assignment Problem.md
STEM/AI/Neural Networks/Learning/Hebbian.md
STEM/AI/Neural Networks/Learning/Learning.md
STEM/AI/Neural Networks/Learning/README.md
STEM/AI/Neural Networks/RNN/Autoencoder.md
STEM/AI/Neural Networks/RNN/Deep Image Prior.md
STEM/AI/Neural Networks/RNN/MoCo.md
STEM/AI/Neural Networks/RNN/Representation Learning.md
STEM/AI/Neural Networks/RNN/SimCLR.md
STEM/img/comp-learning.png
STEM/img/competitive-geometric.png
STEM/img/confusion-matrix.png
STEM/img/decision-tree.png
STEM/img/deep-image-prior-arch.png
STEM/img/deep-image-prior-results.png
STEM/img/hebb-learning.png
STEM/img/moco.png
STEM/img/receiver-operator-curve.png
STEM/img/reinforcement-learning.png
STEM/img/rnn+autoencoder-variational.png
STEM/img/rnn+autoencoder.png
STEM/img/simclr.png
STEM/img/sup-representation-learning.png
STEM/img/svm-c.png
STEM/img/svm-non-linear-project.png
STEM/img/svm-non-linear-separated.png
STEM/img/svm-non-linear.png
STEM/img/svm-optimal-plane.png
STEM/img/svm.png
STEM/img/unsup-representation-learning.png
This commit is contained in:
andy 2023-06-07 09:02:27 +01:00
parent 5a94c5ff1a
commit 1513f2b378
42 changed files with 392 additions and 6 deletions


@ -9,7 +9,7 @@
Argument that gives the maximum value from a target function
# Gaussian Classifier
- [Training](Supervised.md)
+ [Training](Supervised/Supervised.md)
- Each class $i$ has its own Gaussian $N_i=N(m_i,v_i)$
$$\hat i=\text{argmax}_i\left(p(o_t|N_i)\cdot P(N_i)\right)$$


@ -0,0 +1,4 @@
- Flowchart-like design
- Iterative decision making
![](../../img/decision-tree.png)
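A minimal sketch of the idea, assuming scikit-learn is available (not part of this note):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3)  # shallow tree: readable flowchart
tree.fit(X, y)
print(tree.predict(X[:5]))  # each prediction walks yes/no decisions down the tree
```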


@ -0,0 +1,7 @@
- Higher-level technique
- Iteratively train more models, each addressing the weak points of the previous ones
- Pairs well with decision trees
- Tends to outperform random forests
	- Similar properties
- One of the best algorithms for non-perceptual data
- XGBoost
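A minimal boosting sketch using scikit-learn's `GradientBoostingClassifier` (assumed available; XGBoost implements the same idea):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)
# Each new tree is fitted to correct the errors of the ensemble so far
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gbm.fit(X, y)
print(gbm.score(X, y))
```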


@ -0,0 +1,16 @@
“hello world”
- Related to naïve Bayes
- Statistical model
- Uses ***logistic function*** to model a ***categorical*** dependent variable
# Types
- Binary
	- 2 classes
- Multinomial
	- Multiple classes without ordering
	- e.g. categories
- Ordinal
	- Multiple ordered classes
	- e.g. star rating
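A minimal multinomial sketch, assuming scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 3 classes: the multinomial case
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X[:1]))  # class probabilities from the logistic/softmax function
```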


@ -0,0 +1 @@
“Almost always the second best algorithm for any shallow ML task”


@ -1,5 +0,0 @@
# Gaussian Classifier
- With $T$ labelled data
$$q_t(i)=$$


@ -0,0 +1 @@
Supervised.md


@ -0,0 +1,74 @@
[Towards Data Science: SVM](https://towardsdatascience.com/support-vector-machines-svm-c9ef22815589)
[Towards Data Science: SVM an overview](https://towardsdatascience.com/https-medium-com-pupalerushikesh-svm-f4b42800e989)
- Dividing line between two classes
- Optimal hyperplane for a space
- Margin maximising hyperplane
- Can be used for
	- Classification
		- SVC
	- Regression
		- SVR
- Alternative to Eigenmodels for supervised classification
- Best for smaller datasets
	- Hard to scale to larger sets
![](../../../img/svm.png)
- Support vector points
	- Closest points to the hyperplane
	- The vectors from these points to the hyperplane are the support vectors
- Maximise margin between classes
- Take the dot product of a test point with the vector perpendicular to the hyperplane
	- Sign determines class
# Pros
- Linear or non-linear discrimination
- Effective in higher dimensions
- Effective when the number of features exceeds the number of training examples
- Best when classes are separable
- Outliers have less impact
# Cons
- Long training time on larger datasets
- Doesn't do well when classes overlap
- Selecting an appropriate kernel can be difficult
# Parameters
- C
	- Controls how smooth the decision boundary is
	- Larger C makes the boundary more curvy
	- ![](../../../img/svm-c.png)
- Gamma
	- Controls the area of influence of data points
	- High gamma reduces the influence of faraway points
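A minimal sketch of setting both parameters, assuming scikit-learn (`make_moons` is just a toy dataset):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(noise=0.2, random_state=0)
# C trades margin width against misclassification;
# gamma sets each point's area of influence (RBF kernel)
clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)
print(clf.score(X, y))
```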
# Hyperplane
$$\beta_0+\beta_1X_1+\beta_2X_2+\cdots+\beta_pX_p=0$$
- $p$-dimensional space
- If $X$ satisfies the equation
	- On the plane
- Maximal margin hyperplane
	- Perpendicular distance from each observation to a given plane
	- Best plane maximises the smallest such distance
- If the support vector points shift
	- Plane shifts
	- Hyperplane only depends on the support vectors
	- Rest don't matter
![](../../../img/svm-optimal-plane.png)
# Linearly Separable
- Not linearly separable
![](../../../img/svm-non-linear.png)
- Add another dimension
	- $z=x^2+y^2$
	- The square of the distance of the point from the origin
![](../../../img/svm-non-linear-project.png)
- Now separable
	- Let $z=k$
		- $k$ is a constant
	- Project the linear separator back to 2D
		- Get a circle
![](../../../img/svm-non-linear-separated.png)
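A sketch of the same projection, assuming scikit-learn; the explicit $z$ feature and the implicit RBF kernel both separate the circles:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(noise=0.05, factor=0.5, random_state=0)

# Explicit projection: append z = x^2 + y^2, then a plane z = k separates the classes
Z = np.c_[X, (X ** 2).sum(axis=1)]
print(SVC(kernel="linear").fit(Z, y).score(Z, y))

# The RBF kernel performs an equivalent projection implicitly
print(SVC(kernel="rbf").fit(X, y).score(X, y))
```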


@ -0,0 +1,23 @@
# Gaussian Classifier
- With $T$ labelled data
$$q_t(i)=
\begin{cases}
1 & \text{if class } i \\
0 & \text{otherwise}
\end{cases}$$
- Indicator function
- Mean parameter
$$\hat m_i=\frac{\sum_tq_t(i)o_t}{\sum_tq_t(i)}$$
- Variance parameter
$$\hat v_i=\frac{\sum_tq_t(i)(o_t-\hat m_i)^2}{\sum_tq_t(i)}$$
- Distribution weight
- Class prior
- $P(N_i)$
$$\hat c_i=\frac 1 T \sum_tq_t(i)$$
$$\hat \mu_i=\frac{\sum_{t=1}^Tq_t(i)o_t}{\sum_{t=1}^Tq_t(i)}$$
$$\hat \Sigma_i=\frac{\sum_{t=1}^Tq_t(i)(o_t-\hat\mu_i)(o_t-\hat\mu_i)^T}{\sum_{t=1}^Tq_t(i)}$$
- For $K$-dimensional data
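A minimal numpy sketch of these estimators for 1-D observations (function names are illustrative):

```python
import numpy as np

def fit_gaussian_classifier(o, labels, n_classes):
    """Estimate per-class mean, variance and prior from labelled 1-D data,
    using the indicator function q_t(i) as above."""
    m, v, c = [], [], []
    for i in range(n_classes):
        q = (labels == i)      # q_t(i): indicator function
        m.append(o[q].mean())  # mean parameter m_i
        v.append(o[q].var())   # variance parameter v_i
        c.append(q.mean())     # distribution weight c_i = class prior P(N_i)
    return np.array(m), np.array(v), np.array(c)

def predict(o_t, m, v, c):
    # argmax_i p(o_t | N_i) * P(N_i), with Gaussian likelihoods
    lik = np.exp(-(o_t - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    return int(np.argmax(lik * c))
```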


@ -0,0 +1,63 @@
# Supervised
- Dataset with inputs manually annotated for desired output
- Desired output = supervisory signal
- Manually annotated = ground truth
- Annotated correct categories
## Split data
- Training set
- Test set
***Don't test on training data***
## Top-K Accuracy
- Whether correct answer appears in the top-k results
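A minimal numpy sketch of the metric (function name is illustrative):

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """scores: (N, C) per-class scores; labels: (N,) ground-truth indices."""
    topk = np.argsort(scores, axis=1)[:, -k:]     # k highest-scoring classes
    hits = (topk == labels[:, None]).any(axis=1)  # true class among them?
    return hits.mean()
```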
## Confusion Matrix
Samples described by ***feature vector***
Dataset forms a matrix
![](../img/confusion-matrix.png)
# Unsupervised
- No example outputs given, learns how to categorise
- No teacher or critic
## Harder
- Must identify relevant distinguishing features
- Must decide on number of categories
# Reinforcement Learning
- No teacher - critic instead
- Continued interaction with the environment
- Minimise a scalar performance index
![](../img/reinforcement-learning.png)
- Critic
	- Converts primary reinforcement to heuristic reinforcement
	- Both scalar inputs
- Delayed reinforcement
	- System observes a temporal sequence of stimuli
	- Results in generation of a heuristic reinforcement signal
	- Minimise cost-to-go function
		- Expectation of the cumulative cost of actions taken over a sequence of steps
		- Instead of just the immediate cost
	- Earlier actions may have been good
		- Identify these and feed back to the environment
- Closely related to dynamic programming
## Difficulties
- No teacher to provide desired response
- Must solve temporal credit assignment problem
- Need to know which actions were the good ones
# Fitting
- Over-fitting
	- Classifier too specific to the training set
	- Can't adequately generalise
- Under-fitting
	- Too general; hasn't inferred enough detail
	- Learns non-discriminative or non-desired patterns
# ROC
Receiver Operating Characteristic curve
![](../img/receiver-operator-curve.png)


@ -0,0 +1,30 @@
- Stochastic
- Recurrent structure
- Binary neuron states ($\pm 1$)
- Energy function
$$E=-\frac 1 2 \sum_j\sum_k w_{kj}x_kx_j$$
- $j\neq k$
	- No self-feedback
- $x$ = neuron state
- Neurons randomly flip from $x_k$ to $-x_k$
$$P(x_k \rightarrow-x_k)=\frac 1 {1+e^{\frac{-\Delta E_k}{T}}}$$
- $\Delta E_k$ = energy change resulting from the flip
- Flip probability depends on the energy change and the pseudo-temperature $T$
- System will reach thermal equilibrium
- Visible and hidden neurons
	- Visible neurons act as the interface between the network and the environment
	- Hidden neurons always operate freely
# Operation Modes
- Clamped
	- Visible neurons are clamped onto specific states determined by the environment
- Free-running
	- All neurons able to operate freely
- $\rho_{kj}^+$ = correlation between states while clamped
- $\rho_{kj}^-$ = correlation between states while free-running
- Both lie between $-1$ and $+1$
$$\Delta w_{kj}=\eta(\rho_{kj}^+-\rho_{kj}^-), \space j\neq k$$
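A small numpy sketch of the flip rule, assuming symmetric weights with zero diagonal and following the note's convention for $\Delta E_k$:

```python
import numpy as np

def energy(x, w):
    # E = -1/2 * sum_j sum_k w_kj x_k x_j  (zero diagonal: no self-feedback)
    return -0.5 * x @ w @ x

def flip_probability(x, w, k, T):
    """Probability of flipping neuron k for states x in {-1, +1},
    with Delta E_k the energy change resulting from the flip."""
    x_flipped = x.copy()
    x_flipped[k] = -x_flipped[k]
    delta_E = energy(x_flipped, w) - energy(x, w)
    return 1.0 / (1.0 + np.exp(-delta_E / T))
```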


@ -0,0 +1,40 @@
- Only single output neuron fires
1. Set of homogeneous neurons with randomly distributed synaptic weights
	- Respond differently to a given set of input patterns
2. Limit imposed on the strength of each neuron
3. Mechanism allowing neurons to compete for the right to respond to a given subset of inputs
	- Only one output neuron active at a time
	- Or only one neuron per group
	- ***Winner-takes-all neuron***
![](../../../img/comp-learning.png)
- Lateral inhibition
	- Neurons inhibit other neurons
- Winning neuron must have the highest induced local field for the given input pattern
- Winning neuron's output is squashed to 1
	- Others are clamped to 0
$$y_k=
\begin{cases}
1 & \text{if } v_k > v_j \text{ for all } j,j\neq k \\
0 & \text{otherwise}
\end{cases}
$$
- Each neuron has a fixed amount of weight spread amongst its input synapses
	- Sums to 1
- Learn by shifting weight from inactive to active input nodes
	- Each input node relinquishes some proportion of its weight
	- Distributed amongst the active nodes
$$\Delta w_{kj}=
\begin{cases}
\eta(x_j-w_{kj}) & \text{if neuron $k$ wins the competition}\\
0 & \text{if neuron $k$ loses the competition}
\end{cases}$$
- Individual neurons learn to specialise on ensembles of similar patterns
- Feature detectors
![](../../../img/competitive-geometric.png)
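A minimal numpy sketch of one winner-takes-all update (names illustrative):

```python
import numpy as np

def competitive_step(W, x, eta=0.1):
    """W: (n_neurons, n_inputs), each row's weights summing to 1; x: input pattern.
    Only the winner moves its weights towards x; if x also sums to 1,
    the row's total weight stays fixed at 1."""
    v = W @ x                 # induced local fields
    k = np.argmax(v)          # winning neuron (output squashed to 1)
    W[k] += eta * (x - W[k])  # shift weight towards the active inputs
    return k
```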


@ -0,0 +1,17 @@
- Assigning credit/blame for outcomes to each internal decision
- Loading Problem
- Loading a training set into the free parameters
- Important to any learning machine attempting to improve performance in situations involving temporally extended behaviour
Two sub-problems:
- ***Temporal*** credit-assignment problem
	- Assigning credit for **outcomes** to **actions**
	- Concerns the times at which the actions that deserve credit were taken
	- Relevant when many actions are taken and we want to know which were responsible
- ***Structural*** credit-assignment problem
	- Assigning credit for **actions** to **internal decisions**
	- Concerns the internal structure of the actions generated by the system
	- Relevant for identifying which component should have its behaviour altered
		- And by how much
- Important in MLPs when there are many hidden neurons


@ -0,0 +1,55 @@
*Time-dependent, highly local, strongly interactive*
- Oldest learning algorithm
- Increases synaptic efficiency as a function of the correlation between presynaptic and postsynaptic activities
1. If two neurons on either side of a synapse are activated simultaneously/synchronously, then the strength of that synapse is selectively increased
2. If two neurons on either side of a synapse are activated asynchronously, then that synapse is selectively weakened or eliminated
- Hebbian synapse
	- Time-dependent
		- Depends on the times of the pre/post-synaptic signals
	- Local
	- Interactive
		- Depends on both sides of the synapse
		- True interaction between pre/post-synaptic signals
		- Cannot make a prediction from either one by itself
	- Conjunctional or correlational
		- Based on the conjunction of pre/post-synaptic signals
		- Conjunctional synapse
- Modification classifications
	- Hebbian
		- **Increases** strength with **positively** correlated pre/post-synaptic signals
		- **Decreases** strength with **negatively** correlated pre/post-synaptic signals
	- Anti-Hebbian
		- **Decreases** strength with **positively** correlated pre/post-synaptic signals
		- **Increases** strength with **negatively** correlated pre/post-synaptic signals
		- Still Hebbian in nature, though not in function
	- Non-Hebbian
		- Doesn't involve the above correlations or time dependence
# Mathematically
$$\Delta w_{kj}(n)=F\left(y_k(n),x_j(n)\right)$$
- General form
	- All Hebbian rules are special cases
![](../../../img/hebb-learning.png)
## Hebb's Hypothesis
$$\Delta w_{kj}(n)=\eta y_k(n)x_j(n)$$
- Activity product rule
- Exponential growth until saturation
- No information stored
- Selectivity lost
## Covariance Hypothesis
$$\Delta w_{kj}(n)=\eta(x_j-\bar x)(y_k-\bar y)$$
- Characterised by the perturbation of the pre/post-synaptic signals from their means over a given time interval
- The averages $\bar x$ and $\bar y$ constitute thresholds
	- Intercept at $y_k=\bar y$
- Similar to learning in the hippocampus
*Allows:*
1. Convergence to a non-trivial state
	- When $x_j=\bar x$ or $y_k=\bar y$
2. Prediction of both synaptic potentiation and synaptic depression
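A minimal numpy sketch of both rules (names illustrative):

```python
import numpy as np

def hebb_update(w, x, y, eta=0.01):
    # Activity product rule: grows without bound until saturation
    return w + eta * y * x

def covariance_update(w, x, y, x_bar, y_bar, eta=0.01):
    # Covariance rule: strengthens when both signals deviate from their
    # means in the same direction, weakens when they deviate oppositely
    return w + eta * (x - x_bar) * (y - y_bar)
```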


@ -0,0 +1,5 @@
*Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place*
1. The neural network is **stimulated** by an environment
2. The network undergoes **changes in its free parameters** as a result of this stimulation
3. The network **responds in a new way** to the environment as a result of the change in internal structure


@ -0,0 +1 @@
Learning.md


@ -0,0 +1,10 @@
- Sequence of strokes for sketching
- LSTM backbone
![](../../../img/rnn+autoencoder.png)
# Variational
- Learn the mean and covariance of a latent distribution
	- Samples from it drive the decoder stage
- Generate different outputs by sampling the latent space
![](../../../img/rnn+autoencoder-variational.png)
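A minimal sketch of the sampling step (the reparameterisation trick), assuming PyTorch:

```python
import torch

def sample_latent(mu, log_var):
    """Sample z ~ N(mu, sigma^2) differentiably; different draws of eps
    give different decoded outputs."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps
```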


@ -0,0 +1,8 @@
- Overfitted to a single image
	- Learns the weights necessary to reconstruct the image from white noise
	- Trained from scratch on that single image
- The architecture encodes a prior for natural images
- Can de-noise images
![](../../../img/deep-image-prior-arch.png)
![](../../../img/deep-image-prior-results.png)
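A minimal training-loop sketch, assuming PyTorch; `net` is a hypothetical untrained encoder-decoder and `z` a fixed white-noise input:

```python
import torch

def dip_denoise(net, z, noisy, steps=2000, lr=0.01):
    """Fit an untrained network to reproduce the noisy image from noise."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(z) - noisy) ** 2).mean()  # reconstruct the single image
        loss.backward()
        opt.step()
    # stop early: the architecture fits natural image structure before noise
    return net(z).detach()
```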


@ -0,0 +1,13 @@
- Similar to SimCLR
- Rich set of negatives
	- Sampled from previous batches, held in a queue
- Two encoder functions: one for the anchor (query), one for positives/negatives (keys)
	- Key encoder weights are a delayed copy of the anchor encoder's weights
	- Updated with momentum
- Two delay mechanisms
	- Momentum-updated key encoder
	- Queue of negative keys
![](../../../img/moco.png)
$$\theta_k\leftarrow m\theta_k+(1-m)\theta_q$$
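A minimal sketch of the momentum update above, assuming PyTorch modules `encoder_q` (anchor/query) and `encoder_k` (keys):

```python
import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    # theta_k <- m * theta_k + (1 - m) * theta_q
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.mul_(m).add_(p_q, alpha=1 - m)
```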


@ -0,0 +1,13 @@
# Unsupervised
- Auto-encoder FCN
	- Learns a bottleneck (latent) representation
	- Information-rich
- $f(\cdot)$ is the CNN encoding function
![](../../../img/unsup-representation-learning.png)
# Supervised
- Triplet loss
	- Providing positive and negative examples requires supervision
	- Two loss terms
![](../../../img/sup-representation-learning.png)
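A minimal triplet-loss sketch, assuming PyTorch (which also ships `nn.TripletMarginLoss`):

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull anchor towards the positive, push it away from the negative."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```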


@ -0,0 +1,10 @@
1. Data augmentation
	- Crop patches from images in the batch
	- Add colour jitter
2. Within the batch, sample positives and negatives
	- Patches from the same image are positive
	- All others are negative
3. Compute the loss on an MLP projection head instead of the bottleneck embedding directly
	- Head network applied as a function of the bottleneck
![](../../../img/simclr.png)
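A minimal sketch of the batch loss (NT-Xent style), assuming PyTorch; `z1`/`z2` are projection-head outputs of two augmented views of the same images:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """z1, z2: (N, d) projections of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)  # (2N, d) unit vectors
    sim = z @ z.t() / tau                        # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))            # exclude self-similarity
    n = z1.shape[0]
    # the positive for row i is the other view of the same image
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)
```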

Binary image files added under img/ (21 files; listed in the affected files above).