diff --git a/AI/Neural Networks/MLP.md b/AI/Neural Networks/MLP.md
new file mode 100644
index 0000000..d077d1c
--- /dev/null
+++ b/AI/Neural Networks/MLP.md
@@ -0,0 +1,15 @@
+- Feed-forward
+- Single hidden layer can learn any function
+    - Universal approximation theorem
+- Each hidden layer can operate as a different feature extraction layer
+- Lots of weights to learn
+- Backpropagation is supervised
+
+![[mlp-arch.png]]
+
+# Universal Approximation Theorem
+A finite feed-forward MLP with one hidden layer can in theory approximate any continuous function to arbitrary accuracy
+- In practice, not necessarily trainable with BP
+
+![[activation-function.png]]
+![[mlp-arch-diagram.png]]
\ No newline at end of file
diff --git a/AI/Neural Networks/MLP/Back-Propagation.md b/AI/Neural Networks/MLP/Back-Propagation.md
new file mode 100644
index 0000000..8016138
--- /dev/null
+++ b/AI/Neural Networks/MLP/Back-Propagation.md
@@ -0,0 +1,9 @@
+Error signal graph
+
+![[mlp-arch-graph.png]]
+
+1. Error Signal
+2. Net Internal Sum
+3. Output
+4. Instantaneous Sum of Squared Errors
+5. Average Squared Error
\ No newline at end of file
diff --git a/AI/Neural Networks/SLP.md b/AI/Neural Networks/SLP.md
new file mode 100644
index 0000000..ce9c89a
--- /dev/null
+++ b/AI/Neural Networks/SLP.md
@@ -0,0 +1,7 @@
+![[slp-arch.png]]
+$$v(n)=\sum_{i=0}^{m}w_i(n)x_i(n)$$
+$$=w^T(n)x(n)$$
+![[slp-hyperplane.png]]
+Perceptron learning is performed for a finite number of iterations and then stops
+
+LMS is continuous learning that doesn't stop
\ No newline at end of file
diff --git a/AI/Neural Networks/SLP/Least Mean Square.md b/AI/Neural Networks/SLP/Least Mean Square.md
new file mode 100644
index 0000000..5e7de49
--- /dev/null
+++ b/AI/Neural Networks/SLP/Least Mean Square.md
@@ -0,0 +1,81 @@
+- To handle overlapping classes
+- Linearity condition remains
+    - Linear boundary
+- No hard limiter
+    - Linear neuron
+- Cost function changed to the error, $J$
+    - The half doesn't matter for the error
+        - Disappears when differentiating
+
+$$\mathfrak{E}(w)=\frac{1}{2}e^2(n)$$
+- Derivative of the cost w.r.t. the weights
+$$\frac{\partial\mathfrak{E}(w)}{\partial w}=e(n)\frac{\partial e(n)}{\partial w}$$
+- Calculate the error, define the delta
+$$e(n)=d(n)-x^T(n)\cdot w(n)$$
+$$\frac{\partial e(n)}{\partial w(n)}=-x(n)$$
+$$\frac{\partial \mathfrak{E}(w)}{\partial w(n)}=-x(n)\cdot e(n)$$
+- Gradient vector
+    - $g=\nabla\mathfrak{E}(w)$
+    - Estimate via:
+$$\hat{g}(n)=-x(n)\cdot e(n)$$
+$$\hat{w}(n+1)=\hat{w}(n)+\eta \cdot x(n) \cdot e(n)$$
+
+- The above is a feedback loop around the weight vector, $\hat{w}$
+    - Behaves like a low-pass filter
+        - Passes low-frequency components of the error signal
+    - Average time constant of the filtering action is inversely proportional to the learning rate
+        - A small value progresses the algorithm slowly
+            - Remembers more
+        - The inverse of the learning rate is a measure of the memory of the LMS algorithm
+- $\hat{w}$ because it's an estimate of the weight vector that would result from steepest descent
+    - Steepest descent follows a well-defined trajectory through weight space for a given learning rate
+    - LMS traces a random trajectory
+        - Stochastic gradient algorithm (see the sketch below)
+    - Requires no knowledge of environmental statistics
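+
+A minimal sketch of this update loop, not from the source: it assumes a NumPy array `X` whose rows are the input vectors $x(n)$ (bias component included) and a vector `d` of desired responses; the names `lms`, `eta` and `epochs` are illustrative only, and the fixed number of passes is a simplification of LMS's continuous learning.
+
+```python
+import numpy as np
+
+def lms(X, d, eta=0.01, epochs=10):
+    """Sketch of the LMS update: w(n+1) = w(n) + eta * x(n) * e(n)."""
+    w = np.zeros(X.shape[1])        # weight estimate w-hat, started at zero
+    for _ in range(epochs):         # fixed number of passes (simplification)
+        for x, target in zip(X, d):
+            e = target - w @ x      # error e(n) = d(n) - w^T(n) x(n)
+            w = w + eta * e * x     # stochastic gradient step
+    return w
+```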
+
+## Analysis
+
+- Convergence behaviour is dependent on the statistics of the input vector and the learning rate
+    - Put another way: for a given dataset, the learning rate is critical
+- Convergence of the mean
+    - $E[\hat{w}(n)]\rightarrow w_0 \text{ as } n\rightarrow \infty$
+    - Converges to the Wiener solution
+    - Not a helpful criterion on its own
+- Convergence in the mean square
+    - $E[e^2(n)]\rightarrow \text{constant as } n\rightarrow\infty$
+- Convergence in the mean square implies convergence in the mean
+    - The converse is not necessarily true
+
+## Advantages
+- Simple
+- Model independent
+    - Robust
+- Optimal in accordance with the $H^\infty$ (minimax) criterion
+    - _If you do not know what you are up against, plan for the worst and optimise_
+- ___Was___ considered an instantaneous approximation of gradient descent
+
+## Disadvantages
+- Slow rate of convergence
+- Sensitivity to variation in the eigenstructure of the input
+- Typically requires a number of iterations around 10 times the dimensionality of the input space
+    - Worse with high-dimensional input spaces
+![[slp-mse.png]]
+- Use steepest descent
+- Partial derivatives
+![[slp-steepest-descent.png]]
+- Can be solved by matrix inversion
+- Stochastic
+    - Random progress
+    - Will improve overall
+
+![[lms-algorithm.png]]
+
+$$\hat{w}(n+1)=\hat{w}(n)+\eta\cdot x(n)\cdot[d(n)-x^T(n)\cdot\hat w(n)]$$
+$$=[I-\eta\cdot x(n)x^T(n)]\cdot\hat{w}(n)+\eta\cdot x(n)\cdot d(n)$$
+
+where
+$$\hat w(n)=z^{-1}[\hat w(n+1)]$$
+## Independence Theory
+![[slp-lms-independence.png]]
+
+![[sl-lms-summary.png]]
\ No newline at end of file
diff --git a/AI/Neural Networks/SLP/Perceptron Convergence.md b/AI/Neural Networks/SLP/Perceptron Convergence.md
new file mode 100644
index 0000000..9fe78be
--- /dev/null
+++ b/AI/Neural Networks/SLP/Perceptron Convergence.md
@@ -0,0 +1,42 @@
+Error-Correcting Perceptron Learning
+
+- Uses a McCulloch-Pitts neuron
+    - One with a hard limiter
+- Unity increment
+    - Learning rate of 1
+
+If the $n$-th member of the training set, $x(n)$, is correctly classified by the weight vector $w(n)$ computed at the $n$-th iteration of the algorithm, no correction is made to the weight vector of the perceptron, in accordance with the rule:
+$$w(n + 1) = w(n) \text{ if $w^Tx(n) > 0$ and $x(n)$ belongs to class $\mathfrak{c}_1$}$$
+$$w(n + 1) = w(n) \text{ if $w^Tx(n) \leq 0$ and $x(n)$ belongs to class $\mathfrak{c}_2$}$$
+Otherwise, the weight vector of the perceptron is updated in accordance with the rule:
+$$w(n + 1) = w(n) - \eta(n)x(n) \text{ if } w^Tx(n) > 0 \text{ and } x(n) \text{ belongs to class }\mathfrak{c}_2$$
+$$w(n + 1) = w(n) + \eta(n)x(n) \text{ if } w^Tx(n) \leq 0 \text{ and } x(n) \text{ belongs to class }\mathfrak{c}_1$$
+
+1. _Initialisation_. Set $w(0)=0$. Perform the following computations for time steps $n = 1, 2, \ldots$
+2. _Activation_. At time step $n$, activate the perceptron by applying the continuous-valued input vector $x(n)$ and desired response $d(n)$.
+3. _Computation of Actual Response_. Compute the actual response of the perceptron:
+$$y(n) = \mathrm{sgn}[w^T(n)x(n)]$$
+where $\mathrm{sgn}(\cdot)$ is the signum function.
+4. _Adaptation of Weight Vector_. Update the weight vector of the perceptron:
+$$w(n+1)=w(n)+\eta[d(n)-y(n)]x(n)$$
+where
+$$
+d(n) = \begin{cases}
++1 &\text{if $x(n)$ belongs to class $\mathfrak{c}_1$}\\
+-1 &\text{if $x(n)$ belongs to class $\mathfrak{c}_2$}
+\end{cases}
+$$
+5. _Continuation_. Increment time step $n$ by one and go back to step 2 (a minimal code sketch of these steps follows).
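+
+A minimal sketch of the algorithm above, not from the source: it assumes a NumPy array `X` whose rows are the input vectors $x(n)$ (bias included) and labels `d` taking values in $\{-1, +1\}$; the names `perceptron`, `eta` and `epochs` are illustrative, and $\mathrm{sgn}(0)$ is taken as $+1$ for simplicity.
+
+```python
+import numpy as np
+
+def perceptron(X, d, eta=1.0, epochs=10):
+    """Sketch of error-correcting learning: w(n+1) = w(n) + eta*[d(n) - y(n)]*x(n)."""
+    w = np.zeros(X.shape[1])                  # step 1: initialisation, w(0) = 0
+    for _ in range(epochs):                   # step 5: continuation, repeat over the set
+        for x, target in zip(X, d):           # step 2: apply x(n) and desired response d(n)
+            y = 1.0 if w @ x >= 0 else -1.0   # step 3: actual response via the signum
+            w = w + eta * (target - y) * x    # step 4: adaptation of the weight vector
+    return w
+```
+
+With unity increment ($\eta = 1$) and linearly separable classes, this loop makes a finite number of corrections and then stops changing $w$.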
+
+- Guarantees convergence provided
+    - Patterns are linearly separable
+        - Non-overlapping classes
+        - Linear separation boundary
+    - Learning rate is not too high
+- Two conflicting requirements
+    1. Averaging of past inputs to provide stable weight estimates
+        - Small $\eta$
+    2. Fast adaptation with respect to real changes in the underlying distribution of the process responsible for $x$
+        - Large $\eta$
+
+![[slp-separable.png]]
\ No newline at end of file
diff --git a/img/activation-function.png b/img/activation-function.png
new file mode 100644
index 0000000..6ddc96b
Binary files /dev/null and b/img/activation-function.png differ
diff --git a/img/lms-algorithm.png b/img/lms-algorithm.png
new file mode 100644
index 0000000..6c4fe0c
Binary files /dev/null and b/img/lms-algorithm.png differ
diff --git a/img/mlp-arch-diagram.png b/img/mlp-arch-diagram.png
new file mode 100644
index 0000000..6ab125e
Binary files /dev/null and b/img/mlp-arch-diagram.png differ
diff --git a/img/mlp-arch-graph.png b/img/mlp-arch-graph.png
new file mode 100644
index 0000000..7238e89
Binary files /dev/null and b/img/mlp-arch-graph.png differ
diff --git a/img/mlp-arch.png b/img/mlp-arch.png
new file mode 100644
index 0000000..b301a14
Binary files /dev/null and b/img/mlp-arch.png differ
diff --git a/img/sl-lms-summary.png b/img/sl-lms-summary.png
new file mode 100644
index 0000000..9c014ce
Binary files /dev/null and b/img/sl-lms-summary.png differ
diff --git a/img/slp-arch.png b/img/slp-arch.png
new file mode 100644
index 0000000..a2ae495
Binary files /dev/null and b/img/slp-arch.png differ
diff --git a/img/slp-hyperplane.png b/img/slp-hyperplane.png
new file mode 100644
index 0000000..98e1494
Binary files /dev/null and b/img/slp-hyperplane.png differ
diff --git a/img/slp-lms-independence.png b/img/slp-lms-independence.png
new file mode 100644
index 0000000..f96195e
Binary files /dev/null and b/img/slp-lms-independence.png differ
diff --git a/img/slp-mse.png b/img/slp-mse.png
new file mode 100644
index 0000000..0f6ff41
Binary files /dev/null and b/img/slp-mse.png differ
diff --git a/img/slp-separable.png b/img/slp-separable.png
new file mode 100644
index 0000000..7da7070
Binary files /dev/null and b/img/slp-separable.png differ
diff --git a/img/slp-steepest-descent.png b/img/slp-steepest-descent.png
new file mode 100644
index 0000000..605b477
Binary files /dev/null and b/img/slp-steepest-descent.png differ