diff --git a/AI/Neural Networks/MLP.md b/AI/Neural Networks/MLP.md
index d077d1c..7d548b3 100644
--- a/AI/Neural Networks/MLP.md
+++ b/AI/Neural Networks/MLP.md
@@ -3,13 +3,13 @@
 - Universal approximation theorem
 - Each hidden layer can operate as a different feature extraction layer
 - Lots of weights to learn
-- Backpropagation is supervised
+- [[Back-Propagation]] is supervised
 
 ![[mlp-arch.png]]
 
 # Universal Approximation Theory
 A finite feed-forward MLP with 1 hidden layer can in theory approximate any mathematical function
-- In practice not trainable with BP
+- In practice not trainable with [[Back-Propagation|BP]]
 
 ![[activation-function.png]]
 ![[mlp-arch-diagram.png]]
\ No newline at end of file
diff --git a/AI/Neural Networks/MLP/Activation Functions.md b/AI/Neural Networks/MLP/Activation Functions.md
new file mode 100644
index 0000000..e69de29
diff --git a/AI/Neural Networks/MLP/Back-Propagation.md b/AI/Neural Networks/MLP/Back-Propagation.md
index 8016138..a4035e7 100644
--- a/AI/Neural Networks/MLP/Back-Propagation.md
+++ b/AI/Neural Networks/MLP/Back-Propagation.md
@@ -3,7 +3,104 @@
 Error signal graph
 ![[mlp-arch-graph.png]]
 1. Error Signal
+    - $e_j(n)=d_j(n)-y_j(n)$
 2. Net Internal Sum
+    - $v_j(n)=\sum_{i=0}^m w_{ji}(n)y_i(n)$
 3. Output
+    - $y_j(n)=\varphi_j(v_j(n))$
 4. Instantaneous Sum of Squared Errors
-5. Average Squared Error
\ No newline at end of file
+    - $\mathfrak E(n)=\frac 1 2 \sum_{j\in C}e_j^2(n)$
+    - $C$ = o/p layer nodes
+5. Average Squared Error
+    - $\mathfrak E_{av}=\frac 1 N\sum_{n=1}^N\mathfrak E(n)$
+
+$$\frac{\partial\mathfrak E(n)}{\partial w_{ji}(n)}=\frac{\partial\mathfrak E(n)}{\partial e_j(n)}\frac{\partial e_j(n)}{\partial y_j(n)}\frac{\partial y_j(n)}{\partial v_j(n)}\frac{\partial v_j(n)}{\partial w_{ji}(n)}$$
+
+#### From 4
+$$\frac{\partial\mathfrak E(n)}{\partial e_j(n)}=e_j(n)$$
+#### From 1
+$$\frac{\partial e_j(n)}{\partial y_j(n)}=-1$$
+#### From 3 (note prime)
+$$\frac{\partial y_j(n)}{\partial v_j(n)}=\varphi_j'(v_j(n))$$
+#### From 2
+$$\frac{\partial v_j(n)}{\partial w_{ji}(n)}=y_i(n)$$
+
+## Composite
+$$\frac{\partial\mathfrak E(n)}{\partial w_{ji}(n)}=-e_j(n)\cdot\varphi_j'(v_j(n))\cdot y_i(n)$$
+
+$$\Delta w_{ji}(n)=-\eta\frac{\partial\mathfrak E(n)}{\partial w_{ji}(n)}$$
+$$\Delta w_{ji}(n)=\eta\delta_j(n)y_i(n)$$
+
+## Gradients
+#### Output
+$$\delta_j(n)=-\frac{\partial\mathfrak E(n)}{\partial v_j(n)}=-\frac{\partial\mathfrak E(n)}{\partial e_j(n)}\frac{\partial e_j(n)}{\partial y_j(n)}\frac{\partial y_j(n)}{\partial v_j(n)}=e_j(n)\cdot\varphi_j'(v_j(n))$$
+
+#### Local
+$$\delta_j(n)=-\frac{\partial\mathfrak E(n)}{\partial y_j(n)}\frac{\partial y_j(n)}{\partial v_j(n)}=-\frac{\partial\mathfrak E(n)}{\partial y_j(n)}\cdot\varphi_j'(v_j(n))$$
+$$\delta_j(n)=\varphi_j'(v_j(n))\cdot\sum_k\delta_k(n)\cdot w_{kj}(n)$$
+
+## Weight Correction
+$$\text{weight correction}=\text{learning rate}\cdot\text{local gradient}\cdot\text{input signal of neuron }j$$
+$$\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$$
+
+- Looking for the partial derivative of the error with respect to each weight
+- 4 partial derivatives
+    1. Sum of squared errors WRT error in one output node
+    2. Error WRT output $y$
+    3. Output $y$ WRT pre-activation function sum
+    4. Pre-activation function sum WRT weight
+        - Other weights are constant, so those terms go to zero
+        - Leaves just $y_i$
+    - Collect 3 boxed terms as $\delta_j$
+        - Local gradient
+- Raw weight correction can be too slow
+    - Gets stuck
+    - Add momentum
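+
+A minimal sketch of the update rules above, assuming one hidden layer with sigmoid activations and no bias terms; the variable names (`eta`, `alpha`, `W1`, `W2`) are illustrative rather than from the notes. The momentum term anticipates the Global Minimum section below.
+
+```python
+import numpy as np
+
+# Illustrative sketch (not from the notes): one hidden layer, sigmoid
+# activations, biases omitted for brevity.
+rng = np.random.default_rng(0)
+eta, alpha = 0.1, 0.9                        # learning rate, momentum
+
+x = rng.normal(size=3)                       # input signals y_i to the hidden layer
+d = np.array([1.0, 0.0])                     # desired outputs d_j
+
+W1 = rng.normal(size=(4, 3))                 # input -> hidden weights w_ji
+W2 = rng.normal(size=(2, 4))                 # hidden -> output weights w_kj
+dW1_prev = np.zeros_like(W1)                 # previous corrections, for momentum
+dW2_prev = np.zeros_like(W2)
+
+phi = lambda v: 1 / (1 + np.exp(-v))         # sigmoid activation
+phi_prime = lambda v: phi(v) * (1 - phi(v))  # its derivative
+
+# Forward pass: v_j = sum_i w_ji * y_i, y_j = phi(v_j)
+v1 = W1 @ x
+y1 = phi(v1)
+v2 = W2 @ y1
+y2 = phi(v2)
+
+# Output layer: e_j = d_j - y_j, delta_j = e_j * phi'(v_j)
+e = d - y2
+delta2 = e * phi_prime(v2)
+
+# Hidden layer: delta_j = phi'(v_j) * sum_k delta_k * w_kj
+delta1 = phi_prime(v1) * (W2.T @ delta2)
+
+# Weight correction: eta * delta_j * y_i, plus momentum alpha * dW(n-1)
+dW2 = eta * np.outer(delta2, y1) + alpha * dW2_prev
+dW1 = eta * np.outer(delta1, x) + alpha * dW1_prev
+W2 += dW2
+W1 += dW1
+dW2_prev, dW1_prev = dW2, dW1
+```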
+
+![[mlp-local-hidden-grad.png]]
+
+- Nodes further back
+    - More complicated
+    - Sum of later local gradients, each multiplied by the backward weight (orange)
+    - Multiplied by the derivative of the activation function at the node
+
+## Global Minimum
+- Much more complex error surface than least mean squares
+- No guarantees of convergence
+    - Non-linear optimisation
+- Momentum
+    - $+\alpha\Delta w_{ji}(n-1),\quad 0\leq|\alpha|<1$
+    - Proportional to the change in weights from the last iteration
+    - Can shoot past local minima if descending quickly
+
+![[mlp-global-minimum.png]]
\ No newline at end of file
diff --git a/AI/Neural Networks/SLP.md b/AI/Neural Networks/SLP.md
index ce9c89a..d12b757 100644
--- a/AI/Neural Networks/SLP.md
+++ b/AI/Neural Networks/SLP.md
@@ -4,4 +4,4 @@
 $$=w^T(n)x(n)$$
 ![[slp-hyperplane.png]]
 Perceptron learning is performed for a finite number of iteration and then stops
-LMS is continuous learning that doesn't stop
\ No newline at end of file
+[[Least Mean Square|LMS]] is continuous learning that doesn't stop
\ No newline at end of file
diff --git a/img/mlp-global-minimum.png b/img/mlp-global-minimum.png
new file mode 100644
index 0000000..5b524cf
Binary files /dev/null and b/img/mlp-global-minimum.png differ
diff --git a/img/mlp-local-hidden-grad.png b/img/mlp-local-hidden-grad.png
new file mode 100644
index 0000000..1545f3c
Binary files /dev/null and b/img/mlp-local-hidden-grad.png differ