- [Feedforward](../Architectures.md)
- A single hidden layer can, in theory, approximate any continuous function
- Universal approximation theorem
- Each hidden layer can operate as a different feature extraction layer
- Lots of [weights](../Weight%20Init.md) to learn
- [Back-Propagation](Back-Propagation.md) is [supervised](../../Learning.md#Supervised)

![mlp-arch](../../../img/mlp-arch.png)

# Universal Approximation Theory
A finite [feedforward](../Architectures.md) MLP with 1 hidden layer can in theory approximate any continuous function on a bounded input domain to arbitrary accuracy, given enough hidden units and a non-linear activation
- In practice such a network is often not trainable with [BP](Back-Propagation.md): the theorem says suitable weights exist, not that training will find them

![activation-function](../../../img/activation-function.png)

![mlp-arch-diagram](../../../img/mlp-arch-diagram.png)

## Weight Matrix
- Use matrix multiplication for layer output
- TLU (threshold logic unit) is a hard limiter

![tlu](../../../img/tlu.png)
- $o_1$ to $o_4$ must all be 1 to overcome the -3.5 bias and force the output to 1

![mlp-non-linear-decision](../../../img/mlp-non-linear-decision.png)
- Can generate a non-linear [decision boundary](Decision%20Boundary.md)
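
The weight-matrix points above can be sketched in a few lines of plain Python. This is a minimal illustration, not the network in the figures: the unit weights on $o_1$ to $o_4$, the hidden-layer weights (four half-planes enclosing the unit square), the convention that a net input of exactly 0 fires, and the names `tlu`, `layer_output`, `output_unit` and `classify` are all assumptions made for the example. Only the -3.5 bias comes from the notes.

```python
def tlu(net):
    """Threshold logic unit (hard limiter): 1 if net input >= 0, else 0."""
    return 1 if net >= 0 else 0


def layer_output(weights, biases, inputs):
    """One layer: matrix-vector product plus bias, then the hard limiter.

    weights: list of rows, one row of input weights per unit
    biases:  one bias per unit
    """
    return [
        tlu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Output unit: ASSUMED unit weights on o_1..o_4, with the -3.5 bias from the notes.
OUT_W = [[1.0, 1.0, 1.0, 1.0]]
OUT_B = [-3.5]

# Hidden layer: HYPOTHETICAL weights, each unit tests one half-plane.
HID_W = [
    [1.0, 0.0],   # x >= 0
    [-1.0, 0.0],  # x <= 1
    [0.0, 1.0],   # y >= 0
    [0.0, -1.0],  # y <= 1
]
HID_B = [0.0, 1.0, 0.0, 1.0]


def output_unit(o):
    """Fires only when all four hidden outputs are 1 (4 - 3.5 >= 0)."""
    return layer_output(OUT_W, OUT_B, o)[0]


def classify(point):
    """Two-layer forward pass: half-plane tests, then the AND-like output unit."""
    hidden = layer_output(HID_W, HID_B, point)
    return output_unit(hidden)


if __name__ == "__main__":
    print(output_unit([1, 1, 1, 1]))  # all four inputs on
    print(output_unit([1, 1, 1, 0]))  # one input off
    print(classify((0.5, 0.5)))       # inside the unit square
    print(classify((1.5, 0.5)))       # outside the unit square
```

With three inputs on, the net input is 3 - 3.5 = -0.5, so the output stays at 0; only all four give 4 - 3.5 = 0.5. Each hidden unit on its own draws a straight line, but combining them through the output unit encloses a region, which is one way a stack of hard limiters produces a non-linear decision boundary.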