- Feed-forward
- A single hidden layer can learn any function (Universal Approximation Theorem)
- Each hidden layer can act as a different feature-extraction layer
- Lots of weights to learn
- [[Back-Propagation]] is supervised

![[mlp-arch.png]]

# Universal Approximation Theorem

A finite feed-forward MLP with 1 hidden layer can in theory approximate any continuous function, given enough hidden units
- In practice the required layer may be impractically wide, and such a network is often not reliably trainable with [[Back-Propagation|BP]]

![[activation-function.png]]
![[mlp-arch-diagram.png]]
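
A minimal NumPy sketch of both ideas above: a single hidden layer of tanh units, trained with plain gradient-descent back-propagation, fitting a simple nonlinear function. The target `f(x) = sin(x)`, the 32-unit width, and all hyperparameters here are illustrative assumptions, not from the note.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: approximate f(x) = sin(x) on [-pi, pi]
# (illustrative target, not from the note)
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(X)

# One hidden layer of 32 tanh units, linear output
H = 32
W1 = rng.normal(0, 1.0, (1, H))
b1 = np.zeros(H)
W2 = rng.normal(0, 1.0, (H, 1))
b2 = np.zeros(1)

lr = 0.05
for step in range(10_000):
    # Forward pass
    z1 = X @ W1 + b1        # pre-activations, shape (N, H)
    a1 = np.tanh(z1)        # hidden activations
    y_hat = a1 @ W2 + b2    # network output, shape (N, 1)

    # Mean-squared-error loss
    err = y_hat - y
    loss = np.mean(err ** 2)

    # Backward pass: back-propagation via the chain rule
    d_out = 2 * err / len(X)                    # dL/dy_hat
    dW2 = a1.T @ d_out
    db2 = d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * (1 - a1 ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0)

    # Gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final MSE: {loss:.5f}")  # should shrink well below the variance of sin(x)
```

One hidden layer suffices for this smooth 1-D target; the practical caveat above is that for harder targets the width needed by the theorem can blow up, which is where depth and BP-trained multi-layer feature extraction come in.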