- Feed-forward
- A single hidden layer can learn any function (Universal Approximation Theorem)
- Each hidden layer can act as a different feature-extraction layer
- Lots of weights to learn
- [[Back-Propagation]] is supervised

![[mlp-arch.png]]

# Universal Approximation Theorem

A finite feed-forward MLP with 1 hidden layer can in theory approximate any continuous function, given enough hidden units
- In practice the required layer may be impractically wide, and such a network is often not reliably trainable with [[Back-Propagation|BP]]

![[activation-function.png]]
![[mlp-arch-diagram.png]]
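
A minimal NumPy sketch of both ideas above: a single hidden layer of tanh units, trained with plain gradient-descent back-propagation, fitting a simple nonlinear function. The target `f(x) = sin(x)`, the 32-unit width, and all hyperparameters here are illustrative assumptions, not from the note.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: approximate f(x) = sin(x) on [-pi, pi]
# (illustrative target, not from the note)
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(X)

# One hidden layer of 32 tanh units, linear output
H = 32
W1 = rng.normal(0, 1.0, (1, H))
b1 = np.zeros(H)
W2 = rng.normal(0, 1.0, (H, 1))
b2 = np.zeros(1)

lr = 0.05
for step in range(10_000):
    # Forward pass
    z1 = X @ W1 + b1        # pre-activations, shape (N, H)
    a1 = np.tanh(z1)        # hidden activations
    y_hat = a1 @ W2 + b2    # network output, shape (N, 1)

    # Mean-squared-error loss
    err = y_hat - y
    loss = np.mean(err ** 2)

    # Backward pass: back-propagation via the chain rule
    d_out = 2 * err / len(X)                    # dL/dy_hat
    dW2 = a1.T @ d_out
    db2 = d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * (1 - a1 ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0)

    # Gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final MSE: {loss:.5f}")  # should shrink well below the variance of sin(x)
```

One hidden layer suffices for this smooth 1-D target; the practical caveat above is that for harder targets the width needed by the theorem can blow up, which is where depth and BP-trained multi-layer feature extraction come in.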