-   [[Architectures|Feedforward]]
-   A single hidden layer can approximate any continuous function, given enough hidden units
	-   Universal approximation theorem
-   Each hidden layer can operate as a different feature extraction layer
-   Lots of [[Weight Init|weights]] to learn
-   [[Back-Propagation]] is supervised

![[mlp-arch.png]]

# Universal Approximation Theorem
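A finite [[Architectures|feedforward]] MLP with one hidden layer can, in theory, approximate any continuous function on a bounded input range to arbitrary accuracy, given enough hidden units and a suitable non-linear activation
-   In practice such a network is often not trainable with [[Back-Propagation|BP]]: the theorem guarantees that suitable weights exist, not that training finds them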
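
A minimal sketch of the approximation claim, under assumptions not in these notes: the target is $\sin(x)$ on $[-\pi, \pi]$, the hidden layer has 50 `tanh` units with random untrained weights, and the output weights are fitted by least squares rather than BP. The names `n_hidden`, `W1`, `b1`, and `w2` are illustrative only. It shows that one hidden layer is enough to represent the function; it says nothing about how well BP would find such weights.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Target function sampled on a bounded interval
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)  # shape (200, 1)
y = np.sin(x).ravel()                                # shape (200,)

# One hidden layer of tanh units with random, untrained weights (assumption)
n_hidden = 50
W1 = rng.normal(0.0, 2.0, size=(1, n_hidden))
b1 = rng.uniform(-np.pi, np.pi, size=n_hidden)
H = np.tanh(x @ W1 + b1)                             # shape (200, 50)

# Linear output layer fitted by least squares instead of BP (assumption)
A = np.hstack([H, np.ones((len(x), 1))])             # append a bias column
w2, *_ = np.linalg.lstsq(A, y, rcond=None)

y_hat = A @ w2
max_err = np.max(np.abs(y_hat - y))
print(f"max abs error on the grid: {max_err:.2e}")
```

Increasing `n_hidden` should generally tighten the fit, which matches the "given enough hidden units" condition of the theorem.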

![[activation-function.png]]
![[mlp-arch-diagram.png]]
## Weight Matrix
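-   Compute each layer's output with a matrix multiplication: $o = f(Wx + b)$
-   TLU (threshold logic unit) is a hard limiter: the output is 1 once the weighted sum plus bias crosses the threshold, otherwise 0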
![[tlu.png]]
- $o_1$ to $o_4$ must all be 1 to overcome the -3.5 bias and force the output to 1 (assuming unit weights: $4 - 3.5 > 0$, but $3 - 3.5 < 0$), so the unit acts as a 4-input AND
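
The TLU example can be checked with a small sketch. It assumes unit weights on all four inputs and a threshold at zero (`>= 0`); only the -3.5 bias comes from the notes. The layer output is computed with a single matrix multiplication over all 16 binary input patterns.

```python
import itertools
import numpy as np

def tlu(z):
    """Hard limiter: 1 where the net input is >= 0, else 0 (assumed convention)."""
    return (z >= 0).astype(int)

W = np.ones((1, 4))    # assumed unit weights, one row per output unit
b = np.array([-3.5])   # bias from the notes

# All 16 binary input patterns (o_1 .. o_4), one per row
X = np.array(list(itertools.product([0, 1], repeat=4)))

# Layer output for every pattern via one matrix multiplication
out = tlu(X @ W.T + b).ravel()

firing = X[out == 1]   # patterns that force the output to 1
print(firing)
```

Only the all-ones pattern ends up in `firing`, since any three active inputs give a net input of $3 - 3.5 = -0.5$.
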
![[mlp-non-linear-decision.png]]
- Stacking layers of TLUs can generate a non-linear [[Decision Boundary|decision boundary]]
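
As a rough illustration of a non-linear decision boundary, the sketch below wires three TLUs into a two-layer network that computes XOR, which no single TLU can do. All weights and biases here are hand-picked for illustration and are not taken from the figures: one hidden unit acts as OR, the other as AND, and the output unit fires for "OR but not AND".

```python
import numpy as np

def tlu(z):
    """Hard limiter: 1 where the net input is >= 0, else 0 (assumed convention)."""
    return (z >= 0).astype(int)

# Hidden layer: hand-picked weights (illustrative only)
W_h = np.array([[1.0, 1.0],    # hidden unit 1 behaves as OR
                [1.0, 1.0]])   # hidden unit 2 behaves as AND
b_h = np.array([-0.5, -1.5])

# Output layer: fires when OR is on and AND is off
W_o = np.array([[1.0, -1.0]])
b_o = np.array([-0.5])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

hidden = tlu(X @ W_h.T + b_h)          # shape (4, 2)
y = tlu(hidden @ W_o.T + b_o).ravel()  # shape (4,)

for inputs, label in zip(X, y):
    print(inputs, "->", label)
```

The two hidden units each draw one straight line in the input plane; the output unit combines them into the band between the lines, which is the non-linear region.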