diff --git a/AI/Neural Networks/Architectures.md b/AI/Neural Networks/Architectures.md
new file mode 100644
index 0000000..8703826
--- /dev/null
+++ b/AI/Neural Networks/Architectures.md
@@ -0,0 +1,23 @@
+# Single-Layer Feedforward
+- *Acyclic*
+- Count output layer, no computation at input
+
+![[feedforward.png]]
+
+# Multilayer Feedforward
+- Hidden layers
+    - Extract higher-order statistics
+    - Global perspective
+    - Helpful with large input layer
+- Fully connected
+    - Every neuron is connected to every neuron in adjacent layers
+- Below is a 10-4-2 network
+![[multilayerfeedforward.png]]
+
+# Recurrent
+- At least one feedback loop
+- Below has no self-feedback
+![[recurrent.png]]
+![[recurrentwithhn.png]]
+
+- Above has hidden neurons
\ No newline at end of file
diff --git a/AI/Neural Networks/CNN/CNN.md b/AI/Neural Networks/CNN/CNN.md
index 5a8f82a..ad0d735 100644
--- a/AI/Neural Networks/CNN/CNN.md
+++ b/AI/Neural Networks/CNN/CNN.md
@@ -14,14 +14,14 @@
 - Double digit % gain on ImageNet accuracy
 
 # Fully Connected
-Dense
+[[MLP|Dense]]
 - Move from convolutional operations towards vector output
 - Stochastic drop-out
-    - Sub-sample channels and only connect some to dense layers
+    - Sub-sample channels and only connect some to [[MLP|dense]] layers
 
 # As a Descriptor
 - Most powerful as a deeply learned feature extractor
-- Dense classifier at the end isn't fantastic
+- [[MLP|Dense]] classifier at the end isn't fantastic
 - Use SVM to classify prior to penultimate layer
 
 ![[cnn-descriptor.png]]
@@ -42,13 +42,13 @@
 ![[fine-tuning-freezing.png]]
 
 # Training
-- Validation & training loss
+- Validation & training [[Deep Learning#Loss Function|loss]]
 - Early
     - Under-fitting
     - Training not representative
 - Later
     - Overfitting
-- V.loss can help adjust learning rate
+- Validation [[Deep Learning#Loss Function|loss]] can help adjust learning rate
     - Or indicate when to stop training
 
 ![[under-over-fitting.png]]
\ No newline at end of file
diff --git a/AI/Neural Networks/CNN/Examples.md b/AI/Neural Networks/CNN/Examples.md
index b1b3e26..6b78f4f 100644
--- a/AI/Neural Networks/CNN/Examples.md
+++ b/AI/Neural Networks/CNN/Examples.md
@@ -29,13 +29,13 @@
 2015
 - [[Inception Layer]]s
-- Multiple Loss Functions
+- Multiple [[Deep Learning#Loss Function|Loss]] Functions
 
 ![[googlenet.png]]
 
 ## [[Inception Layer]]
 ![[googlenet-inception.png]]
 
-## Auxiliary Loss Functions
+## Auxiliary [[Deep Learning#Loss Function|Loss]] Functions
 - Two other SoftMax blocks
 - Help train really deep network
     - Vanishing gradient problem
diff --git a/AI/Neural Networks/CNN/FCN/FCN.md b/AI/Neural Networks/CNN/FCN/FCN.md
index 797cf4f..969920b 100644
--- a/AI/Neural Networks/CNN/FCN/FCN.md
+++ b/AI/Neural Networks/CNN/FCN/FCN.md
@@ -20,13 +20,13 @@ Contractive → [[UpConv]]
 - Rarely from scratch
 - Pre-trained weights
 - Replace final layers
-    - FC layers
+    - [[MLP|FC]] layers
     - White-noise initialised
 - Add [[upconv]] layer(s)
 - Fine-tune train
     - Freeze others
 - Annotated GT images
-- Can use summed per-pixel log loss
+- Can use summed per-pixel log [[Deep Learning#Loss Function|loss]]
 
 # Evaluation
 ![[fcn-eval.png]]
diff --git a/AI/Neural Networks/CNN/GAN/DC-GAN.md b/AI/Neural Networks/CNN/GAN/DC-GAN.md
index c0c4b07..e096b8d 100644
--- a/AI/Neural Networks/CNN/GAN/DC-GAN.md
+++ b/AI/Neural Networks/CNN/GAN/DC-GAN.md
@@ -12,11 +12,11 @@ Deep Convolutional [[GAN]]
 - Train using Gaussian random noise for code
 - Discriminator
     - Contractive
-    - Cross-entropy loss
+    - Cross-entropy [[Deep Learning#Loss Function|loss]]
     - Conv and leaky [[Activation Functions#ReLu|ReLu]] layers only
-    - Normalised output via sigmoid
+    - Normalised output via [[Activation Functions#Sigmoid|sigmoid]]
 
-## Loss
+## [[Deep Learning#Loss Function|Loss]]
 $$D(S,L)=-\sum_iL_ilog(S_i)$$
 - $S$
     - $(0.1, 0.9)^T$
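As a quick illustration of the loss above (not part of the patch): a minimal NumPy sketch of $D(S,L)=-\sum_i L_i \log(S_i)$. Only the SoftMax scores $(0.1, 0.9)^T$ come from the note; the one-hot label $L$ is an assumed example.

```python
import numpy as np

def cross_entropy(S: np.ndarray, L: np.ndarray) -> float:
    """D(S, L) = -sum_i L_i * log(S_i), with S the SoftMax scores and L a one-hot label."""
    return float(-np.sum(L * np.log(S)))

S = np.array([0.1, 0.9])    # discriminator SoftMax output, as in the note
L = np.array([0.0, 1.0])    # assumed one-hot ground truth (true class in position 2)
print(cross_entropy(S, L))  # -log(0.9) ≈ 0.105
```

A confident correct prediction costs little ($-\log 0.9 \approx 0.105$); had the true class been the first position, the same scores would cost $-\log 0.1 \approx 2.3$.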
diff --git a/AI/Neural Networks/CNN/GAN/GAN.md b/AI/Neural Networks/CNN/GAN/GAN.md
index ac77d59..93c3553 100644
--- a/AI/Neural Networks/CNN/GAN/GAN.md
+++ b/AI/Neural Networks/CNN/GAN/GAN.md
@@ -1,7 +1,7 @@
 # Fully Convolutional
 - Remove [[Max Pooling]]
     - Use strided [[upconv]]
-- Remove FC layers
+- Remove [[MLP|FC]] layers
     - Hurts convergence in non-classification
 - Normalisation tricks
     - Batch normalisation
diff --git a/AI/Neural Networks/CNN/Interpretation.md b/AI/Neural Networks/CNN/Interpretation.md
index 515dd34..081ce35 100644
--- a/AI/Neural Networks/CNN/Interpretation.md
+++ b/AI/Neural Networks/CNN/Interpretation.md
@@ -6,8 +6,8 @@
 ![[am.png]]
 
 - **Use trained network**
     - Don't update weights
-- Feedforward noise
-    - [[Back-Propagation|Back-propagate]] loss
+- [[Architectures|Feedforward]] noise
+    - [[Back-Propagation|Back-propagate]] [[Deep Learning#Loss Function|loss]]
     - Don't update weights
     - Update image
@@ -17,4 +17,4 @@
 - Prone to high frequency noise
 - Minimise
     - Total variation
-    - $x^*$ is the best solution to minimise loss
\ No newline at end of file
+    - $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]
\ No newline at end of file
diff --git a/AI/Neural Networks/Deep Learning.md b/AI/Neural Networks/Deep Learning.md
index 61a3e5f..d3bdbd9 100644
--- a/AI/Neural Networks/Deep Learning.md
+++ b/AI/Neural Networks/Deep Learning.md
@@ -8,7 +8,7 @@ Objective Function
 ![[deep-loss-function.png]]
 
 - Test accuracy worse than train accuracy = overfitting
-- Dense = fully connected
+- [[MLP|Dense]] = [[MLP|fully connected]]
 - Automates feature engineering
 
 ![[ml-dl.png]]
diff --git a/AI/Neural Networks/MLP/MLP.md b/AI/Neural Networks/MLP/MLP.md
index 2d65cbd..21aedc6 100644
--- a/AI/Neural Networks/MLP/MLP.md
+++ b/AI/Neural Networks/MLP/MLP.md
@@ -1,4 +1,4 @@
-- Feed-forward
+- [[Architectures|Feedforward]]
 - Single hidden layer can learn any function
     - Universal approximation theorem
 - Each hidden layer can operate as a different feature extraction layer
@@ -8,7 +8,7 @@
 ![[mlp-arch.png]]
 
 # Universal Approximation Theory
-A finite feed-forward MLP with 1 hidden layer can in theory approximate any mathematical function
+A finite [[Architectures|feedforward]] MLP with 1 hidden layer can in theory approximate any continuous function
 - In practice not trainable with [[Back-Propagation|BP]]
 
 ![[activation-function.png]]
diff --git a/AI/Neural Networks/SLP/Least Mean Square.md b/AI/Neural Networks/SLP/Least Mean Square.md
index 5e7de49..9e45812 100644
--- a/AI/Neural Networks/SLP/Least Mean Square.md
+++ b/AI/Neural Networks/SLP/Least Mean Square.md
@@ -20,7 +20,7 @@
 $$\frac{\partial \mathfrak{E}(w)}{\partial w(n)}=-x(n)\cdot e(n)$$
 $$\hat{g}(n)=-x(n)\cdot e(n)$$
 $$\hat{w}(n+1)=\hat{w}(n)+\eta \cdot x(n) \cdot e(n)$$
-- Above is a feedback loop around weight vector, $\hat{w}$
+- Above is a [[Architectures|feedback]] loop around weight vector, $\hat{w}$
 - Behaves like low-pass filter
     - Pass low frequency components of error signal
 - Average time constant of filtering action inversely proportional to learning-rate
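For reference (not part of the patch): a short NumPy sketch of the LMS update $\hat{w}(n+1)=\hat{w}(n)+\eta \cdot x(n) \cdot e(n)$ above, assuming the usual error signal $e(n)=d(n)-\hat{w}^T(n)\,x(n)$ against a desired response $d(n)$; the data, true weights and step size are invented for illustration.

```python
import numpy as np

def lms(X: np.ndarray, d: np.ndarray, eta: float = 0.05) -> np.ndarray:
    """One pass of the LMS rule w(n+1) = w(n) + eta * x(n) * e(n)."""
    w = np.zeros(X.shape[1])           # weight estimate w-hat, started at zero
    for x_n, d_n in zip(X, d):
        e_n = d_n - w @ x_n            # instantaneous error signal e(n)
        w = w + eta * x_n * e_n        # feedback update around w-hat
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # input samples x(n)
true_w = np.array([0.5, -1.0, 2.0])           # assumed underlying weights
d = X @ true_w + 0.01 * rng.normal(size=500)  # noisy desired response d(n)
print(lms(X, d))                              # converges towards true_w
```

Each error nudges $\hat{w}$, and with a small $\eta$ the loop averages the error signal over many samples, which is the low-pass-filter behaviour the note describes.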
diff --git a/AI/Neural Networks/Transformers/Attention.md b/AI/Neural Networks/Transformers/Attention.md
index dbb611b..ba7c9ca 100644
--- a/AI/Neural Networks/Transformers/Attention.md
+++ b/AI/Neural Networks/Transformers/Attention.md
@@ -11,7 +11,7 @@
 - Attention layer accesses all previous states and weighs them according to a learned measure of relevance
     - Allows referring arbitrarily far back to relevant tokens
     - Can be added to [[RNN]]s
-- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a feedforward network
+- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a [[Architectures|feedforward]] network
     - Attention useful in and of itself, not just with [[RNN]]s
 - [[Transformers]] use attention without recurrent connections
     - Process all tokens simultaneously
diff --git a/AI/Neural Networks/Transformers/Transformers.md b/AI/Neural Networks/Transformers/Transformers.md
index 4cddb84..5c4e8ec 100644
--- a/AI/Neural Networks/Transformers/Transformers.md
+++ b/AI/Neural Networks/Transformers/Transformers.md
@@ -35,5 +35,5 @@
 - Uses incorporated textual information to produce output
 - Has attention to draw information from output of previous decoders before drawing from encoders
 - Both use [[attention]]
-- Both use dense layers for additional processing of outputs
+- Both use [[MLP|dense]] layers for additional processing of outputs
 - Contain residual connections & layer norm steps
\ No newline at end of file
diff --git a/img/feedforward.png b/img/feedforward.png
new file mode 100644
index 0000000..4b7f456
Binary files /dev/null and b/img/feedforward.png differ
diff --git a/img/multilayerfeedforward.png b/img/multilayerfeedforward.png
new file mode 100644
index 0000000..c3004ab
Binary files /dev/null and b/img/multilayerfeedforward.png differ
diff --git a/img/recurrent.png b/img/recurrent.png
new file mode 100644
index 0000000..7b0ed82
Binary files /dev/null and b/img/recurrent.png differ
diff --git a/img/recurrentwithhn.png b/img/recurrentwithhn.png
new file mode 100644
index 0000000..88a44d4
Binary files /dev/null and b/img/recurrentwithhn.png differ
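To accompany the attention notes above (not part of the patch): a minimal NumPy sketch of scaled dot-product self-attention, the recurrence-free form used by [[Transformers]], where every token's output is a relevance-weighted sum over all value vectors. This is the standard formulation rather than the decomposable attention of the 2016 work mentioned in Attention.md, it omits the learned Q/K/V projections, and the shapes are example values.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """softmax(Q K^T / sqrt(d_k)) V over token matrices of shape (tokens, d)."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # relevance of every token to every other
    return softmax(scores) @ V               # weighted sum of all value vectors

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))         # 4 tokens with 8-dimensional embeddings
out = attention(tokens, tokens, tokens)  # self-attention: all tokens processed at once
print(out.shape)                         # (4, 8)
```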