diff --git a/AI/Neural Networks/Activation Functions.md b/AI/Neural Networks/Activation Functions.md
index e446756..a8e5860 100644
--- a/AI/Neural Networks/Activation Functions.md
+++ b/AI/Neural Networks/Activation Functions.md
@@ -38,7 +38,7 @@ y_j(n)(1-y_j(n))$$
 - Nice derivative
 - Max value of $\varphi_j'(v_j(n))$ occurs when $y_j(n)=0.5$
 - Min value of 0 when $y_j=0$ or $1$
-- Initial weights chosen so not saturated at 0 or 1
+- Initial [[Weight Init|weights]] chosen so not saturated at 0 or 1

 If $y=\frac u v$
 Where $u$ and $v$ are differentiable functions
diff --git a/AI/Neural Networks/CNN/FCN/FlowNet.md b/AI/Neural Networks/CNN/FCN/FlowNet.md
index b9ef983..6516946 100644
--- a/AI/Neural Networks/CNN/FCN/FlowNet.md
+++ b/AI/Neural Networks/CNN/FCN/FlowNet.md
@@ -3,7 +3,7 @@ Optical Flow
 - 2-Channel optical flow
 	- $dx,dy$
 - Two consecutive frames
-	- 6-channel tensor
+	- 6-channel [[tensor]]

 ![[flownet.png]]

diff --git a/AI/Neural Networks/CNN/FCN/ResNet.md b/AI/Neural Networks/CNN/FCN/ResNet.md
new file mode 100644
index 0000000..886f426
--- /dev/null
+++ b/AI/Neural Networks/CNN/FCN/ResNet.md
@@ -0,0 +1,25 @@
+- Residual networks
+- 152 layers
+- Skips every two layers
+	- Residual block
+- Later layers learning the identity function
+	- Skips help
+	- Deep network should be at least as good as shallower one by allowing some layers to do very little
+- Vanishing gradient
+	- Allows shortcut paths for gradients
+- Accuracy saturation
+	- Adding more layers to suitably deep network increases training error
+
+# Design
+
+- Skips across pairs of conv layers
+	- Elementwise addition
+- All layers 3x3 kernels
+- Spatial size halves each layer
+- Filter count doubles each layer
+- Fully convolutional
+	- No fc layer
+	- No pooling
+		- Except at end
+	- No dropout
+
diff --git a/AI/Neural Networks/CNN/FCN/Skip Connections.md b/AI/Neural Networks/CNN/FCN/Skip Connections.md
new file mode 100644
index 0000000..9a01257
--- /dev/null
+++ b/AI/Neural Networks/CNN/FCN/Skip Connections.md
@@ -0,0 +1,16 @@
+- Output of conv, c, layers are added to inputs of upconv, d, layers
+- Element-wise, not channel appending
+- Propagate high frequency information to later layers
+- Two types
+	- Additive
+		- ResNet
+		- Super-resolution auto-encoder
+	- Concatenative
+		- Densely connected architectures
+		- DenseNet
+		- FlowNet
+
+![[skip-connections.png]]
+
+[AI Summer - Skip Connections](https://theaisummer.com/skip-connections/)
+[Arxiv - Visualising the Loss Landscape](https://arxiv.org/abs/1712.09913)
\ No newline at end of file
diff --git a/AI/Neural Networks/CNN/GAN/DC-GAN.md b/AI/Neural Networks/CNN/GAN/DC-GAN.md
index e096b8d..d0a1cc2 100644
--- a/AI/Neural Networks/CNN/GAN/DC-GAN.md
+++ b/AI/Neural Networks/CNN/GAN/DC-GAN.md
@@ -7,7 +7,7 @@ Deep Convolutional [[GAN]]
 - Generate image from code
 	- Low-dimensional
 	- ~100-D
-	- Reshape to tensor
+	- Reshape to [[tensor]]
 	- [[Upconv]] to image
 - Train using Gaussian random noise for code
 - Discriminator
diff --git a/AI/Neural Networks/CNN/GAN/GAN.md b/AI/Neural Networks/CNN/GAN/GAN.md
index 93c3553..5ea2ea5 100644
--- a/AI/Neural Networks/CNN/GAN/GAN.md
+++ b/AI/Neural Networks/CNN/GAN/GAN.md
@@ -27,5 +27,5 @@
 # Code Vector Math for Control

 ![[cvmfc.png]]
-- Do AM to derive code for an image
+- Do [[Interpretation#Activation Maximisation|AM]] to derive code for an image
 ![[code-vector-math-for-control-results.png]]
\ No newline at end of file
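The ResNet.md and Skip Connections.md notes above describe additive skips as element-wise addition across pairs of 3x3 conv layers. A minimal sketch of one such residual block, assuming PyTorch; `ResidualBlock` is an illustrative name and batch norm is omitted for brevity:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Two 3x3 convolutions; padding=1 keeps the spatial size unchanged
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # element-wise addition: the skip path

x = torch.randn(1, 64, 32, 32)  # (samples, channels, height, width)
y = ResidualBlock(64)(x)        # same spatial size and channel count as the input
```

A concatenative skip (DenseNet, FlowNet) would instead append along the channel axis, e.g. `torch.cat([x, out], dim=1)`, so later layers receive the high-frequency information directly rather than summed in.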
diff --git a/AI/Neural Networks/CNN/Interpretation.md b/AI/Neural Networks/CNN/Interpretation.md
index 081ce35..b35fe90 100644
--- a/AI/Neural Networks/CNN/Interpretation.md
+++ b/AI/Neural Networks/CNN/Interpretation.md
@@ -17,4 +17,17 @@
 - Prone to high frequency noise
 - Minimise
 	- Total variation
-	- $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]
\ No newline at end of file
+	- $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]
+
+$$x^*=\text{argmin}_{x\in \mathbb R^{H\times W\times C}}\ell(\phi(x),\phi_0)$$
+- Won't work on its own
+$$x^*=\text{argmin}_{x\in \mathbb R^{H\times W\times C}}\ell(\phi(x),\phi_0)+\lambda\mathcal R(x)$$
+- Need a regulariser like above
+
+![[am-regulariser.png]]
+
+$$\mathcal R_{V^\beta}(f)=\int_\Omega\left(\left(\frac{\partial f}{\partial u}(u,v)\right)^2+\left(\frac{\partial f}{\partial v}(u,v)\right)^2\right)^{\frac \beta 2}du\space dv$$
+
+$$\mathcal R_{V^\beta}(x)=\sum_{i,j}\left(\left(x_{i,j+1}-x_{ij}\right)^2+\left(x_{i+1,j}-x_{ij}\right)^2\right)^{\frac \beta 2}$$
+- Beta
+	- Degree of smoothing
\ No newline at end of file
diff --git a/AI/Neural Networks/Deep Learning.md b/AI/Neural Networks/Deep Learning.md
index d3bdbd9..9857edd 100644
--- a/AI/Neural Networks/Deep Learning.md
+++ b/AI/Neural Networks/Deep Learning.md
@@ -32,16 +32,16 @@ Predict Evaluate

 # Data Structure

-- Tensor flow = channels last
+- [[Tensor]] flow = channels last
 	- (samples, height, width, channels)
 - Vector data
-	- 2D tensors of shape (samples, features)
+	- 2D [[tensor]]s of shape (samples, features)
 - Time series data or sequence data
-	- 3D tensors of shape (samples, timesteps, features)
+	- 3D [[tensor]]s of shape (samples, timesteps, features)
 - Images
-	- 4D tensors of shape (samples, height, width, channels) or (samples, channels, height, Width)
+	- 4D [[tensor]]s of shape (samples, height, width, channels) or (samples, channels, height, width)
 - Video
-	- 5D tensors of shape (samples, frames, height, width, channels) or (samples, frames, channels , height, width)
+	- 5D [[tensor]]s of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)

 ![[photo-tensor.png]]
 ![[matrix-dot-product.png]]
\ No newline at end of file
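The discrete $\mathcal R_{V^\beta}$ regulariser in the Interpretation.md hunk above maps directly onto array slicing. A minimal sketch, assuming NumPy and a single-channel image; `tv_regulariser` is an illustrative name and boundary pixels are simply dropped:

```python
import numpy as np

def tv_regulariser(x: np.ndarray, beta: float = 2.0) -> float:
    """Sum over i,j of ((x[i,j+1]-x[i,j])^2 + (x[i+1,j]-x[i,j])^2)^(beta/2)."""
    dj = x[:-1, 1:] - x[:-1, :-1]  # difference to the next column: x[i,j+1] - x[i,j]
    di = x[1:, :-1] - x[:-1, :-1]  # difference to the next row: x[i+1,j] - x[i,j]
    return float(np.sum((dj ** 2 + di ** 2) ** (beta / 2)))

x = np.random.rand(64, 64)
print(tv_regulariser(x, beta=2.0))  # beta sets the degree of smoothing
```

Adding $\lambda\,\mathcal R_{V^\beta}(x)$ to the reconstruction loss is what turns the unregularised argmin in that hunk into the regularised one.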
diff --git a/AI/Neural Networks/MLP/Back-Propagation.md b/AI/Neural Networks/MLP/Back-Propagation.md
index 20181bc..5d3b833 100644
--- a/AI/Neural Networks/MLP/Back-Propagation.md
+++ b/AI/Neural Networks/MLP/Back-Propagation.md
@@ -79,7 +79,7 @@ $$\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$$
 2. Error WRT output $y$
 3. Output $y$ WRT Pre-activation function sum
 4. Pre-activation function sum WRT weight
-	- Other weights constant, goes to zero
+	- Other [[Weight Init|weights]] constant, goes to zero
 	- Leaves just $y_i$
 - Collect 3 boxed terms as delta $j$
 	- Local gradient
diff --git a/AI/Neural Networks/MLP/MLP.md b/AI/Neural Networks/MLP/MLP.md
index 21aedc6..eb51b04 100644
--- a/AI/Neural Networks/MLP/MLP.md
+++ b/AI/Neural Networks/MLP/MLP.md
@@ -2,7 +2,7 @@
 - Single hidden layer can learn any function
 	- Universal approximation theorem
 - Each hidden layer can operate as a different feature extraction layer
-- Lots of weights to learn
+- Lots of [[Weight Init|weights]] to learn
 - [[Back-Propagation]] is supervised

 ![[mlp-arch.png]]
diff --git a/AI/Neural Networks/Transformers/Attention.md b/AI/Neural Networks/Transformers/Attention.md
index ba7c9ca..6cde5c4 100644
--- a/AI/Neural Networks/Transformers/Attention.md
+++ b/AI/Neural Networks/Transformers/Attention.md
@@ -19,7 +19,7 @@
 # Scaled Dot-Product

 - Calculate attention weights between all tokens at once
-- Learn 3 weight matrices
+- Learn 3 [[Weight Init|weight]] matrices
 	- Query
 		- $W_Q$
 	- Key
diff --git a/CS/ABI.md b/CS/ABI.md
index a3656d2..074d1a1 100644
--- a/CS/ABI.md
+++ b/CS/ABI.md
@@ -31,5 +31,5 @@
 # Embedded ABI
 - File format, data types, register usage, stack frame organisation, function parameter passing conventions
 - For embedded OS
-- Compilers create object code compatible with code from other compilers
-	- Link libraries from different compilers
\ No newline at end of file
+- [[Compilers]] create object code compatible with code from other [[compilers]]
+	- Link libraries from different [[compilers]]
\ No newline at end of file
diff --git a/CS/Calling Conventions.md b/CS/Calling Conventions.md
index be3fd10..cf1cad8 100644
--- a/CS/Calling Conventions.md
+++ b/CS/Calling Conventions.md
@@ -5,15 +5,15 @@
 	- Also known as: callee-saved registers or non-volatile registers
 - How the task of preparing the stack for, and restoring after, a function call is divided between the caller and the callee

-Subtle differences between compilers, can be difficult to interface codes from different compilers
+Subtle differences between [[compilers]] can make it difficult to interface code from different [[compilers]]

 Calling conventions, type representations, and name mangling are all part of what is known as an [application binary interface](https://en.wikipedia.org/wiki/Application_binary_interface) ([[ABI]])

 # cdecl
 C declaration

-- Originally from Microsoft's C compiler
-	- Used by many C compilers for x86
+- Originally from Microsoft's C [[compilers|compiler]]
+	- Used by many C [[compilers]] for x86
 - Subroutine arguments passed on the stack
 - Function arguments pushed right-to-left
 	- Last pushed first
diff --git a/CS/Code Types.md b/CS/Code Types.md
index 962727e..848f500 100644
--- a/CS/Code Types.md
+++ b/CS/Code Types.md
@@ -1,16 +1,16 @@
 ## Machine Code
 - Machine language instructions
-- Directly control CPU
+- Directly control [[Processors|CPU]]
 - Strictly numerical
 - Lowest-level representation of a compiled or assembled program
 - Lowest-level visible to programmer
 	- Internally microcode might be used
 - Hardware dependent
 - Higher-level languages translated to machine code
-	- Compilers, assemblers and linkers
+	- [[Compilers]], assemblers and linkers
 	- Not for interpreted code
 		- Interpreter runs machine code
-- Assembly is effectively human readable machine code
+- [[Assembly]] is effectively human readable machine code
 	- Has mnemonics for opcodes etc

 ## Microcode
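The Attention.md hunk above lists the three learned weight matrices ($W_Q$, $W_K$, $W_V$) and attention computed between all tokens at once. A minimal single-head sketch of scaled dot-product attention, assuming NumPy; the dimensions and names are illustrative:

```python
import numpy as np

def scaled_dot_product_attention(X, W_Q, W_K, W_V):
    """softmax(Q K^T / sqrt(d_k)) V over a whole sequence of token vectors X."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V             # project tokens with the 3 learned matrices
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise scores between all tokens at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax -> attention weights
    return weights @ V

rng = np.random.default_rng(0)
tokens, d_model, d_k = 5, 16, 8
X = rng.normal(size=(tokens, d_model))   # one token embedding per row
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))
out = scaled_dot_product_attention(X, W_Q, W_K, W_V)  # shape (tokens, d_k)
```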
diff --git a/CS/Language Binding.md b/CS/Language Binding.md
index 3bda512..38c5369 100644
--- a/CS/Language Binding.md
+++ b/CS/Language Binding.md
@@ -24,5 +24,5 @@
 - Adobe Flash Player
 - Tamarin
 - JVM
-- LLVM
+- [[Compilers#LLVM|LLVM]]
 - Silverlight
\ No newline at end of file
diff --git a/img/am-regulariser.png b/img/am-regulariser.png
new file mode 100644
index 0000000..cab63b4
Binary files /dev/null and b/img/am-regulariser.png differ
diff --git a/img/skip-connections.png b/img/skip-connections.png
new file mode 100644
index 0000000..6b68075
Binary files /dev/null and b/img/skip-connections.png differ
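The Calling Conventions.md and Language Binding.md notes above describe bindings that let one language call libraries built by another, with the calling convention and [[ABI]] as the contract. A minimal sketch using Python's ctypes, assuming a Unix-like system where the C maths library can be located; `ctypes.CDLL` uses the platform's default C (cdecl-style) convention, whereas `ctypes.WinDLL` exists for stdcall APIs on Windows:

```python
import ctypes
import ctypes.util

# Locate and load the shared C maths library (e.g. libm.so.6 on Linux)
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so arguments and return values are marshalled correctly
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0, computed by the C library rather than by Python
```

Declaring `argtypes`/`restype` is the binding's half of the ABI contract: it tells ctypes how to marshal Python values into the C function's expected parameter and return types.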