vault backup: 2023-05-27 22:17:56
Affected files: .obsidian/graph.json .obsidian/workspace-mobile.json .obsidian/workspace.json STEM/AI/Neural Networks/Activation Functions.md STEM/AI/Neural Networks/CNN/FCN/FlowNet.md STEM/AI/Neural Networks/CNN/FCN/ResNet.md STEM/AI/Neural Networks/CNN/FCN/Skip Connections.md STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md STEM/AI/Neural Networks/CNN/GAN/GAN.md STEM/AI/Neural Networks/CNN/Interpretation.md STEM/AI/Neural Networks/Deep Learning.md STEM/AI/Neural Networks/MLP/Back-Propagation.md STEM/AI/Neural Networks/MLP/MLP.md STEM/AI/Neural Networks/Transformers/Attention.md STEM/CS/ABI.md STEM/CS/Calling Conventions.md STEM/CS/Code Types.md STEM/CS/Language Binding.md STEM/img/am-regulariser.png STEM/img/skip-connections.png
This commit is contained in:
parent acb7dc429e
commit 33ac3007bc
@ -38,7 +38,7 @@ y_j(n)(1-y_j(n))$$
- Nice derivative
- Max value of $\varphi_j'(v_j(n))$ occurs when $y_j(n)=0.5$
- Min value of 0 when $y_j=0$ or $1$
- Initial weights chosen so not saturated at 0 or 1
- Initial [[Weight Init|weights]] chosen so not saturated at 0 or 1

If $y=\frac u v$
Where $u$ and $v$ are differentiable functions
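The quotient rule then gives
$$y'=\frac{u'v-uv'}{v^2}$$
which, applied to the sigmoid, yields the $y_j(n)(1-y_j(n))$ form above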
@ -3,7 +3,7 @@ Optical Flow
- 2-Channel optical flow
- $dx,dy$
- Two consecutive frames
- 6-channel tensor
- 6-channel [[tensor]]

![[flownet.png]]
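A minimal sketch of assembling that 6-channel input (NumPy assumed; frame sizes are illustrative, not from the note):

```python
import numpy as np

# Two consecutive RGB frames, channels-last (H, W, 3); sizes are illustrative
frame_t0 = np.zeros((384, 512, 3), dtype=np.float32)
frame_t1 = np.zeros((384, 512, 3), dtype=np.float32)

# Stack along the channel axis -> 6-channel input tensor (384, 512, 6)
x = np.concatenate([frame_t0, frame_t1], axis=-1)
assert x.shape == (384, 512, 6)

# The network's output is then 2-channel optical flow (dx, dy) per pixel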
AI/Neural Networks/CNN/FCN/ResNet.md (new file, 25 lines)
@ -0,0 +1,25 @@
- Residual networks
- 152 layers
- Skips every two layers
- Residual block
- Later layers learning the identity function
- Skips help
- A deep network should be at least as good as a shallower one, by allowing some layers to do very little
- Vanishing gradient
- Allows shortcut paths for gradients
- Accuracy saturation
- Adding more layers to a suitably deep network increases training error

# Design

- Skips across pairs of conv layers (see the sketch after this list)
- Elementwise addition
- All layers use 3x3 kernels
- Spatial size halves each layer
- Number of filters doubles each layer
- Fully convolutional
- No FC layers
- No pooling
- Except at end
- No dropout
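A minimal sketch of one such residual block (assuming the Keras functional API; the filter count is illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    # Skip across a pair of 3x3 conv layers
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([shortcut, y])  # elementwise addition: the shortcut path for gradients
    return layers.Activation("relu")(y)
```

When the spatial size halves and the filter count doubles, the shortcut would need a strided 1x1 projection to match shapes; the identity case is shown for brevity.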
AI/Neural Networks/CNN/FCN/Skip Connections.md (new file, 16 lines)
@ -0,0 +1,16 @@
- Outputs of conv ($c$) layers are added to inputs of upconv ($d$) layers
- Element-wise, not channel appending
- Propagate high-frequency information to later layers
- Two types
- Additive
- ResNet
- Super-resolution auto-encoder
- Concatenative
- Densely connected architectures
- DenseNet
- FlowNet

![[skip-connections.png]]

[AI Summer - Skip Connections](https://theaisummer.com/skip-connections/)
[Arxiv - Visualising the Loss Landscape](https://arxiv.org/abs/1712.09913)
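A short sketch contrasting the two types (Keras ops assumed; shapes illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

x    = tf.zeros((1, 32, 32, 64))  # later-layer features
skip = tf.zeros((1, 32, 32, 64))  # earlier-layer features carried forward

additive      = layers.Add()([x, skip])          # ResNet-style: shape stays (1, 32, 32, 64)
concatenative = layers.Concatenate()([x, skip])  # DenseNet/FlowNet-style: (1, 32, 32, 128)
```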
@ -7,7 +7,7 @@ Deep Convolutional [[GAN]]
- Generate image from code
- Low-dimensional
- ~100-D
- Reshape to tensor
- Reshape to [[tensor]]
- [[Upconv]] to image
- Train using Gaussian random noise for code
- Discriminator
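A minimal sketch of the generator path described above (Keras assumed; layer sizes are illustrative, mapping a 100-D code to a 28x28 image):

```python
from tensorflow.keras import layers, Sequential

generator = Sequential([
    layers.Input(shape=(100,)),   # low-dimensional code; Gaussian noise at train time
    layers.Dense(7 * 7 * 128),
    layers.Reshape((7, 7, 128)),  # reshape code to a tensor
    layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),  # upconv
    layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh"),   # 28x28 image
])
```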
@ -27,5 +27,5 @@
# Code Vector Math for Control

![[cvmfc.png]]
- Do AM to derive code for an image
- Do [[Interpretation#Activation Maximisation|AM]] to derive code for an image

![[code-vector-math-for-control-results.png]]
@ -17,4 +17,17 @@
- Prone to high-frequency noise
- Minimise
- Total variation
- $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]

$$x^*=\text{argmin}_{x\in \mathbb R^{H\times W\times C}}\ell(\phi(x),\phi_0)$$
- Won't work
$$x^*=\text{argmin}_{x\in \mathbb R^{H\times W\times C}}\ell(\phi(x),\phi_0)+\lambda\mathcal R(x)$$
- Need a regulariser, like the one above

![[am-regulariser.png]]

$$\mathcal R_{V^\beta}(f)=\int_\Omega\left(\left(\frac{\partial f}{\partial u}(u,v)\right)^2+\left(\frac{\partial f}{\partial v}(u,v)\right)^2\right)^{\frac \beta 2}du\,dv$$

$$\mathcal R_{V^\beta}(x)=\sum_{i,j}\left(\left(x_{i,j+1}-x_{ij}\right)^2+\left(x_{i+1,j}-x_{ij}\right)^2\right)^{\frac \beta 2}$$
- Beta
- Degree of smoothing
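A NumPy sketch of the discrete $\mathcal R_{V^\beta}$ above (a single-channel image is assumed for simplicity):

```python
import numpy as np

def tv_regulariser(x, beta=2.0):
    # Finite differences, cropped so both terms share shape (H-1, W-1)
    dw = x[:-1, 1:] - x[:-1, :-1]  # x_{i,j+1} - x_{ij}
    dh = x[1:, :-1] - x[:-1, :-1]  # x_{i+1,j} - x_{ij}
    return np.sum((dw ** 2 + dh ** 2) ** (beta / 2))
```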
@ -32,16 +32,16 @@ Predict
Evaluate

# Data Structure
- Tensor flow = channels last
- [[Tensor]] flow = channels last
- (samples, height, width, channels)
- Vector data
- 2D tensors of shape (samples, features)
- 2D [[tensor]]s of shape (samples, features)
- Time series data or sequence data
- 3D tensors of shape (samples, timesteps, features)
- 3D [[tensor]]s of shape (samples, timesteps, features)
- Images
- 4D tensors of shape (samples, height, width, channels) or (samples, channels, height, width)
- 4D [[tensor]]s of shape (samples, height, width, channels) or (samples, channels, height, width)
- Video
- 5D tensors of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)
- 5D [[tensor]]s of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)
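The same shapes as a short sketch (NumPy, channels-last; sizes illustrative):

```python
import numpy as np

vectors = np.zeros((32, 10))              # (samples, features)
series  = np.zeros((32, 100, 10))         # (samples, timesteps, features)
images  = np.zeros((32, 224, 224, 3))     # (samples, height, width, channels)
video   = np.zeros((8, 16, 224, 224, 3))  # (samples, frames, height, width, channels)
```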
![[photo-tensor.png]]
![[matrix-dot-product.png]]
@ -79,7 +79,7 @@ $$\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$$
2. Error WRT output $y$
3. Output $y$ WRT pre-activation function sum
4. Pre-activation function sum WRT weight
- Other weights held constant, so their terms go to zero
- Other [[Weight Init|weights]] held constant, so their terms go to zero
- Leaves just $y_i$
- Collect 3 boxed terms as $\delta_j$
- Local gradient
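Collecting those terms, the local gradient for an output neuron is
$$\delta_j(n)=e_j(n)\varphi_j'(v_j(n))$$
which reduces the update to the $\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$ rule above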
@ -2,7 +2,7 @@
- Single hidden layer can learn any function
- Universal approximation theorem
- Each hidden layer can operate as a different feature extraction layer
- Lots of weights to learn
- Lots of [[Weight Init|weights]] to learn
- [[Back-Propagation]] is supervised

![[mlp-arch.png]]
@ -19,7 +19,7 @@
# Scaled Dot-Product
- Calculate attention weights between all tokens at once
- Learn 3 weight matrices
- Learn 3 [[Weight Init|weight]] matrices
- Query
- $W_Q$
- Key
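For reference, the scaled dot-product attention computed from these matrices is
$$\text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
where $d_k$ is the key dimension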
@ -31,5 +31,5 @@
# Embedded ABI
- File format, data types, register usage, stack frame organisation, function parameter passing conventions
- For embedded OS
- Compilers create object code compatible with code from other compilers
- Link libraries from different compilers
- [[Compilers]] create object code compatible with code from other [[compilers]]
- Link libraries from different [[compilers]]
@ -5,15 +5,15 @@
- Also known as: callee-saved registers or non-volatile registers
- How the task of preparing the stack for, and restoring after, a function call is divided between the caller and the callee

Subtle differences between compilers can make it difficult to interface code from different compilers
Subtle differences between [[compilers]] can make it difficult to interface code from different [[compilers]]

Calling conventions, type representations, and name mangling are all part of what is known as an [application binary interface](https://en.wikipedia.org/wiki/Application_binary_interface) ([[ABI]])

# cdecl
C declaration

- Originally from Microsoft's C compiler
- Used by many C compilers for x86
- Originally from Microsoft's C [[compilers|compiler]]
- Used by many C [[compilers]] for x86
- Subroutine arguments passed on the stack
- Function arguments pushed right-to-left
- Last argument pushed first
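A small illustration of cdecl in practice (Python's `ctypes` assumed; the library path is Linux-specific): `CDLL` loads a library under the cdecl convention, and caller stack cleanup is what makes variadic calls like `printf` workable.

```python
import ctypes

# CDLL assumes the cdecl calling convention (caller cleans the stack)
libc = ctypes.CDLL("libc.so.6")  # platform-specific name, assumed Linux
libc.printf(b"cdecl call: %d\n", 42)
```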
@ -1,16 +1,16 @@
## Machine Code
- Machine language instructions
- Directly control CPU
- Directly control [[Processors|CPU]]
- Strictly numerical
- Lowest-level representation of a compiled or assembled program
- Lowest level visible to the programmer
- Internally, microcode might be used
- Hardware dependent
- Higher-level languages translated to machine code
- Compilers, assemblers and linkers
- [[Compilers]], assemblers and linkers
- Not for interpreted code
- The interpreter itself runs as machine code
- Assembly is effectively human-readable machine code
- [[Assembly]] is effectively human-readable machine code
- Has mnemonics for opcodes etc.
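A tiny illustration of the numeric-versus-mnemonic distinction (x86-64 bytes used as an assumed example):

```python
# The same instruction as raw machine code and as its assembly mnemonic
machine_code = bytes([0x48, 0xFF, 0xC0])  # x86-64 encoding of `inc rax`
print(machine_code.hex())                 # 48ffc0, strictly numerical
```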
## Microcode
@ -24,5 +24,5 @@
- Adobe Flash Player
- Tamarin
- JVM
- LLVM
- [[Compilers#LLVM|LLVM]]
- Silverlight
BIN img/am-regulariser.png (new file, 352 KiB; binary file not shown)
BIN img/skip-connections.png (new file, 51 KiB; binary file not shown)