diff --git a/AI/Neural Networks/Activation Functions.md b/AI/Neural Networks/Activation Functions.md
index e446756..a8e5860 100644
--- a/AI/Neural Networks/Activation Functions.md
+++ b/AI/Neural Networks/Activation Functions.md
@@ -38,7 +38,7 @@ y_j(n)(1-y_j(n))$$
 - Nice derivative
 - Max value of $\varphi_j'(v_j(n))$ occurs when $y_j(n)=0.5$
 - Min value of 0 when $y_j=0$ or $1$
-- Initial weights chosen so not saturated at 0 or 1
+- Initial [[Weight Init|weights]] chosen so not saturated at 0 or 1

 If $y=\frac u v$
 Where $u$ and $v$ are differentiable functions
diff --git a/AI/Neural Networks/CNN/FCN/FlowNet.md b/AI/Neural Networks/CNN/FCN/FlowNet.md
index b9ef983..6516946 100644
--- a/AI/Neural Networks/CNN/FCN/FlowNet.md
+++ b/AI/Neural Networks/CNN/FCN/FlowNet.md
@@ -3,7 +3,7 @@ Optical Flow
 - 2-Channel optical flow
 	- $dx,dy$
 - Two consecutive frames
-	- 6-channel tensor
+	- 6-channel [[tensor]]

 ![[flownet.png]]

diff --git a/AI/Neural Networks/CNN/FCN/ResNet.md b/AI/Neural Networks/CNN/FCN/ResNet.md
new file mode 100644
index 0000000..886f426
--- /dev/null
+++ b/AI/Neural Networks/CNN/FCN/ResNet.md
@@ -0,0 +1,25 @@
+- Residual networks
+- 152 layers
+- Skips every two layers
+	- Residual block
+- Later layers learning the identity function
+	- Skips help
+	- Deep network should be at least as good as shallower one by allowing some layers to do very little
+- Vanishing gradient
+	- Allows shortcut paths for gradients
+- Accuracy saturation
+	- Adding more layers to suitably deep network increases training error
+
+# Design
+
+- Skips across pairs of conv layers
+	- Elementwise addition
+- All layers 3x3 kernels
+- Spatial size halves each layer
+- Filter count doubles each layer
+- Fully convolutional
+	- No fc layer
+	- No pooling
+		- Except at end
+	- No dropout
+
diff --git a/AI/Neural Networks/CNN/FCN/Skip Connections.md b/AI/Neural Networks/CNN/FCN/Skip Connections.md
new file mode 100644
index 0000000..9a01257
--- /dev/null
+++ b/AI/Neural Networks/CNN/FCN/Skip Connections.md
@@ -0,0 +1,16 @@
+- Output of conv, c, layers are added to inputs of upconv, d, layers
+- Element-wise, not channel appending
+- Propagate high frequency information to later layers
+- Two types
+	- Additive
+		- ResNet
+		- Super-resolution auto-encoder
+	- Concatenative
+		- Densely connected architectures
+		- DenseNet
+		- FlowNet
+
+![[skip-connections.png]]
+
+[AI Summer - Skip Connections](https://theaisummer.com/skip-connections/)
+[Arxiv - Visualising the Loss Landscape](https://arxiv.org/abs/1712.09913)
\ No newline at end of file
diff --git a/AI/Neural Networks/CNN/GAN/DC-GAN.md b/AI/Neural Networks/CNN/GAN/DC-GAN.md
index e096b8d..d0a1cc2 100644
--- a/AI/Neural Networks/CNN/GAN/DC-GAN.md
+++ b/AI/Neural Networks/CNN/GAN/DC-GAN.md
@@ -7,7 +7,7 @@ Deep Convolutional [[GAN]]
 - Generate image from code
 	- Low-dimensional
 	- ~100-D
-	- Reshape to tensor
+	- Reshape to [[tensor]]
 	- [[Upconv]] to image
 - Train using Gaussian random noise for code
 - Discriminator
diff --git a/AI/Neural Networks/CNN/GAN/GAN.md b/AI/Neural Networks/CNN/GAN/GAN.md
index 93c3553..5ea2ea5 100644
--- a/AI/Neural Networks/CNN/GAN/GAN.md
+++ b/AI/Neural Networks/CNN/GAN/GAN.md
@@ -27,5 +27,5 @@
 # Code Vector Math for Control

 ![[cvmfc.png]]
-- Do AM to derive code for an image
+- Do [[Interpretation#Activation Maximisation|AM]] to derive code for an image
 ![[code-vector-math-for-control-results.png]]
\ No newline at end of file
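The ResNet.md and Skip Connections.md notes above describe additive skips as element-wise addition across pairs of 3x3 conv layers. A minimal sketch of one such residual block, assuming PyTorch; `ResidualBlock` is an illustrative name and batch norm is omitted for brevity:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Two 3x3 convolutions; padding=1 keeps the spatial size unchanged
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # element-wise addition: the skip path

x = torch.randn(1, 64, 32, 32)  # (samples, channels, height, width)
y = ResidualBlock(64)(x)        # same spatial size and channel count as the input
```

A concatenative skip (DenseNet, FlowNet) would instead append along the channel axis, e.g. `torch.cat([x, out], dim=1)`, so later layers receive the high-frequency information directly rather than summed in.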
diff --git a/AI/Neural Networks/CNN/Interpretation.md b/AI/Neural Networks/CNN/Interpretation.md
index 081ce35..b35fe90 100644
--- a/AI/Neural Networks/CNN/Interpretation.md
+++ b/AI/Neural Networks/CNN/Interpretation.md
@@ -17,4 +17,17 @@
 - Prone to high frequency noise
 - Minimise
 	- Total variation
-	- $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]
\ No newline at end of file
+	- $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]
+
+$$x^*=\text{argmin}_{x\in \mathbb R^{H\times W\times C}}\ell(\phi(x),\phi_0)$$
+- Won't work on its own
+$$x^*=\text{argmin}_{x\in \mathbb R^{H\times W\times C}}\ell(\phi(x),\phi_0)+\lambda\mathcal R(x)$$
+- Need a regulariser like above
+
+![[am-regulariser.png]]
+
+$$\mathcal R_{V^\beta}(f)=\int_\Omega\left(\left(\frac{\partial f}{\partial u}(u,v)\right)^2+\left(\frac{\partial f}{\partial v}(u,v)\right)^2\right)^{\frac \beta 2}du\space dv$$
+
+$$\mathcal R_{V^\beta}(x)=\sum_{i,j}\left(\left(x_{i,j+1}-x_{ij}\right)^2+\left(x_{i+1,j}-x_{ij}\right)^2\right)^{\frac \beta 2}$$
+- Beta
+	- Degree of smoothing
\ No newline at end of file
diff --git a/AI/Neural Networks/Deep Learning.md b/AI/Neural Networks/Deep Learning.md
index d3bdbd9..9857edd 100644
--- a/AI/Neural Networks/Deep Learning.md
+++ b/AI/Neural Networks/Deep Learning.md
@@ -32,16 +32,16 @@ Predict Evaluate

 # Data Structure

-- Tensor flow = channels last
+- [[Tensor]] flow = channels last
 	- (samples, height, width, channels)
 - Vector data
-	- 2D tensors of shape (samples, features)
+	- 2D [[tensor]]s of shape (samples, features)
 - Time series data or sequence data
-	- 3D tensors of shape (samples, timesteps, features)
+	- 3D [[tensor]]s of shape (samples, timesteps, features)
 - Images
-	- 4D tensors of shape (samples, height, width, channels) or (samples, channels, height, Width)
+	- 4D [[tensor]]s of shape (samples, height, width, channels) or (samples, channels, height, width)
 - Video
-	- 5D tensors of shape (samples, frames, height, width, channels) or (samples, frames, channels , height, width)
+	- 5D [[tensor]]s of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)

 ![[photo-tensor.png]]
 ![[matrix-dot-product.png]]
\ No newline at end of file
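The discrete $\mathcal R_{V^\beta}$ regulariser in the Interpretation.md hunk above maps directly onto array slicing. A minimal sketch, assuming NumPy and a single-channel image; `tv_regulariser` is an illustrative name and boundary pixels are simply dropped:

```python
import numpy as np

def tv_regulariser(x: np.ndarray, beta: float = 2.0) -> float:
    """Sum over i,j of ((x[i,j+1]-x[i,j])^2 + (x[i+1,j]-x[i,j])^2)^(beta/2)."""
    dj = x[:-1, 1:] - x[:-1, :-1]  # difference to the next column: x[i,j+1] - x[i,j]
    di = x[1:, :-1] - x[:-1, :-1]  # difference to the next row: x[i+1,j] - x[i,j]
    return float(np.sum((dj ** 2 + di ** 2) ** (beta / 2)))

x = np.random.rand(64, 64)
print(tv_regulariser(x, beta=2.0))  # beta sets the degree of smoothing
```

Adding $\lambda\,\mathcal R_{V^\beta}(x)$ to the reconstruction loss is what turns the unregularised argmin in that hunk into the regularised one.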
diff --git a/AI/Neural Networks/MLP/Back-Propagation.md b/AI/Neural Networks/MLP/Back-Propagation.md
index 20181bc..5d3b833 100644
--- a/AI/Neural Networks/MLP/Back-Propagation.md
+++ b/AI/Neural Networks/MLP/Back-Propagation.md
@@ -79,7 +79,7 @@ $$\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$$
 2. Error WRT output $y$
 3. Output $y$ WRT Pre-activation function sum
 4. Pre-activation function sum WRT weight
-	- Other weights constant, goes to zero
+	- Other [[Weight Init|weights]] constant, goes to zero
 	- Leaves just $y_i$
 - Collect 3 boxed terms as delta $j$
 	- Local gradient
diff --git a/AI/Neural Networks/MLP/MLP.md b/AI/Neural Networks/MLP/MLP.md
index 21aedc6..eb51b04 100644
--- a/AI/Neural Networks/MLP/MLP.md
+++ b/AI/Neural Networks/MLP/MLP.md
@@ -2,7 +2,7 @@
 - Single hidden layer can learn any function
 	- Universal approximation theorem
 - Each hidden layer can operate as a different feature extraction layer
-- Lots of weights to learn
+- Lots of [[Weight Init|weights]] to learn
 - [[Back-Propagation]] is supervised

 ![[mlp-arch.png]]
diff --git a/AI/Neural Networks/Transformers/Attention.md b/AI/Neural Networks/Transformers/Attention.md
index ba7c9ca..6cde5c4 100644
--- a/AI/Neural Networks/Transformers/Attention.md
+++ b/AI/Neural Networks/Transformers/Attention.md
@@ -19,7 +19,7 @@
 # Scaled Dot-Product

 - Calculate attention weights between all tokens at once
-- Learn 3 weight matrices
+- Learn 3 [[Weight Init|weight]] matrices
 	- Query
 		- $W_Q$
 	- Key
diff --git a/CS/ABI.md b/CS/ABI.md
index a3656d2..074d1a1 100644
--- a/CS/ABI.md
+++ b/CS/ABI.md
@@ -31,5 +31,5 @@
 # Embedded ABI
 - File format, data types, register usage, stack frame organisation, function parameter passing conventions
 - For embedded OS
-- Compilers create object code compatible with code from other compilers
-	- Link libraries from different compilers
\ No newline at end of file
+- [[Compilers]] create object code compatible with code from other [[compilers]]
+	- Link libraries from different [[compilers]]
\ No newline at end of file
diff --git a/CS/Calling Conventions.md b/CS/Calling Conventions.md
index be3fd10..cf1cad8 100644
--- a/CS/Calling Conventions.md
+++ b/CS/Calling Conventions.md
@@ -5,15 +5,15 @@
 	- Also known as: callee-saved registers or non-volatile registers
 - How the task of preparing the stack for, and restoring after, a function call is divided between the caller and the callee

-Subtle differences between compilers, can be difficult to interface codes from different compilers
+Subtle differences between [[compilers]] can make it difficult to interface code from different [[compilers]]

 Calling conventions, type representations, and name mangling are all part of what is known as an [application binary interface](https://en.wikipedia.org/wiki/Application_binary_interface) ([[ABI]])

 # cdecl
 C declaration

-- Originally from Microsoft's C compiler
-	- Used by many C compilers for x86
+- Originally from Microsoft's C [[compilers|compiler]]
+	- Used by many C [[compilers]] for x86
 - Subroutine arguments passed on the stack
 - Function arguments pushed right-to-left
 	- Last pushed first
diff --git a/CS/Code Types.md b/CS/Code Types.md
index 962727e..848f500 100644
--- a/CS/Code Types.md
+++ b/CS/Code Types.md
@@ -1,16 +1,16 @@
 ## Machine Code
 - Machine language instructions
-- Directly control CPU
+- Directly control [[Processors|CPU]]
 - Strictly numerical
 - Lowest-level representation of a compiled or assembled program
 - Lowest-level visible to programmer
 	- Internally microcode might be used
 - Hardware dependent
 - Higher-level languages translated to machine code
-	- Compilers, assemblers and linkers
+	- [[Compilers]], assemblers and linkers
 	- Not for interpreted code
 		- Interpreter runs machine code
-- Assembly is effectively human readable machine code
+- [[Assembly]] is effectively human readable machine code
 	- Has mnemonics for opcodes etc

 ## Microcode
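The Attention.md hunk above lists the three learned weight matrices ($W_Q$, $W_K$, $W_V$) and attention computed between all tokens at once. A minimal single-head sketch of scaled dot-product attention, assuming NumPy; the dimensions and names are illustrative:

```python
import numpy as np

def scaled_dot_product_attention(X, W_Q, W_K, W_V):
    """softmax(Q K^T / sqrt(d_k)) V over a whole sequence of token vectors X."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V             # project tokens with the 3 learned matrices
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise scores between all tokens at once
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax -> attention weights
    return weights @ V

rng = np.random.default_rng(0)
tokens, d_model, d_k = 5, 16, 8
X = rng.normal(size=(tokens, d_model))   # one token embedding per row
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))
out = scaled_dot_product_attention(X, W_Q, W_K, W_V)  # shape (tokens, d_k)
```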
diff --git a/CS/Language Binding.md b/CS/Language Binding.md
index 3bda512..38c5369 100644
--- a/CS/Language Binding.md
+++ b/CS/Language Binding.md
@@ -24,5 +24,5 @@
 - Adobe Flash Player
 - Tamarin
 - JVM
-- LLVM
+- [[Compilers#LLVM|LLVM]]
 - Silverlight
\ No newline at end of file
diff --git a/img/am-regulariser.png b/img/am-regulariser.png
new file mode 100644
index 0000000..cab63b4
Binary files /dev/null and b/img/am-regulariser.png differ
diff --git a/img/skip-connections.png b/img/skip-connections.png
new file mode 100644
index 0000000..6b68075
Binary files /dev/null and b/img/skip-connections.png differ
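The Calling Conventions.md and Language Binding.md notes above describe bindings that let one language call libraries built by another, with the calling convention and [[ABI]] as the contract. A minimal sketch using Python's ctypes, assuming a Unix-like system where the C maths library can be located; `ctypes.CDLL` uses the platform's default C (cdecl-style) convention, whereas `ctypes.WinDLL` exists for stdcall APIs on Windows:

```python
import ctypes
import ctypes.util

# Locate and load the shared C maths library (e.g. libm.so.6 on Linux)
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so arguments and return values are marshalled correctly
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0, computed by the C library rather than by Python
```

Declaring `argtypes`/`restype` is the binding's half of the ABI contract: it tells ctypes how to marshal Python values into the C function's expected parameter and return types.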