vault backup: 2023-05-27 22:17:56

Affected files:
.obsidian/graph.json
.obsidian/workspace-mobile.json
.obsidian/workspace.json
STEM/AI/Neural Networks/Activation Functions.md
STEM/AI/Neural Networks/CNN/FCN/FlowNet.md
STEM/AI/Neural Networks/CNN/FCN/ResNet.md
STEM/AI/Neural Networks/CNN/FCN/Skip Connections.md
STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md
STEM/AI/Neural Networks/CNN/GAN/GAN.md
STEM/AI/Neural Networks/CNN/Interpretation.md
STEM/AI/Neural Networks/Deep Learning.md
STEM/AI/Neural Networks/MLP/Back-Propagation.md
STEM/AI/Neural Networks/MLP/MLP.md
STEM/AI/Neural Networks/Transformers/Attention.md
STEM/CS/ABI.md
STEM/CS/Calling Conventions.md
STEM/CS/Code Types.md
STEM/CS/Language Binding.md
STEM/img/am-regulariser.png
STEM/img/skip-connections.png
This commit is contained in:
andy 2023-05-27 22:17:56 +01:00
parent acb7dc429e
commit 33ac3007bc
17 changed files with 76 additions and 22 deletions

View File

@ -38,7 +38,7 @@ y_j(n)(1-y_j(n))$$
- Nice derivative
- Max value of $\varphi_j'(v_j(n))$ occurs when $y_j(n)=0.5$
- Min value of 0 when $y_j=0$ or $1$
- Initial [[Weight Init|weights]] chosen so not saturated at 0 or 1
If $y=\frac u v$
Where $u$ and $v$ are differentiable functions
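A minimal pure-Python sketch of the derivative property above (function names are illustrative): the sigmoid derivative $\varphi'(v)=y(1-y)$ peaks at $0.25$ when $y=0.5$ and saturates toward $0$ at the extremes.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def sigmoid_deriv(v):
    y = sigmoid(v)
    return y * (1.0 - y)  # phi'(v) = y(1 - y)

# maximum at v = 0, where y = 0.5
print(sigmoid_deriv(0.0))   # 0.25
# saturated for large |v| -- why initial weights should avoid y near 0 or 1
print(sigmoid_deriv(10.0) < 1e-4)  # True
```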

View File

@ -3,7 +3,7 @@ Optical Flow
- 2-Channel optical flow
- $dx,dy$
- Two consecutive frames
- 6-channel [[tensor]]
![[flownet.png]]
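A toy sketch of the input construction described above: two consecutive RGB frames are stacked along the channel axis to form a 6-channel input. Frames are nested lists here (H×W×3) purely for illustration; a real implementation would use an array library.

```python
def stack_frames(frame_a, frame_b):
    """Concatenate two HxWx3 frames channel-wise into an HxWx6 input."""
    return [[pa + pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

# two 2x2 RGB frames (values are arbitrary)
f1 = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [1, 1, 1]]]
f2 = [[[0, 0, 0], [9, 9, 9]], [[2, 2, 2], [3, 3, 3]]]
x = stack_frames(f1, f2)
print(len(x[0][0]))  # 6 channels per pixel
```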

View File

@ -0,0 +1,25 @@
- Residual networks
- 152 layers
- Skips every two layers
- Residual block
- Later layers learning the identity function
- Skips help
- Deep network should be at least as good as shallower one by allowing some layers to do very little
- Vanishing gradient
- Allows shortcut paths for gradients
- Accuracy saturation
- Adding more layers to suitably deep network increases training error
# Design
- Skips across pairs of conv layers
- Elementwise addition
- All layers use 3x3 kernels
- Spatial size halves each layer
- Filter count doubles each layer
- Fully convolutional
- No fc layer
- No pooling
- Except at end
- No dropout
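The core of the residual block above can be sketched as an elementwise add of a skip path and a learned path (a toy 1-D version, not a real conv stack): if $F$ learns to output zero, the block is exactly the identity, so later layers can "do very little" without hurting the network.

```python
def residual_block(x, f):
    """y = F(x) + x : elementwise addition of conv path F and skip path x."""
    fx = f(x)
    assert len(fx) == len(x), "shapes must match for the elementwise add"
    return [a + b for a, b in zip(fx, x)]

# if F collapses to zero, the block is the identity function
zero_f = lambda x: [0.0] * len(x)
print(residual_block([1.0, 2.0, 3.0], zero_f))  # [1.0, 2.0, 3.0]
```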

View File

@ -0,0 +1,16 @@
- Outputs of conv ($c$) layers are added to inputs of upconv ($d$) layers
- Element-wise, not channel appending
- Propagate high frequency information to later layers
- Two types
- Additive
- Resnet
- Super-resolution auto-encoder
- Concatenative
- Densely connected architectures
- DenseNet
- FlowNet
![[skip-connections.png]]
[AI Summer - Skip Connections](https://theaisummer.com/skip-connections/)
[Arxiv - Visualising the Loss Landscape](https://arxiv.org/abs/1712.09913)
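The two skip types above differ only in how features are merged; a toy sketch with flat lists standing in for the channel vector at one spatial position:

```python
def additive_skip(x, fx):
    # ResNet-style: shapes must match, features are summed elementwise
    return [a + b for a, b in zip(x, fx)]

def concat_skip(x, fx):
    # DenseNet/FlowNet-style: channels are appended, channel count grows
    return x + fx

x, fx = [1, 2], [3, 4]
print(additive_skip(x, fx))  # [4, 6]
print(concat_skip(x, fx))    # [1, 2, 3, 4]
```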

View File

@ -7,7 +7,7 @@ Deep Convolutional [[GAN]]
- Generate image from code
- Low-dimensional
- ~100-D
- Reshape to [[tensor]]
- [[Upconv]] to image
- Train using Gaussian random noise for code
- Discriminator

View File

@ -27,5 +27,5 @@
# Code Vector Math for Control
![[cvmfc.png]]
- Do [[Interpretation#Activation Maximisation|AM]] to derive code for an image
![[code-vector-math-for-control-results.png]]

View File

@ -18,3 +18,16 @@
- Minimise
- Total variation
- $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]
$$x^*=\text{argmin}_{x\in \mathbb R^{H\times W\times C}}\ell(\phi(x),\phi_0)$$
- Won't work
$$x^*=\text{argmin}_{x\in \mathbb R^{H\times W\times C}}\ell(\phi(x),\phi_0)+\lambda\mathcal R(x)$$
- Need a regulariser like above
![[am-regulariser.png]]
$$\mathcal R_{V^\beta}(f)=\int_\Omega\left(\left(\frac{\partial f}{\partial u}(u,v)\right)^2+\left(\frac{\partial f}{\partial v}(u,v)\right)^2\right)^{\frac \beta 2}du\space dv$$
$$\mathcal R_{V^\beta}(x)=\sum_{i,j}\left(\left(x_{i,j+1}-x_{ij}\right)^2+\left(x_{i+1,j}-x_{ij}\right)^2\right)^{\frac \beta 2}$$
- Beta
- Degree of smoothing
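The discrete total-variation regulariser $\mathcal R_{V^\beta}(x)$ above can be computed directly; a sketch for a single-channel image as nested lists, where boundary pixels without a right/lower neighbour are simply skipped (an assumption about boundary handling):

```python
def tv_regulariser(x, beta=2.0):
    """Discrete R_{V^beta}: sum over pixels of (dh^2 + dv^2)^(beta/2)."""
    h, w = len(x), len(x[0])
    total = 0.0
    for i in range(h - 1):
        for j in range(w - 1):
            dh = x[i][j + 1] - x[i][j]   # horizontal difference x_{i,j+1} - x_{ij}
            dv = x[i + 1][j] - x[i][j]   # vertical difference x_{i+1,j} - x_{ij}
            total += (dh * dh + dv * dv) ** (beta / 2)
    return total

flat = [[1.0] * 4 for _ in range(4)]
print(tv_regulariser(flat))  # 0.0 -- a constant image has no variation
```

Larger $\beta$ penalises sharp local changes more softly relative to smooth ones, giving the degree-of-smoothing control noted above.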

View File

@ -32,16 +32,16 @@ Predict
Evaluate
# Data Structure
- [[Tensor]] flow = channels last
- (samples, height, width, channels)
- Vector data
- 2D [[tensor]]s of shape (samples, features)
- Time series data or sequence data
- 3D [[tensor]]s of shape (samples, timesteps, features)
- Images
- 4D [[tensor]]s of shape (samples, height, width, channels) or (samples, channels, height, width)
- Video
- 5D [[tensor]]s of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)
![[photo-tensor.png]]
![[matrix-dot-product.png]]
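The channels-last/channels-first distinction above is just an axis ordering; a small sketch using plain shape tuples (names are illustrative), converting an image batch from NHWC to NCHW:

```python
# channels-last conventions from the note, as plain shape tuples
image_batch = (32, 224, 224, 3)       # (samples, height, width, channels)
video_batch = (32, 16, 224, 224, 3)   # (samples, frames, height, width, channels)

def to_channels_first(shape):
    """Move the trailing channel axis to position 1, e.g. NHWC -> NCHW."""
    return (shape[0], shape[-1], *shape[1:-1])

print(to_channels_first(image_batch))  # (32, 3, 224, 224)
```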

View File

@ -79,7 +79,7 @@ $$\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$$
2. Error WRT output $y$
3. Output $y$ WRT Pre-activation function sum
4. Pre-activation function sum WRT weight
- Other [[Weight Init|weights]] constant, goes to zero
- Leaves just $y_i$
- Collect 3 boxed terms as delta $j$
- Local gradient
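The update rule $\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$ above can be sketched directly; this assumes a sigmoid output unit so the local gradient is $\delta_j = e_j\, y_j(1-y_j)$ (the three collected terms), with illustrative function names:

```python
def local_gradient_output(error, y):
    """delta_j for a sigmoid output unit: e_j * y_j * (1 - y_j)."""
    return error * y * (1.0 - y)

def weight_update(eta, delta_j, y_i):
    """Delta rule: dw_ji = eta * delta_j * y_i."""
    return eta * delta_j * y_i

delta = local_gradient_output(0.5, 0.5)  # 0.5 * 0.5 * 0.5 = 0.125
print(weight_update(0.1, delta, 1.0))    # 0.0125
```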

View File

@ -2,7 +2,7 @@
- Single hidden layer can learn any function
- Universal approximation theorem
- Each hidden layer can operate as a different feature extraction layer
- Lots of [[Weight Init|weights]] to learn
- [[Back-Propagation]] is supervised
![[mlp-arch.png]]

View File

@ -19,7 +19,7 @@
# Scaled Dot-Product
- Calculate attention weights between all tokens at once
- Learn 3 [[Weight Init|weight]] matrices
- Query
- $W_Q$
- Key

View File

@ -31,5 +31,5 @@
# Embedded ABI
- File format, data types, register usage, stack frame organisation, function parameter passing conventions
- For embedded OS
- [[Compilers]] create object code compatible with code from other [[compilers]]
- Link libraries from different [[compilers]]

View File

@ -5,15 +5,15 @@
- Also known as: callee-saved registers or non-volatile registers
- How the task of preparing the stack for, and restoring after, a function call is divided between the caller and the callee
Subtle differences between [[compilers]] mean it can be difficult to interface code from different [[compilers]]
Calling conventions, type representations, and name mangling are all part of what is known as an [application binary interface](https://en.wikipedia.org/wiki/Application_binary_interface) ([[ABI]])
# cdecl
C declaration
- Originally from Microsoft's C [[compilers|compiler]]
- Used by many C [[compilers]] for x86
- Subroutine arguments passed on the stack
- Function arguments pushed right-to-left
- Last pushed first

View File

@ -1,16 +1,16 @@
## Machine Code
- Machine language instructions
- Directly control [[Processors|CPU]]
- Strictly numerical
- Lowest-level representation of a compiled or assembled program
- Lowest-level visible to programmer
- Internally microcode might be used
- Hardware dependent
- Higher-level languages translated to machine code
- [[Compilers]], assemblers and linkers
- Not for interpreted code
- Interpreter runs machine code
- [[Assembly]] is effectively human readable machine code
- Has mnemonics for opcodes etc
## Microcode
## Microcode ## Microcode

View File

@ -24,5 +24,5 @@
- Adobe Flash Player
- Tamarin
- JVM
- [[Compilers#LLVM|LLVM]]
- Silverlight

BIN
img/am-regulariser.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 352 KiB

BIN
img/skip-connections.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 51 KiB