vault backup: 2023-05-27 22:17:56

Affected files:
.obsidian/graph.json
.obsidian/workspace-mobile.json
.obsidian/workspace.json
STEM/AI/Neural Networks/Activation Functions.md
STEM/AI/Neural Networks/CNN/FCN/FlowNet.md
STEM/AI/Neural Networks/CNN/FCN/ResNet.md
STEM/AI/Neural Networks/CNN/FCN/Skip Connections.md
STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md
STEM/AI/Neural Networks/CNN/GAN/GAN.md
STEM/AI/Neural Networks/CNN/Interpretation.md
STEM/AI/Neural Networks/Deep Learning.md
STEM/AI/Neural Networks/MLP/Back-Propagation.md
STEM/AI/Neural Networks/MLP/MLP.md
STEM/AI/Neural Networks/Transformers/Attention.md
STEM/CS/ABI.md
STEM/CS/Calling Conventions.md
STEM/CS/Code Types.md
STEM/CS/Language Binding.md
STEM/img/am-regulariser.png
STEM/img/skip-connections.png
This commit is contained in:
andy 2023-05-27 22:17:56 +01:00
parent acb7dc429e
commit 33ac3007bc
17 changed files with 76 additions and 22 deletions

View File

@ -38,7 +38,7 @@ y_j(n)(1-y_j(n))$$
- Nice derivative
- Max value of $\varphi_j'(v_j(n))$ occurs when $y_j(n)=0.5$
- Min value of 0 when $y_j=0$ or $1$
- Initial weights chosen so not saturated at 0 or 1
- Initial [[Weight Init|weights]] chosen so not saturated at 0 or 1
If $y=\frac u v$
Where $u$ and $v$ are differentiable functions
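The note sets up the quotient rule without stating it; for completeness:
$$y'=\frac{u'v-uv'}{v^2}$$
Applying it to the sigmoid $\frac{1}{1+e^{-a}}$ with $u=1$ and $v=1+e^{-a}$ recovers the $y_j(n)(1-y_j(n))$ derivative quoted above.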

View File

@ -3,7 +3,7 @@ Optical Flow
- 2-Channel optical flow
- $dx,dy$
- Two consecutive frames
- 6-channel tensor
- 6-channel [[tensor]]
![[flownet.png]]

View File

@ -0,0 +1,25 @@
- Residual networks
- 152 layers
- Skips every two layers
- Residual block
- Later layers learn the identity function
- Skips help with this
- A deep network should be at least as good as a shallower one, by allowing some layers to do very little
- Vanishing gradient
- Allows shortcut paths for gradients
- Accuracy saturation
- Adding more layers to a suitably deep network increases training error
# Design
- Skips across pairs of conv layers
- Elementwise addition
- All layers use 3x3 kernels
- Spatial size halves each layer
- Filter count doubles each layer
- Fully convolutional
- No fc layer
- No pooling
- Except at end
- No dropout
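A minimal numpy sketch of the residual wiring above. The conv layers are hypothetical identity stand-ins (the note only specifies the skip arithmetic), so just the elementwise-add shortcut is shown:

```python
import numpy as np

def conv3x3(x):
    # Hypothetical stand-in for a 3x3 conv layer; identity for brevity
    return x

def residual_block(x):
    # Two conv layers, skipped by an elementwise addition
    out = np.maximum(conv3x3(x), 0)  # first conv + ReLU
    out = conv3x3(out)               # second conv
    return np.maximum(out + x, 0)    # add the skip, then ReLU
```

Because the addition is elementwise, the block's output shape matches its input shape, which is what lets skips span pairs of layers without any reshaping.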

View File

@ -0,0 +1,16 @@
- Outputs of conv ($c$) layers are added to inputs of upconv ($d$) layers
- Element-wise, not channel appending
- Propagate high frequency information to later layers
- Two types
- Additive
- Resnet
- Super-resolution auto-encoder
- Concatenative
- Densely connected architectures
- DenseNet
- FlowNet
![[skip-connections.png]]
[AI Summer - Skip Connections](https://theaisummer.com/skip-connections/)
[Arxiv - Visualising the Loss Landscape](https://arxiv.org/abs/1712.09913)
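The two types can be contrasted in a few lines of numpy (shapes are illustrative, channels-last):

```python
import numpy as np

c_out = np.ones((8, 8, 16))  # output of a conv (c) layer
d_in = np.ones((8, 8, 16))   # input to an upconv (d) layer

additive = c_out + d_in                                 # ResNet-style elementwise add
concatenative = np.concatenate([c_out, d_in], axis=-1)  # DenseNet-style channel append

assert additive.shape == (8, 8, 16)       # channel count preserved
assert concatenative.shape == (8, 8, 32)  # channel count grows
```

The additive form keeps the feature-map width fixed; the concatenative form grows it with every skip, which is why densely connected architectures get wide.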

View File

@ -7,7 +7,7 @@ Deep Convolutional [[GAN]]
- Generate image from code
- Low-dimensional
- ~100-D
- Reshape to tensor
- Reshape to [[tensor]]
- [[Upconv]] to image
- Train using Gaussian random noise for code
- Discriminator

View File

@ -27,5 +27,5 @@
# Code Vector Math for Control
![[cvmfc.png]]
- Do AM to derive code for an image
- Do [[Interpretation#Activation Maximisation|AM]] to derive code for an image
![[code-vector-math-for-control-results.png]]

View File

@ -18,3 +18,16 @@
- Minimise
- Total variation
- $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]
$$x^*=\text{argmin}_{x\in \mathbb R^{H\times W\times C}}\ell(\phi(x),\phi_0)$$
- Won't work
$$x^*=\text{argmin}_{x\in \mathbb R^{H\times W\times C}}\ell(\phi(x),\phi_0)+\lambda\mathcal R(x)$$
- Need a regulariser like above
![[am-regulariser.png]]
$$\mathcal R_{V^\beta}(f)=\int_\Omega\left(\left(\frac{\partial f}{\partial u}(u,v)\right)^2+\left(\frac{\partial f}{\partial v}(u,v)\right)^2\right)^{\frac \beta 2}\,du\,dv$$
$$\mathcal R_{V^\beta}(x)=\sum_{i,j}\left(\left(x_{i,j+1}-x_{ij}\right)^2+\left(x_{i+1,j}-x_{ij}\right)^2\right)^{\frac \beta 2}$$
- Beta
- Degree of smoothing
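The discrete sum above translates directly into numpy (a hedged sketch; `beta` is the degree-of-smoothing parameter from the note):

```python
import numpy as np

def tv_regulariser(x, beta=2.0):
    # R_{V^beta}(x): squared finite differences along each axis,
    # trimmed to the interior grid where both differences are defined
    dh = x[:, 1:] - x[:, :-1]  # x_{i,j+1} - x_{ij}
    dv = x[1:, :] - x[:-1, :]  # x_{i+1,j} - x_{ij}
    return np.sum((dh[:-1, :] ** 2 + dv[:, :-1] ** 2) ** (beta / 2))
```

A constant image scores zero while high-frequency noise scores high, which is why adding $\lambda\mathcal R(x)$ steers activation maximisation away from noise-like solutions.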

View File

@ -32,16 +32,16 @@ Predict
Evaluate
# Data Structure
- Tensor flow = channels last
- [[Tensor|TensorFlow]] = channels last
- (samples, height, width, channels)
- Vector data
- 2D tensors of shape (samples, features)
- 2D [[tensor]]s of shape (samples, features)
- Time series data or sequence data
- 3D tensors of shape (samples, timesteps, features)
- 3D [[tensor]]s of shape (samples, timesteps, features)
- Images
- 4D tensors of shape (samples, height, width, channels) or (samples, channels, height, width)
- 4D [[tensor]]s of shape (samples, height, width, channels) or (samples, channels, height, width)
- Video
- 5D tensors of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)
- 5D [[tensor]]s of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)
![[photo-tensor.png]]
![[matrix-dot-product.png]]
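The shapes listed above, written out as channels-last numpy arrays (the sizes are arbitrary examples):

```python
import numpy as np

vectors = np.zeros((32, 10))           # (samples, features)
series = np.zeros((32, 100, 10))       # (samples, timesteps, features)
images = np.zeros((32, 64, 64, 3))     # (samples, height, width, channels)
video = np.zeros((32, 16, 64, 64, 3))  # (samples, frames, height, width, channels)

# Each data modality adds one axis on top of the last
assert (vectors.ndim, series.ndim, images.ndim, video.ndim) == (2, 3, 4, 5)
```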

View File

@ -79,7 +79,7 @@ $$\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$$
2. Error WRT output $y$
3. Output $y$ WRT Pre-activation function sum
4. Pre-activation function sum WRT weight
- Other weights constant, goes to zero
- Other [[Weight Init|weights]] constant, goes to zero
- Leaves just $y_i$
- Collect 3 boxed terms as delta $j$
- Local gradient
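The update rule $\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$ amounts to an outer product per layer; a numpy sketch with made-up numbers:

```python
import numpy as np

eta = 0.1                      # learning rate
delta = np.array([0.2, -0.1])  # local gradients delta_j, one per unit j
y = np.array([1.0, 0.5, 0.0])  # previous-layer activations y_i

# Delta w_{ji} = eta * delta_j * y_i for every (j, i) pair at once
delta_w = eta * np.outer(delta, y)
assert delta_w.shape == (2, 3)
```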

View File

@ -2,7 +2,7 @@
- Single hidden layer can approximate any continuous function
- Universal approximation theorem
- Each hidden layer can operate as a different feature extraction layer
- Lots of weights to learn
- Lots of [[Weight Init|weights]] to learn
- [[Back-Propagation]] is supervised
![[mlp-arch.png]]
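A forward pass through such an MLP, with one hidden feature-extraction layer (random weights; a sketch of the architecture, not a trained network):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def mlp_forward(x, w_hidden, w_out):
    h = sigmoid(x @ w_hidden)  # hidden layer acts as a feature extractor
    return sigmoid(h @ w_out)  # output layer

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))  # 4 samples, 3 features
out = mlp_forward(x, rng.normal(size=(3, 5)), rng.normal(size=(5, 1)))
assert out.shape == (4, 1)
```

Every entry of `w_hidden` and `w_out` is a weight to be learned, which is the "lots of weights" point above.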

View File

@ -19,7 +19,7 @@
# Scaled Dot-Product
- Calculate attention weights between all tokens at once
- Learn 3 weight matrices
- Learn 3 [[Weight Init|weight]] matrices
- Query
- $W_Q$
- Key
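Assuming the standard scaled dot-product form (the hunk shows only $W_Q$ and the key matrix, so $W_K$, $W_V$ and the softmax are filled in from the usual definition):

```python
import numpy as np

def attention(X, W_q, W_k, W_v):
    # Attention weights between all tokens at once
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # scaled dot products
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                     # softmax over keys
    return w @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))  # 4 tokens, model width 8
out = attention(X, *(rng.normal(size=(8, 8)) for _ in range(3)))
assert out.shape == (4, 8)
```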

View File

@ -31,5 +31,5 @@
# Embedded ABI
- File format, data types, register usage, stack frame organisation, function parameter passing conventions
- For embedded OS
- Compilers create object code compatible with code from other compilers
- Link libraries from different compilers
- [[Compilers]] create object code compatible with code from other [[compilers]]
- Link libraries from different [[compilers]]

View File

@ -5,15 +5,15 @@
- Also known as: callee-saved registers or non-volatile registers
- How the task of preparing the stack for, and restoring after, a function call is divided between the caller and the callee
Subtle differences between compilers can make it difficult to interface code from different compilers
Subtle differences between [[compilers]] can make it difficult to interface code from different [[compilers]]
Calling conventions, type representations, and name mangling are all part of what is known as an [application binary interface](https://en.wikipedia.org/wiki/Application_binary_interface) ([[ABI]])
# cdecl
C declaration
- Originally from Microsoft's C compiler
- Used by many C compilers for x86
- Originally from Microsoft's C [[compilers|compiler]]
- Used by many C [[compilers]] for x86
- Subroutine arguments passed on the stack
- Function arguments pushed right-to-left
- Last pushed first

View File

@ -1,16 +1,16 @@
## Machine Code
- Machine language instructions
- Directly control CPU
- Directly control [[Processors|CPU]]
- Strictly numerical
- Lowest-level representation of a compiled or assembled program
- Lowest-level visible to programmer
- Internally, microcode might be used
- Hardware dependent
- Higher-level languages translated to machine code
- Compilers, assemblers and linkers
- [[Compilers]], assemblers and linkers
- Not for interpreted code
- Interpreter runs machine code
- Assembly is effectively human readable machine code
- [[Assembly]] is effectively human readable machine code
- Has mnemonics for opcodes etc
## Microcode

View File

@ -24,5 +24,5 @@
- Adobe Flash Player
- Tamarin
- JVM
- LLVM
- [[Compilers#LLVM|LLVM]]
- Silverlight

BIN
img/am-regulariser.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 352 KiB

BIN
img/skip-connections.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 51 KiB