vault backup: 2023-05-27 22:17:56
Affected files: .obsidian/graph.json .obsidian/workspace-mobile.json .obsidian/workspace.json STEM/AI/Neural Networks/Activation Functions.md STEM/AI/Neural Networks/CNN/FCN/FlowNet.md STEM/AI/Neural Networks/CNN/FCN/ResNet.md STEM/AI/Neural Networks/CNN/FCN/Skip Connections.md STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md STEM/AI/Neural Networks/CNN/GAN/GAN.md STEM/AI/Neural Networks/CNN/Interpretation.md STEM/AI/Neural Networks/Deep Learning.md STEM/AI/Neural Networks/MLP/Back-Propagation.md STEM/AI/Neural Networks/MLP/MLP.md STEM/AI/Neural Networks/Transformers/Attention.md STEM/CS/ABI.md STEM/CS/Calling Conventions.md STEM/CS/Code Types.md STEM/CS/Language Binding.md STEM/img/am-regulariser.png STEM/img/skip-connections.png
This commit is contained in:
parent acb7dc429e
commit 33ac3007bc
@ -38,7 +38,7 @@ y_j(n)(1-y_j(n))$$
- Nice derivative
- Max value of $\varphi_j'(v_j(n))$ occurs when $y_j(n)=0.5$
- Min value of 0 when $y_j=0$ or $1$
- Initial weights chosen so not saturated at 0 or 1
- Initial [[Weight Init|weights]] chosen so not saturated at 0 or 1

If $y=\frac u v$
Where $u$ and $v$ are differentiable functions
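The quotient rule then gives
$$y'=\frac{u'v-uv'}{v^2}$$
which, applied to the sigmoid, yields the $y_j(n)(1-y_j(n))$ form above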
@ -3,7 +3,7 @@ Optical Flow
- 2-Channel optical flow
- $dx,dy$
- Two consecutive frames
- 6-channel tensor
- 6-channel [[tensor]]

![[flownet.png]]
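A minimal sketch of assembling that 6-channel input (NumPy assumed; frame sizes are illustrative, not from the note):

```python
import numpy as np

# Two consecutive RGB frames, channels-last (H, W, 3); sizes are illustrative
frame_t0 = np.zeros((384, 512, 3), dtype=np.float32)
frame_t1 = np.zeros((384, 512, 3), dtype=np.float32)

# Stack along the channel axis -> 6-channel input tensor (384, 512, 6)
x = np.concatenate([frame_t0, frame_t1], axis=-1)
assert x.shape == (384, 512, 6)

# The network's output is then 2-channel optical flow (dx, dy) per pixel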
AI/Neural Networks/CNN/FCN/ResNet.md (new file, 25 lines)
@ -0,0 +1,25 @@
- Residual networks
- 152 layers
- Skips every two layers
- Residual block
- Later layers learning the identity function
- Skips help
- A deep network should be at least as good as a shallower one, by allowing some layers to do very little
- Vanishing gradient
- Allows shortcut paths for gradients
- Accuracy saturation
- Adding more layers to a suitably deep network increases training error

# Design

- Skips across pairs of conv layers (see the sketch after this list)
- Elementwise addition
- All layers use 3x3 kernels
- Spatial size halves each layer
- Number of filters doubles each layer
- Fully convolutional
- No FC layers
- No pooling
- Except at end
- No dropout
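A minimal sketch of one such residual block (assuming the Keras functional API; the filter count is illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    # Skip across a pair of 3x3 conv layers
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([shortcut, y])  # elementwise addition: the shortcut path for gradients
    return layers.Activation("relu")(y)
```

When the spatial size halves and the filter count doubles, the shortcut would need a strided 1x1 projection to match shapes; the identity case is shown for brevity.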
AI/Neural Networks/CNN/FCN/Skip Connections.md (new file, 16 lines)
@ -0,0 +1,16 @@
- Outputs of conv ($c$) layers are added to inputs of upconv ($d$) layers
- Element-wise, not channel appending
- Propagate high-frequency information to later layers
- Two types
- Additive
- ResNet
- Super-resolution auto-encoder
- Concatenative
- Densely connected architectures
- DenseNet
- FlowNet

![[skip-connections.png]]

[AI Summer - Skip Connections](https://theaisummer.com/skip-connections/)
[Arxiv - Visualising the Loss Landscape](https://arxiv.org/abs/1712.09913)
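A short sketch contrasting the two types (Keras ops assumed; shapes illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

x    = tf.zeros((1, 32, 32, 64))  # later-layer features
skip = tf.zeros((1, 32, 32, 64))  # earlier-layer features carried forward

additive      = layers.Add()([x, skip])          # ResNet-style: shape stays (1, 32, 32, 64)
concatenative = layers.Concatenate()([x, skip])  # DenseNet/FlowNet-style: (1, 32, 32, 128)
```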
@ -7,7 +7,7 @@ Deep Convolutional [[GAN]]
- Generate image from code
- Low-dimensional
- ~100-D
- Reshape to tensor
- Reshape to [[tensor]]
- [[Upconv]] to image
- Train using Gaussian random noise for code
- Discriminator
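A minimal sketch of the generator path described above (Keras assumed; layer sizes are illustrative, mapping a 100-D code to a 28x28 image):

```python
from tensorflow.keras import layers, Sequential

generator = Sequential([
    layers.Input(shape=(100,)),   # low-dimensional code; Gaussian noise at train time
    layers.Dense(7 * 7 * 128),
    layers.Reshape((7, 7, 128)),  # reshape code to a tensor
    layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),  # upconv
    layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh"),   # 28x28 image
])
```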
@ -27,5 +27,5 @@
# Code Vector Math for Control

![[cvmfc.png]]
- Do AM to derive code for an image
- Do [[Interpretation#Activation Maximisation|AM]] to derive code for an image

![[code-vector-math-for-control-results.png]]
@ -17,4 +17,17 @@
- Prone to high-frequency noise
- Minimise
- Total variation
- $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]

$$x^*=\text{argmin}_{x\in \mathbb R^{H\times W\times C}}\ell(\phi(x),\phi_0)$$
- Won't work
$$x^*=\text{argmin}_{x\in \mathbb R^{H\times W\times C}}\ell(\phi(x),\phi_0)+\lambda\mathcal R(x)$$
- Need a regulariser, like the one above

![[am-regulariser.png]]

$$\mathcal R_{V^\beta}(f)=\int_\Omega\left(\left(\frac{\partial f}{\partial u}(u,v)\right)^2+\left(\frac{\partial f}{\partial v}(u,v)\right)^2\right)^{\frac \beta 2}du\,dv$$

$$\mathcal R_{V^\beta}(x)=\sum_{i,j}\left(\left(x_{i,j+1}-x_{ij}\right)^2+\left(x_{i+1,j}-x_{ij}\right)^2\right)^{\frac \beta 2}$$
- Beta
- Degree of smoothing
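A NumPy sketch of the discrete $\mathcal R_{V^\beta}$ above (a single-channel image is assumed for simplicity):

```python
import numpy as np

def tv_regulariser(x, beta=2.0):
    # Finite differences, cropped so both terms share shape (H-1, W-1)
    dw = x[:-1, 1:] - x[:-1, :-1]  # x_{i,j+1} - x_{ij}
    dh = x[1:, :-1] - x[:-1, :-1]  # x_{i+1,j} - x_{ij}
    return np.sum((dw ** 2 + dh ** 2) ** (beta / 2))
```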
@ -32,16 +32,16 @@ Predict
Evaluate

# Data Structure
- Tensor flow = channels last
- [[Tensor]] flow = channels last
- (samples, height, width, channels)
- Vector data
- 2D tensors of shape (samples, features)
- 2D [[tensor]]s of shape (samples, features)
- Time series data or sequence data
- 3D tensors of shape (samples, timesteps, features)
- 3D [[tensor]]s of shape (samples, timesteps, features)
- Images
- 4D tensors of shape (samples, height, width, channels) or (samples, channels, height, width)
- 4D [[tensor]]s of shape (samples, height, width, channels) or (samples, channels, height, width)
- Video
- 5D tensors of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)
- 5D [[tensor]]s of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)
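The same shapes as a short sketch (NumPy, channels-last; sizes illustrative):

```python
import numpy as np

vectors = np.zeros((32, 10))              # (samples, features)
series  = np.zeros((32, 100, 10))         # (samples, timesteps, features)
images  = np.zeros((32, 224, 224, 3))     # (samples, height, width, channels)
video   = np.zeros((8, 16, 224, 224, 3))  # (samples, frames, height, width, channels)
```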
![[photo-tensor.png]]
![[matrix-dot-product.png]]
@ -79,7 +79,7 @@ $$\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$$
2. Error WRT output $y$
3. Output $y$ WRT pre-activation function sum
4. Pre-activation function sum WRT weight
- Other weights held constant, so their terms go to zero
- Other [[Weight Init|weights]] held constant, so their terms go to zero
- Leaves just $y_i$
- Collect 3 boxed terms as $\delta_j$
- Local gradient
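Collecting those terms, the local gradient for an output neuron is
$$\delta_j(n)=e_j(n)\varphi_j'(v_j(n))$$
which reduces the update to the $\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$ rule above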
@ -2,7 +2,7 @@
- Single hidden layer can learn any function
- Universal approximation theorem
- Each hidden layer can operate as a different feature extraction layer
- Lots of weights to learn
- Lots of [[Weight Init|weights]] to learn
- [[Back-Propagation]] is supervised

![[mlp-arch.png]]
@ -19,7 +19,7 @@
# Scaled Dot-Product
- Calculate attention weights between all tokens at once
- Learn 3 weight matrices
- Learn 3 [[Weight Init|weight]] matrices
- Query
- $W_Q$
- Key
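For reference, the scaled dot-product attention computed from these matrices is
$$\text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
where $d_k$ is the key dimension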
@ -31,5 +31,5 @@
# Embedded ABI
- File format, data types, register usage, stack frame organisation, function parameter passing conventions
- For embedded OS
- Compilers create object code compatible with code from other compilers
- Link libraries from different compilers
- [[Compilers]] create object code compatible with code from other [[compilers]]
- Link libraries from different [[compilers]]
@ -5,15 +5,15 @@
- Also known as: callee-saved registers or non-volatile registers
- How the task of preparing the stack for, and restoring after, a function call is divided between the caller and the callee

Subtle differences between compilers can make it difficult to interface code from different compilers
Subtle differences between [[compilers]] can make it difficult to interface code from different [[compilers]]

Calling conventions, type representations, and name mangling are all part of what is known as an [application binary interface](https://en.wikipedia.org/wiki/Application_binary_interface) ([[ABI]])

# cdecl
C declaration

- Originally from Microsoft's C compiler
- Used by many C compilers for x86
- Originally from Microsoft's C [[compilers|compiler]]
- Used by many C [[compilers]] for x86
- Subroutine arguments passed on the stack
- Function arguments pushed right-to-left
- Last argument pushed first
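A small illustration of cdecl in practice (Python's `ctypes` assumed; the library path is Linux-specific): `CDLL` loads a library under the cdecl convention, and caller stack cleanup is what makes variadic calls like `printf` workable.

```python
import ctypes

# CDLL assumes the cdecl calling convention (caller cleans the stack)
libc = ctypes.CDLL("libc.so.6")  # platform-specific name, assumed Linux
libc.printf(b"cdecl call: %d\n", 42)
```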
@ -1,16 +1,16 @@
## Machine Code
- Machine language instructions
- Directly control CPU
- Directly control [[Processors|CPU]]
- Strictly numerical
- Lowest-level representation of a compiled or assembled program
- Lowest level visible to the programmer
- Internally, microcode might be used
- Hardware dependent
- Higher-level languages translated to machine code
- Compilers, assemblers and linkers
- [[Compilers]], assemblers and linkers
- Not for interpreted code
- The interpreter itself runs as machine code
- Assembly is effectively human-readable machine code
- [[Assembly]] is effectively human-readable machine code
- Has mnemonics for opcodes etc.
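A tiny illustration of the numeric-versus-mnemonic distinction (x86-64 bytes used as an assumed example):

```python
# The same instruction as raw machine code and as its assembly mnemonic
machine_code = bytes([0x48, 0xFF, 0xC0])  # x86-64 encoding of `inc rax`
print(machine_code.hex())                 # 48ffc0, strictly numerical
```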
## Microcode
@ -24,5 +24,5 @@
- Adobe Flash Player
- Tamarin
- JVM
- LLVM
- [[Compilers#LLVM|LLVM]]
- Silverlight
BIN img/am-regulariser.png (new file, 352 KiB; binary file not shown)
BIN img/skip-connections.png (new file, 51 KiB; binary file not shown)