vault backup: 2023-05-27 00:50:46
Affected files: .obsidian/graph.json .obsidian/workspace-mobile.json .obsidian/workspace.json STEM/AI/Neural Networks/Architectures.md STEM/AI/Neural Networks/CNN/CNN.md STEM/AI/Neural Networks/CNN/Examples.md STEM/AI/Neural Networks/CNN/FCN/FCN.md STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md STEM/AI/Neural Networks/CNN/GAN/GAN.md STEM/AI/Neural Networks/CNN/Interpretation.md STEM/AI/Neural Networks/Deep Learning.md STEM/AI/Neural Networks/MLP/MLP.md STEM/AI/Neural Networks/SLP/Least Mean Square.md STEM/AI/Neural Networks/Transformers/Attention.md STEM/AI/Neural Networks/Transformers/Transformers.md STEM/img/feedforward.png STEM/img/multilayerfeedforward.png STEM/img/recurrent.png STEM/img/recurrentwithhn.png
Parent: 7052c8c915
Commit: acb7dc429e

STEM/AI/Neural Networks/Architectures.md (new file)
@@ -0,0 +1,23 @@
# Single-Layer Feedforward
- *Acyclic*
- Count the output layer only; no computation at the input layer

![[feedforward.png]]

# Multilayer Feedforward
- Hidden layers
- Extract higher-order statistics
- Global perspective
- Helpful with large input layer
- Fully connected
	- Every neuron is connected to every neuron in adjacent layers
- Below is a 10-4-2 network (a minimal code sketch follows the diagram)

![[multilayerfeedforward.png]]
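
A minimal NumPy sketch of the 10-4-2 fully connected multilayer feedforward network above; the sigmoid activation, random weights and variable names are illustrative assumptions, not part of the note.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# 10-4-2 network: 10 inputs, one hidden layer of 4 neurons, 2 outputs.
# Fully connected: every neuron connects to every neuron in the adjacent layer.
W1 = rng.normal(size=(4, 10))   # hidden-layer weights
b1 = np.zeros(4)
W2 = rng.normal(size=(2, 4))    # output-layer weights
b2 = np.zeros(2)

def forward(x):
    """Single acyclic (feedforward) pass: input -> hidden -> output."""
    h = sigmoid(W1 @ x + b1)    # hidden layer extracts higher-order statistics
    y = sigmoid(W2 @ h + b2)    # output layer
    return y

x = rng.normal(size=10)         # one 10-dimensional input vector
print(forward(x))               # two output activations
```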

# Recurrent
- At least one feedback loop
- Below has no self-feedback

![[recurrent.png]]
![[recurrentwithhn.png]]

- Above has hidden neurons

STEM/AI/Neural Networks/CNN/CNN.md
@@ -14,14 +14,14 @@
- Double-digit % gain on ImageNet accuracy

# Fully Connected
[[MLP|Dense]]
- Move from convolutional operations towards vector output
- Stochastic drop-out
- Sub-sample channels and only connect some to [[MLP|dense]] layers

# As a Descriptor
- Most powerful as a deeply learned feature extractor
- [[MLP|Dense]] classifier at the end isn't fantastic
- Use an SVM to classify on features taken prior to the penultimate layer (sketched below)

![[cnn-descriptor.png]]
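
A hedged sketch of the "CNN as a descriptor" idea: assume the deep features have already been extracted from a pre-trained CNN (that step is omitted) and an SVM replaces the dense classifier. The feature size, labels and `SVC` settings are placeholder assumptions; requires scikit-learn.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 512))    # placeholder deep CNN descriptors
labels = rng.integers(0, 2, size=200)     # placeholder binary labels

clf = SVC(kernel="linear")                # linear SVM on deep features
clf.fit(features[:150], labels[:150])     # train the SVM instead of a dense head
print(clf.score(features[150:], labels[150:]))
```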

@@ -42,13 +42,13 @@ [[MLP|Dense]]
![[fine-tuning-freezing.png]]

# Training
- Validation & training [[Deep Learning#Loss Function|loss]]
	- Early
		- Under-fitting
		- Training not representative
	- Later
		- Overfitting
- Validation [[Deep Learning#Loss Function|loss]] can help adjust learning rate
	- Or indicate when to stop training (early-stopping sketch below)

![[under-over-fitting.png]]
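
A minimal early-stopping sketch driven by validation loss; the synthetic loss curves and the patience value are assumptions standing in for real per-epoch evaluation.

```python
import numpy as np

epochs = 50
t = np.arange(epochs)
train_loss = np.exp(-t / 10)                    # keeps falling
val_loss = np.exp(-t / 10) + 0.002 * t          # falls early, rises again later

best, bad_epochs, patience = float("inf"), 0, 5
for epoch in range(epochs):
    if val_loss[epoch] < best:                  # validation loss still improving
        best, bad_epochs = val_loss[epoch], 0
    else:                                       # validation loss worsening while
        bad_epochs += 1                         # training loss falls -> overfitting
        if bad_epochs >= patience:
            print(f"stop at epoch {epoch}")     # stop training (or lower the LR)
            break
```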

STEM/AI/Neural Networks/CNN/Examples.md
@@ -29,13 +29,13 @@
2015

- [[Inception Layer]]s
- Multiple [[Deep Learning#Loss Function|Loss]] Functions

![[googlenet.png]]

## [[Inception Layer]]
![[googlenet-inception.png]]

## Auxiliary [[Deep Learning#Loss Function|Loss]] Functions
- Two other SoftMax blocks (combined as sketched below)
- Help train a really deep network
	- Vanishing gradient problem
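
A sketch of how auxiliary loss functions can be combined: the total loss is the main SoftMax cross-entropy plus down-weighted cross-entropies from the two auxiliary heads. The 0.3 weight and the toy probability vectors are assumptions for illustration.

```python
import numpy as np

def cross_entropy(probs, label):
    return -np.log(probs[label])

label = 1
main_head = np.array([0.2, 0.7, 0.1])    # SoftMax output of the final classifier
aux_head_1 = np.array([0.3, 0.5, 0.2])   # auxiliary SoftMax partway through the network
aux_head_2 = np.array([0.25, 0.6, 0.15])

total_loss = (cross_entropy(main_head, label)
              + 0.3 * cross_entropy(aux_head_1, label)   # auxiliary heads inject gradient
              + 0.3 * cross_entropy(aux_head_2, label))  # deep in the network
print(total_loss)
```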

STEM/AI/Neural Networks/CNN/FCN/FCN.md
@@ -20,13 +20,13 @@ Contractive → [[UpConv]]
- Rarely from scratch
	- Pre-trained weights
- Replace final layers
	- [[MLP|FC]] layers
	- White-noise initialised
- Add [[upconv]] layer(s)
	- Fine-tune train
	- Freeze others
- Annotated GT images
- Can use summed per-pixel log [[Deep Learning#Loss Function|loss]] (sketched below)
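
A sketch of a summed per-pixel log loss against annotated ground-truth labels; the image size, class count and softmax values are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 4, 4, 3                                   # tiny image, 3 classes
logits = rng.normal(size=(H, W, C))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # per-pixel softmax
gt = rng.integers(0, C, size=(H, W))                # annotated ground-truth label map

# Sum of -log p(correct class) over every pixel
per_pixel_log_loss = -np.log(np.take_along_axis(probs, gt[..., None], axis=-1))
print(per_pixel_log_loss.sum())
```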

# Evaluation
![[fcn-eval.png]]

STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md
@@ -12,11 +12,11 @@ Deep Convolutional [[GAN]]
- Train using Gaussian random noise for code
- Discriminator
	- Contractive
	- Cross-entropy [[Deep Learning#Loss Function|loss]]
	- Conv and leaky [[Activation Functions#ReLu|ReLu]] layers only
	- Normalised output via [[Activation Functions#Sigmoid|sigmoid]]

## [[Deep Learning#Loss Function|Loss]]
$$D(S,L)=-\sum_i L_i \log(S_i)$$
- $S$
	- e.g. $(0.1, 0.9)^T$ (worked example below)
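
A worked example of the loss above, $D(S,L)=-\sum_i L_i \log(S_i)$, using the note's example output $S=(0.1,0.9)^T$ and an assumed one-hot label $L=(0,1)^T$.

```python
import numpy as np

S = np.array([0.1, 0.9])   # network output (e.g. sigmoid/softmax scores)
L = np.array([0.0, 1.0])   # assumed one-hot target
D = -np.sum(L * np.log(S))
print(D)                   # -log(0.9) ≈ 0.105
```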

STEM/AI/Neural Networks/CNN/GAN/GAN.md
@@ -1,7 +1,7 @@
# Fully Convolutional
- Remove [[Max Pooling]]
	- Use strided [[upconv]]
- Remove [[MLP|FC]] layers
	- Hurts convergence in non-classification
- Normalisation tricks
	- Batch normalisation (sketched below)
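
A minimal sketch of batch normalisation: each channel is normalised to zero mean and unit variance over the batch, then rescaled. The batch shape, `gamma` and `beta` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 16))   # batch of 8, 16 channels
gamma, beta, eps = np.ones(16), np.zeros(16), 1e-5

mean = x.mean(axis=0)                              # per-channel batch statistics
var = x.var(axis=0)
x_hat = (x - mean) / np.sqrt(var + eps)            # normalised activations
y = gamma * x_hat + beta                           # learnable re-scaling
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))
```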

STEM/AI/Neural Networks/CNN/Interpretation.md
@@ -6,8 +6,8 @@
![[am.png]]
- **Use trained network**
	- Don't update weights
- [[Architectures|Feedforward]] noise
- [[Back-Propagation|Back-propagate]] [[Deep Learning#Loss Function|loss]]
	- Don't update weights
	- Update image

@@ -17,4 +17,4 @@
- Prone to high frequency noise
- Minimise
	- Total variation
- $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]] (toy sketch below)
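
A toy sketch of activation maximisation with a total-variation penalty. The "network" here is a single fixed linear unit standing in for a trained CNN (an assumption); its weights are never updated, only the image, which starts as noise.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 16, 16
w = rng.normal(size=(H, W))        # frozen "neuron" weights (stand-in for a trained net)
x = rng.normal(size=(H, W))        # start from noise and feed it forward
lr, lam = 0.1, 0.05                # step size and total-variation weight (assumed values)

for _ in range(200):
    # activation = np.sum(w * x); its gradient w.r.t. x is simply w
    dv = x[1:, :] - x[:-1, :]      # vertical neighbour differences
    dh = x[:, 1:] - x[:, :-1]      # horizontal neighbour differences
    grad_tv = np.zeros_like(x)     # gradient of TV(x) = sum(dv**2) + sum(dh**2)
    grad_tv[1:, :] += 2 * dv
    grad_tv[:-1, :] -= 2 * dv
    grad_tv[:, 1:] += 2 * dh
    grad_tv[:, :-1] -= 2 * dh
    x += lr * (w - lam * grad_tv)  # ascend the activation, damp high-frequency noise

print(np.sum(w * x))               # x* approaches the image maximising the activation
```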

STEM/AI/Neural Networks/Deep Learning.md
@@ -8,7 +8,7 @@ Objective Function
![[deep-loss-function.png]]

- Test accuracy worse than train accuracy = overfitting
- [[MLP|Dense]] = [[MLP|fully connected]]
- Automates feature engineering

![[ml-dl.png]]

STEM/AI/Neural Networks/MLP/MLP.md
@@ -1,4 +1,4 @@
- [[Architectures|Feedforward]]
- Single hidden layer can learn any function
	- Universal approximation theorem
- Each hidden layer can operate as a different feature extraction layer

@@ -8,7 +8,7 @@
![[mlp-arch.png]]

# Universal Approximation Theorem
A finite [[Architectures|feedforward]] MLP with 1 hidden layer can in theory approximate any continuous function on a bounded domain to arbitrary accuracy (illustrated below)
- In practice not trainable with [[Back-Propagation|BP]]

![[activation-function.png]]
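
A small empirical illustration of the theorem: a single-hidden-layer MLP fitted to $\sin(x)$ on a bounded interval. The hidden size, target function and scikit-learn `MLPRegressor` settings are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.linspace(-np.pi, np.pi, 400).reshape(-1, 1)
y = np.sin(X).ravel()

# One hidden layer of 50 tanh units, trained by gradient descent on squared error
mlp = MLPRegressor(hidden_layer_sizes=(50,), activation="tanh",
                   max_iter=5000, random_state=0)
mlp.fit(X, y)
print(np.max(np.abs(mlp.predict(X) - y)))   # worst-case error on the interval
```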

STEM/AI/Neural Networks/SLP/Least Mean Square.md
@@ -20,7 +20,7 @@ $$\frac{\partial \mathfrak{E}(w)}{\partial w(n)}=-x(n)\cdot e(n)$$
$$\hat{g}(n)=-x(n)\cdot e(n)$$
$$\hat{w}(n+1)=\hat{w}(n)+\eta \cdot x(n) \cdot e(n)$$

- Above is a feedback loop around the weight vector, $\hat{w}$ (toy implementation below)
- Behaves like low-pass filter
	- Pass low frequency components of error signal
- Average time constant of filtering action inversely proportional to learning-rate
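
A toy implementation of the update above, $\hat{w}(n+1)=\hat{w}(n)+\eta \, x(n) \, e(n)$; the unknown target weights, input statistics and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([0.5, -0.3, 0.8])      # unknown system the filter should learn
w_hat = np.zeros(3)                      # adaptive weight vector w_hat(n)
eta = 0.05                               # learning rate

for n in range(2000):
    x = rng.normal(size=3)               # input vector x(n)
    d = w_true @ x                       # desired response d(n)
    e = d - w_hat @ x                    # error signal e(n)
    w_hat = w_hat + eta * x * e          # feedback loop around the weight vector

print(w_hat)                             # converges towards w_true (low-pass behaviour)
```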

STEM/AI/Neural Networks/Transformers/Attention.md
@@ -11,7 +11,7 @@
- Attention layer accesses all previous states and weighs them according to a learned measure of relevance (sketch below)
- Allows referring arbitrarily far back to relevant tokens
- Can be added to [[RNN]]s
- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a [[Architectures|feedforward]] network
- Attention is useful in and of itself, not just with [[RNN]]s
- [[Transformers]] use attention without recurrent connections
- Process all tokens simultaneously
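
A minimal scaled dot-product attention sketch in NumPy: every token weighs all states by a relevance score and all tokens are processed simultaneously. Dimensions and the random $Q$, $K$, $V$ matrices are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens, d = 5, 8                          # 5 token states, dimension 8
Q = rng.normal(size=(tokens, d))          # queries
K = rng.normal(size=(tokens, d))          # keys
V = rng.normal(size=(tokens, d))          # values

scores = Q @ K.T / np.sqrt(d)             # relevance of every token to every other
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over all positions
output = weights @ V                      # all tokens attended to in parallel
print(output.shape)                       # (5, 8)
```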

STEM/AI/Neural Networks/Transformers/Transformers.md
@@ -35,5 +35,5 @@
- Uses incorporated textual information to produce output
- Has attention to draw information from the output of previous decoders before drawing from encoders
- Both use [[attention]]
- Both use [[MLP|dense]] layers for additional processing of outputs
- Contain residual connections & layer norm steps (sketched below)
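
A small sketch of the residual connection and layer-norm step: a sub-layer output is added back to its input, then each token is normalised over its features. Shapes and the stand-in sub-layer output are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                 # 5 tokens, model dimension 8
sublayer_out = rng.normal(size=(5, 8))      # stand-in for attention or dense output

y = x + sublayer_out                        # residual connection
mean = y.mean(axis=-1, keepdims=True)
std = y.std(axis=-1, keepdims=True)
y_norm = (y - mean) / (std + 1e-5)          # layer norm over each token's features
print(y_norm.mean(axis=-1).round(6))        # ≈ 0 per token
```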

Binary files added:
- img/feedforward.png (30 KiB)
- img/multilayerfeedforward.png (60 KiB)
- img/recurrent.png (30 KiB)
- img/recurrentwithhn.png (47 KiB)