vault backup: 2023-05-27 00:50:46
Affected files: .obsidian/graph.json .obsidian/workspace-mobile.json .obsidian/workspace.json STEM/AI/Neural Networks/Architectures.md STEM/AI/Neural Networks/CNN/CNN.md STEM/AI/Neural Networks/CNN/Examples.md STEM/AI/Neural Networks/CNN/FCN/FCN.md STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md STEM/AI/Neural Networks/CNN/GAN/GAN.md STEM/AI/Neural Networks/CNN/Interpretation.md STEM/AI/Neural Networks/Deep Learning.md STEM/AI/Neural Networks/MLP/MLP.md STEM/AI/Neural Networks/SLP/Least Mean Square.md STEM/AI/Neural Networks/Transformers/Attention.md STEM/AI/Neural Networks/Transformers/Transformers.md STEM/img/feedforward.png STEM/img/multilayerfeedforward.png STEM/img/recurrent.png STEM/img/recurrentwithhn.png
This commit is contained in: parent 7052c8c915, commit acb7dc429e
AI/Neural Networks/Architectures.md (new file, 23 lines added)
@@ -0,0 +1,23 @@
# Single-Layer Feedforward
- *Acyclic*
- Count output layer, no computation at input

![[feedforward.png]]

# Multilayer Feedforward
- Hidden layers
- Extract higher-order statistics
- Global perspective
- Helpful with large input layer
- Fully connected
- Every neuron is connected to every neuron in adjacent layers
- Below is a 10-4-2 network (a minimal sketch follows the figure)
![[multilayerfeedforward.png]]
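A minimal sketch of the 10-4-2 network above, assuming plain NumPy and tanh units (neither is specified in the note); it only illustrates the forward pass through the two computing layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10-4-2 network: 10 source nodes, 4 hidden neurons, 2 output neurons.
# Only the hidden and output layers compute; the input layer does not.
W1 = rng.standard_normal((4, 10))   # hidden-layer weights
b1 = np.zeros(4)
W2 = rng.standard_normal((2, 4))    # output-layer weights
b2 = np.zeros(2)

def forward(x):
    """Acyclic (feedforward) pass: input -> hidden -> output."""
    h = np.tanh(W1 @ x + b1)        # hidden activations
    return np.tanh(W2 @ h + b2)     # output activations

y = forward(rng.standard_normal(10))
print(y.shape)  # (2,)
```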

# Recurrent
- At least one feedback loop
- Below has no self-feedback
![[recurrent.png]]
![[recurrentwithhn.png]]
- Above has hidden neurons
AI/Neural Networks/CNN/CNN.md
@@ -14,14 +14,14 @@
- Double digit % gain on ImageNet accuracy

# Fully Connected
Dense
[[MLP|Dense]]
- Move from convolutional operations towards vector output
- Stochastic drop-out
- Sub-sample channels and only connect some to dense layers
- Sub-sample channels and only connect some to [[MLP|dense]] layers

# As a Descriptor
- Most powerful as a deeply learned feature extractor
- Dense classifier at the end isn't fantastic
- [[MLP|Dense]] classifier at the end isn't fantastic
- Use SVM to classify prior to penultimate layer (see the sketch after the figure)

![[cnn-descriptor.png]]
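A hedged sketch of the descriptor idea: take activations from the layer before the dense classifier head and fit an SVM on them. PyTorch/torchvision, scikit-learn, the ResNet backbone, and the toy tensors are all assumptions for illustration, not from the note.

```python
import torch
import torchvision.models as models
from sklearn.svm import SVC

# Pre-trained CNN used purely as a deeply learned feature extractor.
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()    # drop the dense classifier head
backbone.eval()

@torch.no_grad()
def describe(images):                # images: (N, 3, 224, 224) tensor
    return backbone(images).numpy()  # penultimate-layer features

# Hypothetical tensors/labels; replace with a real dataset.
train_x = torch.randn(32, 3, 224, 224)
train_y = torch.randint(0, 2, (32,)).numpy()
svm = SVC(kernel="linear").fit(describe(train_x), train_y)
print(svm.predict(describe(torch.randn(4, 3, 224, 224))))
```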

@@ -42,13 +42,13 @@ Dense

![[fine-tuning-freezing.png]]
# Training
- Validation & training loss
- Validation & training [[Deep Learning#Loss Function|loss]]
- Early
- Under-fitting
- Training not representative
- Later
- Overfitting
- V.loss can help adjust learning rate
- V.[[Deep Learning#Loss Function|loss]] can help adjust learning rate
- Or indicate when to stop training (see the sketch after the figure)

![[under-over-fitting.png]]
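A minimal sketch of using validation loss to adjust the learning rate and decide when to stop. The helper names `train_one_epoch` and `validation_loss` are hypothetical stand-ins for whatever training loop is actually used.

```python
def early_stopping_loop(train_one_epoch, validation_loss,
                        lr=1e-3, patience=5, max_epochs=100):
    best, bad_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch(lr)
        v_loss = validation_loss()
        if v_loss < best:              # validation loss still improving
            best, bad_epochs = v_loss, 0
        else:
            bad_epochs += 1
            lr *= 0.5                  # v.loss plateau: adjust learning rate
            if bad_epochs >= patience: # v.loss rising: overfitting, stop
                break
    return best
```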
AI/Neural Networks/CNN/Examples.md
@@ -29,13 +29,13 @@
2015
- [[Inception Layer]]s
- Multiple Loss Functions
- Multiple [[Deep Learning#Loss Function|Loss]] Functions

![[googlenet.png]]

## [[Inception Layer]]
![[googlenet-inception.png]]
## Auxiliary Loss Functions
## Auxiliary [[Deep Learning#Loss Function|Loss]] Functions
- Two other SoftMax blocks
- Help train really deep network
- Vanishing gradient problem
AI/Neural Networks/CNN/FCN/FCN.md
@@ -20,13 +20,13 @@ Contractive → [[UpConv]]
- Rarely from scratch
- Pre-trained weights
- Replace final layers
- FC layers
- [[MLP|FC]] layers
- White-noise initialised
- Add [[upconv]] layer(s)
- Fine-tune train (see the sketch after this list)
- Freeze others
- Annotated GT images
- Can use summed per-pixel log loss
- Can use summed per-pixel log [[Deep Learning#Loss Function|loss]]
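A PyTorch-style sketch of the freeze-and-replace part of the recipe above (PyTorch, the VGG backbone, and the 21-class head are assumptions for illustration only; the added upconv layers are omitted): load pre-trained weights, freeze the backbone, and train only the replaced, randomly (white-noise) initialised layers.

```python
import torch
import torchvision.models as models

net = models.vgg16(weights="IMAGENET1K_V1")      # pre-trained weights, not from scratch

for p in net.parameters():                       # freeze everything first
    p.requires_grad = False

# Replace the final FC layer; the fresh layer is randomly initialised
# by default and stays trainable for fine-tuning.
net.classifier[-1] = torch.nn.Linear(4096, 21)   # e.g. 21 segmentation classes

optimiser = torch.optim.SGD(
    [p for p in net.parameters() if p.requires_grad], lr=1e-3)
```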

# Evaluation
![[fcn-eval.png]]
AI/Neural Networks/CNN/GAN/DC-GAN.md
@@ -12,11 +12,11 @@ Deep Convolutional [[GAN]]
- Train using Gaussian random noise for code
- Discriminator
- Contractive
- Cross-entropy loss
- Cross-entropy [[Deep Learning#Loss Function|loss]]
- Conv and leaky [[Activation Functions#ReLu|ReLu]] layers only
- Normalised output via sigmoid
- Normalised output via [[Activation Functions#Sigmoid|sigmoid]]

## Loss
## [[Deep Learning#Loss Function|Loss]]
$$D(S,L)=-\sum_i L_i \log(S_i)$$
- $S$
- $(0.1, 0.9)^T$
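As a worked example of the formula above, assuming a one-hot label $L=(0,1)^T$ (the label vector is an assumption here; only $S$ is given in the note):

$$D(S,L)=-\left(0\cdot\log 0.1 + 1\cdot\log 0.9\right)=-\log 0.9\approx 0.105$$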

AI/Neural Networks/CNN/GAN/GAN.md
@@ -1,7 +1,7 @@
# Fully Convolutional
- Remove [[Max Pooling]]
- Use strided [[upconv]]
- Remove FC layers
- Remove [[MLP|FC]] layers
- Hurts convergence in non-classification
- Normalisation tricks (see the sketch after this list)
- Batch normalisation
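A hedged PyTorch-style sketch of one fully convolutional up-sampling block matching the list above: a strided transposed convolution (upconv) instead of pooling or FC layers, followed by batch normalisation. The channel sizes are illustrative only.

```python
import torch.nn as nn

# One up-sampling block: strided up-convolution + batch norm, no pooling, no FC.
def upconv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),   # normalisation trick to help convergence
        nn.ReLU(inplace=True),
    )

block = upconv_block(128, 64)     # doubles spatial resolution, halves channels
```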

AI/Neural Networks/CNN/Interpretation.md
@@ -6,8 +6,8 @@
![[am.png]]
- **Use trained network**
- Don't update weights
- Feedforward noise
- [[Back-Propagation|Back-propagate]] loss
- [[Architectures|Feedforward]] noise
- [[Back-Propagation|Back-propagate]] [[Deep Learning#Loss Function|loss]]
- Don't update weights
- Update image (see the sketch below)

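A minimal PyTorch-style sketch of the activation-maximisation procedure above: the trained network's weights stay fixed and only the input image is updated by back-propagating a loss. The network interface, target class, image size, and step count are placeholders, not from the note.

```python
import torch

def activation_maximisation(net, target_class, steps=200, lr=0.1):
    net.eval()                                           # use trained network
    for p in net.parameters():
        p.requires_grad_(False)                          # don't update weights

    x = torch.randn(1, 3, 224, 224, requires_grad=True)  # feedforward noise
    opt = torch.optim.Adam([x], lr=lr)                    # optimiser acts on the image only
    for _ in range(steps):
        opt.zero_grad()
        loss = -net(x)[0, target_class]                   # maximise the chosen activation
        loss.backward()                                   # back-propagate loss
        opt.step()                                        # update image, not weights
    return x.detach()
```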
@@ -17,4 +17,4 @@
- Prone to high frequency noise
- Minimise
- Total variation
- $x^*$ is the best solution to minimise loss
- $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]

AI/Neural Networks/Deep Learning.md
@@ -8,7 +8,7 @@ Objective Function
![[deep-loss-function.png]]

- Test accuracy worse than train accuracy = overfitting
- Dense = fully connected
- [[MLP|Dense]] = [[MLP|fully connected]]
- Automates feature engineering

![[ml-dl.png]]
AI/Neural Networks/MLP/MLP.md
@@ -1,4 +1,4 @@
- Feed-forward
- [[Architectures|Feedforward]]
- Single hidden layer can learn any function
- Universal approximation theorem
- Each hidden layer can operate as a different feature extraction layer
@@ -8,7 +8,7 @@
![[mlp-arch.png]]

# Universal Approximation Theory
A finite feed-forward MLP with 1 hidden layer can in theory approximate any mathematical function
A finite [[Architectures|feedforward]] MLP with 1 hidden layer can in theory approximate any mathematical function
- In practice not trainable with [[Back-Propagation|BP]]

![[activation-function.png]]
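A small NumPy illustration of the statement above: one finite hidden layer of tanh units fitted to $\sin(x)$. To keep the demo simple (and side-step back-propagation), the hidden weights are fixed at random and only the output weights are solved by least squares; this set-up is an assumption for the sketch, not part of the note.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200)[:, None]
y = np.sin(x).ravel()

# One finite hidden layer of 50 tanh units with fixed random weights.
W, b = rng.standard_normal((1, 50)), rng.standard_normal(50)
H = np.tanh(x @ W + b)

# Solve the output layer by least squares instead of back-propagation.
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
print(np.max(np.abs(H @ w_out - y)))   # small max error over the interval
```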

AI/Neural Networks/SLP/Least Mean Square.md
@@ -20,7 +20,7 @@ $$\frac{\partial \mathfrak{E}(w)}{\partial w(n)}=-x(n)\cdot e(n)$$
$$\hat{g}(n)=-x(n)\cdot e(n)$$
$$\hat{w}(n+1)=\hat{w}(n)+\eta \cdot x(n) \cdot e(n)$$

- Above is a feedback loop around weight vector, $\hat{w}$
- Above is a [[Architectures|feedforward]] loop around weight vector, $\hat{w}$
- Behaves like low-pass filter
- Pass low frequency components of error signal
- Average time constant of filtering action inversely proportional to learning-rate
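A NumPy sketch of the update rule $\hat{w}(n+1)=\hat{w}(n)+\eta \cdot x(n)\cdot e(n)$ above, run over a stream of samples. The toy data, noise level, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
w_hat = np.zeros(3)                      # estimated weight vector w_hat(n)
eta = 0.05                               # learning rate

for n in range(2000):
    x = rng.standard_normal(3)           # input x(n)
    d = w_true @ x + 0.01 * rng.standard_normal()   # desired response d(n)
    e = d - w_hat @ x                    # error signal e(n)
    w_hat = w_hat + eta * x * e          # LMS: w_hat(n+1) = w_hat(n) + eta x(n) e(n)

print(w_hat)                             # converges towards w_true
```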

AI/Neural Networks/Transformers/Attention.md
@@ -11,7 +11,7 @@
- Attention layer accesses all previous states and weighs them according to a learned measure of relevance
- Allows referring arbitrarily far back to relevant tokens
- Can be added to [[RNN]]s
- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a feedforward network
- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a [[Architectures|feedforward]] network
- Attention useful in and of itself, not just with [[RNN]]s
- [[Transformers]] use attention without recurrent connections
- Process all tokens simultaneously

AI/Neural Networks/Transformers/Transformers.md
@@ -35,5 +35,5 @@
- Uses incorporated textual information to produce output
- Has attention to draw information from the output of previous decoders before drawing from encoders
- Both use [[attention]]
- Both use dense layers for additional processing of outputs
- Both use [[MLP|dense]] layers for additional processing of outputs
- Contain residual connections & layer norm steps
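A hedged PyTorch-style sketch of the shared pieces listed above: a position-wise dense (feed-forward) sub-layer wrapped in a residual connection and layer norm. The d_model=512 and d_ff=2048 sizes follow the common convention and are an assumption here.

```python
import torch.nn as nn

class FeedForwardSubLayer(nn.Module):
    """Dense processing of each token's output, with residual + layer norm."""
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                 # x: (batch, tokens, d_model)
        return self.norm(x + self.ff(x))  # residual connection, then layer norm
```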

img/feedforward.png (new binary file, 30 KiB)
img/multilayerfeedforward.png (new binary file, 60 KiB)
img/recurrent.png (new binary file, 30 KiB)
img/recurrentwithhn.png (new binary file, 47 KiB)