vault backup: 2023-05-27 00:50:46

Affected files:
.obsidian/graph.json
.obsidian/workspace-mobile.json
.obsidian/workspace.json
STEM/AI/Neural Networks/Architectures.md
STEM/AI/Neural Networks/CNN/CNN.md
STEM/AI/Neural Networks/CNN/Examples.md
STEM/AI/Neural Networks/CNN/FCN/FCN.md
STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md
STEM/AI/Neural Networks/CNN/GAN/GAN.md
STEM/AI/Neural Networks/CNN/Interpretation.md
STEM/AI/Neural Networks/Deep Learning.md
STEM/AI/Neural Networks/MLP/MLP.md
STEM/AI/Neural Networks/SLP/Least Mean Square.md
STEM/AI/Neural Networks/Transformers/Attention.md
STEM/AI/Neural Networks/Transformers/Transformers.md
STEM/img/feedforward.png
STEM/img/multilayerfeedforward.png
STEM/img/recurrent.png
STEM/img/recurrentwithhn.png
andy 2023-05-27 00:50:46 +01:00
parent 7052c8c915
commit acb7dc429e
16 changed files with 45 additions and 22 deletions

STEM/AI/Neural Networks/Architectures.md

@@ -0,0 +1,23 @@
# Single-Layer Feedforward
- *Acyclic*
- Count the output layer only, since no computation happens at the input
![[feedforward.png]]
# Multilayer Feedforward
- Hidden layers
- Extract higher-order statistics
- Global perspective
- Helpful with large input layer
- Fully connected
- Every neuron is connected to every neuron in adjacent layers
- Below is a 10-4-2 network (a code sketch follows the figure)
![[multilayerfeedforward.png]]
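A minimal sketch of the 10-4-2 network above, assuming PyTorch (not used in these notes): 10 inputs, one hidden layer of 4 neurons, 2 outputs, fully connected between adjacent layers.
```python
import torch
import torch.nn as nn

# 10-4-2 multilayer feedforward network: every neuron connects to
# every neuron in the adjacent layers; no computation at the input.
model = nn.Sequential(
    nn.Linear(10, 4),  # input (10) -> hidden (4)
    nn.Sigmoid(),      # hidden-layer activation
    nn.Linear(4, 2),   # hidden (4) -> output (2)
)

y = model(torch.randn(1, 10))  # forward pass, output shape (1, 2)
```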
# Recurrent
- At least one feedback loop
- Below has no self-feedback
![[recurrent.png]]
![[recurrentwithhn.png]]
- Above has hidden neurons
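For contrast, a recurrent layer sketch under the same PyTorch assumption: the hidden state fed back at each step is the feedback loop that distinguishes this class from the feedforward networks above.
```python
import torch
import torch.nn as nn

# Recurrent layer with hidden neurons: the hidden state h is fed back
# into the next time step, forming at least one feedback loop.
rnn = nn.RNN(input_size=10, hidden_size=4, batch_first=True)
out, h = rnn(torch.randn(1, 5, 10))  # 5-step sequence -> out (1, 5, 4)
```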

STEM/AI/Neural Networks/CNN/CNN.md

@@ -14,14 +14,14 @@
- Double digit % gain on ImageNet accuracy
# Fully Connected
[[MLP|Dense]]
- Move from convolutional operations towards vector output
- Stochastic drop-out
- Sub-sample channels and only connect some to [[MLP|dense]] layers
# As a Descriptor
- Most powerful as a deeply learned feature extractor
- [[MLP|Dense]] classifier at the end isn't fantastic
- Use an SVM to classify, taking features prior to the penultimate layer (see the sketch after the figure)
![[cnn-descriptor.png]]
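A rough sketch of the descriptor idea, assuming PyTorch, torchvision, and scikit-learn (none are named in the notes): strip the dense classifier from a trained CNN, take the remaining features, and fit an SVM on them. The `images`/`labels` values are stand-ins.
```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

backbone = models.resnet18(weights=None)  # load pretrained weights in practice
backbone.fc = nn.Identity()               # drop the dense classifier head
backbone.eval()

images = torch.randn(8, 3, 224, 224)      # stand-in image batch
labels = [0, 1, 0, 1, 0, 1, 0, 1]         # stand-in class labels

with torch.no_grad():
    feats = backbone(images)              # deeply learned features, shape (8, 512)
svm = SVC().fit(feats.numpy(), labels)    # SVM classifies instead of the dense head
```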
@@ -42,13 +42,13 @@ Dense
![[fine-tuning-freezing.png]]
# Training
- Validation & training [[Deep Learning#Loss Function|loss]]
- Early
- Under-fitting
- Training not representative
- Later
- Overfitting
- Validation [[Deep Learning#Loss Function|loss]] can help adjust learning rate
- Or indicate when to stop training (sketched below)
![[under-over-fitting.png]]
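A minimal sketch (PyTorch assumed; `validate` and the toy model are stand-ins) of using validation loss to adjust the learning rate and to decide when to stop training.
```python
import torch

model = torch.nn.Linear(10, 2)                       # stand-in model
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimiser, patience=3)

def validate(model):
    # Stand-in validation pass returning a scalar loss
    x, y = torch.randn(64, 10), torch.randn(64, 2)
    with torch.no_grad():
        return torch.nn.functional.mse_loss(model(x), y).item()

best, bad_epochs = float("inf"), 0
for epoch in range(100):
    # ... a real training step over the training set would go here ...
    val_loss = validate(model)
    scheduler.step(val_loss)      # lower the learning rate when validation loss plateaus
    if val_loss < best:
        best, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= 10:      # early stopping
            break
```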

STEM/AI/Neural Networks/CNN/Examples.md

@@ -29,13 +29,13 @@
2015
- [[Inception Layer]]s
- Multiple [[Deep Learning#Loss Function|Loss]] Functions
![[googlenet.png]]
## [[Inception Layer]]
![[googlenet-inception.png]]
## Auxiliary [[Deep Learning#Loss Function|Loss]] Functions
- Two other SoftMax blocks
- Help train really deep network
- Vanishing gradient problem
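A small sketch (PyTorch assumed; the logits are stand-ins) of how the auxiliary SoftMax blocks contribute: their losses are added to the main loss with a small weight (0.3 in the GoogLeNet paper), injecting gradient into earlier layers and easing the vanishing gradient problem.
```python
import torch
import torch.nn.functional as F

main_logits = torch.randn(8, 1000)            # final classifier output
aux1_logits = torch.randn(8, 1000)            # first auxiliary SoftMax block
aux2_logits = torch.randn(8, 1000)            # second auxiliary SoftMax block
targets = torch.randint(0, 1000, (8,))

# Weighted sum of the main and auxiliary cross-entropy losses
loss = (F.cross_entropy(main_logits, targets)
        + 0.3 * F.cross_entropy(aux1_logits, targets)
        + 0.3 * F.cross_entropy(aux2_logits, targets))
```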

STEM/AI/Neural Networks/CNN/FCN/FCN.md

@@ -20,13 +20,13 @@ Contractive → [[UpConv]]
- Rarely from scratch
- Pre-trained weights
- Replace final layers
- [[MLP|FC]] layers
- White-noise initialised
- Add [[upconv]] layer(s)
- Fine-tune train
- Freeze others
- Annotated GT images
- Can use summed per-pixel log [[Deep Learning#Loss Function|loss]]
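A compressed sketch of the recipe above (PyTorch assumed; the tiny backbone and 5-class head are stand-ins): freeze the pre-trained weights, bolt on a white-noise-initialised upconv head, and train it with a summed per-pixel log loss against annotated GT masks.
```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())  # stand-in pre-trained net
for p in backbone.parameters():
    p.requires_grad = False                      # freeze pre-trained layers

head = nn.ConvTranspose2d(16, 5, kernel_size=2, stride=2)  # new upconv, randomly initialised

x = torch.randn(2, 3, 64, 64)                    # stand-in images
gt = torch.randint(0, 5, (2, 128, 128))          # stand-in annotated GT masks

logits = head(backbone(x))                       # per-pixel class scores, (2, 5, 128, 128)
loss = nn.functional.cross_entropy(logits, gt, reduction="sum")  # summed per-pixel log loss
loss.backward()                                  # only the new head receives gradients
```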
# Evaluation
![[fcn-eval.png]]

STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md

@@ -12,11 +12,11 @@ Deep Convolutional [[GAN]]
- Train using Gaussian random noise for code
- Discriminator
- Contractive
- Cross-entropy [[Deep Learning#Loss Function|loss]]
- Conv and leaky [[Activation Functions#ReLu|ReLu]] layers only
- Normalised output via [[Activation Functions#Sigmoid|sigmoid]]
## [[Deep Learning#Loss Function|Loss]]
$$D(S,L)=-\sum_i L_i \log(S_i)$$
- $S$
- $(0.1, 0.9)^T$
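A worked instance of the loss above, pairing the example scores $S=(0.1, 0.9)^T$ with an assumed one-hot label $L=(0, 1)^T$ (the label vector is not given in this excerpt):
$$D(S,L)=-\big(0\cdot\log 0.1+1\cdot\log 0.9\big)=-\log 0.9\approx 0.105$$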

STEM/AI/Neural Networks/CNN/GAN/GAN.md

@@ -1,7 +1,7 @@
# Fully Convolutional
- Remove [[Max Pooling]]
- Use strided [[upconv]]
- Remove [[MLP|FC]] layers
- Hurts convergence in non-classification
- Normalisation tricks
- Batch normalisation
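A minimal sketch (PyTorch assumed) of the fully convolutional recipe: strided up-convolutions instead of pooling or FC layers, with batch normalisation as the stabilising trick.
```python
import torch
import torch.nn as nn

generator = nn.Sequential(
    nn.ConvTranspose2d(100, 128, kernel_size=4, stride=1),            # 1x1 -> 4x4
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),    # 8x8 -> 16x16
    nn.Tanh(),
)

img = generator(torch.randn(1, 100, 1, 1))   # noise code -> (1, 3, 16, 16) image
```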

STEM/AI/Neural Networks/CNN/Interpretation.md

@@ -6,8 +6,8 @@
![[am.png]]
- **Use trained network**
- Don't update weights
- [[Architectures|Feedforward]] noise
- [[Back-Propagation|Back-propagate]] [[Deep Learning#Loss Function|loss]]
- Don't update weights
- Update image
@@ -17,4 +17,4 @@
- Prone to high frequency noise
- Minimise
- Total variation
- $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]
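A rough sketch of the loop described above (PyTorch assumed; the tiny `model` is a stand-in for a trained network): the weights stay frozen, the loss is back-propagated to the image, and a total-variation term suppresses high-frequency noise.
```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 30 * 30, 10))  # stand-in trained network
for p in model.parameters():
    p.requires_grad = False                        # don't update weights

x = torch.randn(1, 3, 32, 32, requires_grad=True)  # feed noise forward
optimiser = torch.optim.Adam([x], lr=0.1)          # update the image, not the weights

for _ in range(200):
    optimiser.zero_grad()
    activation = model(x)[0, 3]                    # unit whose activation we maximise
    tv = ((x[..., 1:, :] - x[..., :-1, :]).abs().sum()
          + (x[..., :, 1:] - x[..., :, :-1]).abs().sum())
    loss = -activation + 1e-3 * tv                 # maximise activation, minimise total variation
    loss.backward()                                # gradient flows only to the image
    optimiser.step()
```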

STEM/AI/Neural Networks/Deep Learning.md

@@ -8,7 +8,7 @@ Objective Function
![[deep-loss-function.png]]
- Test accuracy worse than train accuracy = overfitting
- [[MLP|Dense]] = [[MLP|fully connected]]
- Automates feature engineering
![[ml-dl.png]]

STEM/AI/Neural Networks/MLP/MLP.md

@@ -1,4 +1,4 @@
- [[Architectures|Feedforward]]
- Single hidden layer can learn any function - Single hidden layer can learn any function
- Universal approximation theorem - Universal approximation theorem
- Each hidden layer can operate as a different feature extraction layer - Each hidden layer can operate as a different feature extraction layer
@@ -8,7 +8,7 @@
![[mlp-arch.png]]
# Universal Approximation Theorem
A finite [[Architectures|feedforward]] MLP with 1 hidden layer can in theory approximate any continuous function to arbitrary accuracy
- In practice, not necessarily trainable with [[Back-Propagation|BP]]
![[activation-function.png]]
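A toy illustration of the theorem's setting (PyTorch assumed; sin is an arbitrary choice of target): a single hidden layer fitted to a simple continuous function.
```python
import math
import torch
import torch.nn as nn

x = torch.linspace(-math.pi, math.pi, 256).unsqueeze(1)
y = torch.sin(x)                                   # continuous target to approximate

mlp = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # one hidden layer
optimiser = torch.optim.Adam(mlp.parameters(), lr=0.01)

for _ in range(2000):
    optimiser.zero_grad()
    loss = nn.functional.mse_loss(mlp(x), y)
    loss.backward()                                # trained here with back-propagation
    optimiser.step()

print(loss.item())                                 # small residual error on this easy target
```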

STEM/AI/Neural Networks/SLP/Least Mean Square.md

@@ -20,7 +20,7 @@ $$\frac{\partial \mathfrak{E}(w)}{\partial w(n)}=-x(n)\cdot e(n)$$
$$\hat{g}(n)=-x(n)\cdot e(n)$$
$$\hat{w}(n+1)=\hat{w}(n)+\eta \cdot x(n) \cdot e(n)$$
- Above is a [[Architectures|feedback]] loop around the weight vector, $\hat{w}$
- Behaves like low-pass filter - Behaves like low-pass filter
- Pass low frequency components of error signal - Pass low frequency components of error signal
- Average time constant of filtering action inversely proportional to learning-rate - Average time constant of filtering action inversely proportional to learning-rate
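A numerical sketch of the update $\hat{w}(n+1)=\hat{w}(n)+\eta \cdot x(n) \cdot e(n)$ on a toy identification problem (NumPy assumed; the target weights are invented for illustration).
```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([0.5, -0.3, 0.8])   # unknown system to identify (illustrative)
w_hat = np.zeros(3)                   # weight estimate, w-hat
eta = 0.05                            # learning rate

for n in range(2000):
    x = rng.standard_normal(3)        # input x(n)
    e = w_true @ x - w_hat @ x        # error e(n) = d(n) - y(n)
    w_hat += eta * x * e              # LMS update: feedback around w-hat

print(w_hat)                          # approaches w_true; smaller eta is slower but smoother (low-pass)
```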

STEM/AI/Neural Networks/Transformers/Attention.md

@@ -11,7 +11,7 @@
- Attention layer accesses all previous states and weighs them according to a learned measure of relevance
- Allows referring arbitrarily far back to relevant tokens
- Can be added to [[RNN]]s
- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a [[Architectures|feedforward]] network
- Attention useful in and of itself, not just with [[RNN]]s
- [[Transformers]] use attention without recurrent connections - [[Transformers]] use attention without recurrent connections
- Process all tokens simultaneously - Process all tokens simultaneously
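A minimal self-attention sketch (PyTorch assumed; this is the scaled dot-product form used by [[Transformers]], not the 2016 decomposable variant): every token weighs every state by relevance, and all tokens are processed simultaneously.
```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # Scaled dot-product attention: relevance scores over all states,
    # softmax-normalised, then used to mix the values.
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

states = torch.randn(1, 6, 16)              # 6 token states, 16-dim each
out = attention(states, states, states)     # self-attention, shape (1, 6, 16)
```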

STEM/AI/Neural Networks/Transformers/Transformers.md

@@ -35,5 +35,5 @@
- Uses incorporated textual information to produce output
- Has attention to draw information from output of previous decoders before drawing from encoders
- Both use [[attention]]
- Both use [[MLP|dense]] layers for additional processing of outputs
- Contain residual connections & layer norm steps
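A compact encoder-style block sketch (PyTorch assumed) showing the pieces the notes list: [[attention]], a [[MLP|dense]] layer for additional processing, residual connections, and layer norm steps.
```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d=16, heads=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.dense = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))
        self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, x):
        x = self.norm1(x + self.attn(x, x, x)[0])  # attention + residual + layer norm
        return self.norm2(x + self.dense(x))       # dense processing + residual + layer norm

out = Block()(torch.randn(1, 6, 16))               # (1, 6, 16)
```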

BIN (new files, binary content not shown):
img/feedforward.png (30 KiB)
img/multilayerfeedforward.png (60 KiB)
img/recurrent.png (30 KiB)
img/recurrentwithhn.png (47 KiB)