vault backup: 2023-05-27 00:50:46

Affected files:
.obsidian/graph.json
.obsidian/workspace-mobile.json
.obsidian/workspace.json
STEM/AI/Neural Networks/Architectures.md
STEM/AI/Neural Networks/CNN/CNN.md
STEM/AI/Neural Networks/CNN/Examples.md
STEM/AI/Neural Networks/CNN/FCN/FCN.md
STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md
STEM/AI/Neural Networks/CNN/GAN/GAN.md
STEM/AI/Neural Networks/CNN/Interpretation.md
STEM/AI/Neural Networks/Deep Learning.md
STEM/AI/Neural Networks/MLP/MLP.md
STEM/AI/Neural Networks/SLP/Least Mean Square.md
STEM/AI/Neural Networks/Transformers/Attention.md
STEM/AI/Neural Networks/Transformers/Transformers.md
STEM/img/feedforward.png
STEM/img/multilayerfeedforward.png
STEM/img/recurrent.png
STEM/img/recurrentwithhn.png
andy 2023-05-27 00:50:46 +01:00
parent 7052c8c915
commit acb7dc429e
16 changed files with 45 additions and 22 deletions

View File

@ -0,0 +1,23 @@
# Single-Layer Feedforward
- *Acyclic*
- Count the output layer only; no computation is performed at the input
![[feedforward.png]]
# Multilayer Feedforward
- Hidden layers
- Extract higher-order statistics
- Global perspective
- Helpful with large input layer
- Fully connected
- Every neuron is connected to every neuron in adjacent layers
- Below is a 10-4-2 network
![[multilayerfeedforward.png]]
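A minimal PyTorch sketch (the framework and the sigmoid activation are assumptions, not from the notes) of the 10-4-2 network above: 10 inputs, one hidden layer of 4 neurons, 2 outputs, fully connected and acyclic.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(10, 4),   # input layer -> hidden layer (10-4), fully connected
    nn.Sigmoid(),       # hidden activation (assumed)
    nn.Linear(4, 2),    # hidden layer -> output layer (4-2), fully connected
)

y = net(torch.randn(1, 10))   # forward pass only: acyclic, no feedback loops
```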
# Recurrent
- At least one feedback loop
- Below has no self-feedback
![[recurrent.png]]
![[recurrentwithhn.png]]
- Above has hidden neurons

View File

@ -14,14 +14,14 @@
- Double digit % gain on ImageNet accuracy
# Fully Connected
Dense
[[MLP|Dense]]
- Move from convolutional operations towards vector output
- Stochastic drop-out
- Sub-sample channels and only connect some to dense layers
- Sub-sample channels and only connect some to [[MLP|dense]] layers
# As a Descriptor
- Most powerful as a deeply learned feature extractor
- Dense classifier at the end isn't fantastic
- [[MLP|Dense]] classifier at the end isn't fantastic
- Use SVM to classify prior to penultimate layer
![[cnn-descriptor.png]]
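A hedged sketch of the descriptor idea (the choice of ResNet-18, torchvision and scikit-learn is an assumption): strip the dense classifier from a pre-trained CNN and train an SVM on the extracted features instead.

```python
import torch
import torch.nn as nn
import torchvision.models as models
from sklearn.svm import SVC

backbone = models.resnet18(weights="IMAGENET1K_V1")   # any pre-trained CNN (assumption)
backbone.fc = nn.Identity()                           # drop the dense classifier
backbone.eval()

# placeholder data: (N, 3, 224, 224) images and N labels
images, labels = torch.randn(32, 3, 224, 224), torch.randint(0, 2, (32,))

with torch.no_grad():
    features = backbone(images).numpy()               # deeply learned descriptors, (N, 512)

clf = SVC(kernel="rbf").fit(features, labels.numpy())  # SVM replaces the dense classifier
```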
@ -42,13 +42,13 @@ Dense
![[fine-tuning-freezing.png]]
# Training
- Validation & training loss
- Validation & training [[Deep Learning#Loss Function|loss]]
- Early
- Under-fitting
- Training not representative
- Later
- Overfitting
- V.loss can help adjust learning rate
- V.[[Deep Learning#Loss Function|loss]] can help adjust learning rate
- Or indicate when to stop training
![[under-over-fitting.png]]
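A minimal sketch of using the validation loss both to adjust the learning rate and to stop training; the `evaluate` helper, the optimiser and the patience values are placeholder assumptions, not from the notes.

```python
# Monitoring skeleton only; the per-epoch training step is elided.
import torch

def evaluate(model):
    # placeholder validation pass; in practice compute loss on held-out data
    x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
    with torch.no_grad():
        return torch.nn.functional.cross_entropy(model(x), y).item()

model = torch.nn.Linear(10, 2)                 # stand-in for a CNN
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=3)

best_val, patience, wait = float("inf"), 10, 0
for epoch in range(100):
    # ... train for one epoch here ...
    val_loss = evaluate(model)                 # validation loss after the epoch
    scheduler.step(val_loss)                   # lower LR when validation loss plateaus
    if val_loss < best_val:
        best_val, wait = val_loss, 0
    else:
        wait += 1
        if wait >= patience:                   # validation loss rising: stop early
            break
```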

View File

@ -29,13 +29,13 @@
2015
- [[Inception Layer]]s
- Multiple Loss Functions
- Multiple [[Deep Learning#Loss Function|Loss]] Functions
![[googlenet.png]]
## [[Inception Layer]]
![[googlenet-inception.png]]
## Auxiliary Loss Functions
## Auxiliary [[Deep Learning#Loss Function|Loss]] Functions
- Two other SoftMax blocks
- Help train really deep network
- Vanishing gradient problem
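A sketch of how the auxiliary losses are combined with the main loss during training (the 0.3 weighting follows the GoogLeNet paper; the function name and shapes here are otherwise assumptions).

```python
import torch
import torch.nn.functional as F

def googlenet_style_loss(main_logits, aux1_logits, aux2_logits, targets):
    main = F.cross_entropy(main_logits, targets)
    aux1 = F.cross_entropy(aux1_logits, targets)      # first auxiliary SoftMax block
    aux2 = F.cross_entropy(aux2_logits, targets)      # second auxiliary SoftMax block
    # auxiliary terms inject gradient into earlier layers during training only;
    # they are discarded at inference
    return main + 0.3 * (aux1 + aux2)

logits = torch.randn(8, 1000)
loss = googlenet_style_loss(logits, logits, logits, torch.randint(0, 1000, (8,)))
```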

View File

@ -20,13 +20,13 @@ Contractive → [[UpConv]]
- Rarely from scratch
- Pre-trained weights
- Replace final layers
- FC layers
- [[MLP|FC]] layers
- White-noise initialised
- Add [[upconv]] layer(s)
- Fine-tune train
- Freeze others
- Annotated GT images
- Can use summed per-pixel log loss
- Can use summed per-pixel log [[Deep Learning#Loss Function|loss]]
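A minimal sketch of the summed per-pixel log loss (the class count, spatial size and use of PyTorch's `CrossEntropyLoss` are assumptions).

```python
import torch
import torch.nn as nn

num_classes = 21                                      # assumed class count
logits = torch.randn(2, num_classes, 64, 64)          # per-pixel class scores from the FCN
gt = torch.randint(0, num_classes, (2, 64, 64))       # annotated GT label image

# negative log-likelihood applied independently at every pixel, then summed
loss = nn.CrossEntropyLoss(reduction="sum")(logits, gt)
```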
# Evaluation
![[fcn-eval.png]]

View File

@ -12,11 +12,11 @@ Deep Convolutional [[GAN]]
- Train using Gaussian random noise for code
- Discriminator
- Contractive
- Cross-entropy loss
- Cross-entropy [[Deep Learning#Loss Function|loss]]
- Conv and leaky [[Activation Functions#ReLu|ReLu]] layers only
- Normalised output via sigmoid
- Normalised output via [[Activation Functions#Sigmoid|sigmoid]]
## Loss
## [[Deep Learning#Loss Function|Loss]]
$$D(S,L)=-\sum_i L_i \log(S_i)$$
- $S$
- $(0.1, 0.9)^T$
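As a worked example, taking a one-hot label $L=(0,1)^T$ (an assumption for illustration) with the SoftMax output above:
$$D(S,L)=-\big(0\cdot\log(0.1)+1\cdot\log(0.9)\big)=-\log(0.9)\approx 0.105$$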

View File

@ -1,7 +1,7 @@
# Fully Convolutional
- Remove [[Max Pooling]]
- Use strided [[upconv]]
- Remove FC layers
- Remove [[MLP|FC]] layers
- Hurts convergence in non-classification
- Normalisation tricks
- Batch normalisation
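A hedged sketch (channel sizes are assumptions) of a block that follows these rules: a strided up-convolution in place of pooling and FC layers, with batch normalisation to help convergence.

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # strided upconv: 2x upsample, no pooling
    nn.BatchNorm2d(64),                                               # normalisation trick for convergence
    nn.ReLU(inplace=True),
)

y = block(torch.randn(1, 128, 8, 8))   # -> (1, 64, 16, 16)
```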

View File

@ -6,8 +6,8 @@
![[am.png]]
- **Use trained network**
- Don't update weights
- Feedforward noise
- [[Back-Propagation|Back-propagate]] loss
- [[Architectures|Feedforward]] noise
- [[Back-Propagation|Back-propagate]] [[Deep Learning#Loss Function|loss]]
- Don't update weights
- Update image
@ -17,4 +17,4 @@
- Prone to high frequency noise
- Minimise
- Total variation
- $x^*$ is the best solution to minimise loss
- $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]
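A minimal activation-maximisation sketch (the network, target class and regularisation weight are assumptions): freeze a trained network, feed noise forward, back-propagate the loss and update only the image, with a total-variation term to suppress high-frequency noise.

```python
import torch
import torchvision.models as models

net = models.resnet18(weights="IMAGENET1K_V1").eval()   # any trained network (assumption)
for p in net.parameters():
    p.requires_grad_(False)                             # don't update weights

x = torch.randn(1, 3, 224, 224, requires_grad=True)     # start from noise
opt = torch.optim.Adam([x], lr=0.05)
target_class = 130                                      # arbitrary class index (assumption)

for _ in range(200):
    opt.zero_grad()
    score = net(x)[0, target_class]                     # feed the image forward
    tv = ((x[..., 1:, :] - x[..., :-1, :]).abs().mean()
          + (x[..., :, 1:] - x[..., :, :-1]).abs().mean())
    loss = -score + 0.1 * tv                            # maximise activation, minimise total variation
    loss.backward()                                     # back-propagate loss to the image only
    opt.step()                                          # update the image; x is the x* being sought
```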

View File

@ -8,7 +8,7 @@ Objective Function
![[deep-loss-function.png]]
- Test accuracy worse than train accuracy = overfitting
- Dense = fully connected
- [[MLP|Dense]] = [[MLP|fully connected]]
- Automates feature engineering
![[ml-dl.png]]

View File

@ -1,4 +1,4 @@
- Feed-forward
- [[Architectures|Feedforward]]
- Single hidden layer can learn any function
- Universal approximation theorem
- Each hidden layer can operate as a different feature extraction layer
@ -8,7 +8,7 @@
![[mlp-arch.png]]
# Universal Approximation Theorem
A finite feed-forward MLP with 1 hidden layer can in theory approximate any mathematical function
A finite [[Architectures|feedforward]] MLP with 1 hidden layer can in theory approximate any mathematical function
- In practice not trainable with [[Back-Propagation|BP]]
![[activation-function.png]]
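A minimal sketch of the idea (the target function, layer width and use of BP here are assumptions, chosen as a well-behaved case): one hidden layer fitting samples of a 1-D function.

```python
import torch
import torch.nn as nn

mlp = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))   # single hidden layer
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)

x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = torch.sin(x)                                  # target function to approximate

for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(mlp(x), y)
    loss.backward()
    opt.step()
```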

View File

@ -20,7 +20,7 @@ $$\frac{\partial \mathfrak{E}(w)}{\partial w(n)}=-x(n)\cdot e(n)$$
$$\hat{g}(n)=-x(n)\cdot e(n)$$
$$\hat{w}(n+1)=\hat{w}(n)+\eta \cdot x(n) \cdot e(n)$$
- Above is a feedback loop around weight vector, $\hat{w}$
- Above is a feedback loop around the weight vector, $\hat{w}$
- Behaves like low-pass filter
- Pass low frequency components of error signal
- Average time constant of filtering action inversely proportional to learning-rate
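A NumPy sketch of the update above (the data generation and learning rate are assumptions): the error $e(n)$ drives each weight update, and with a small $\eta$ the estimate changes slowly, giving the low-pass behaviour described.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([0.5, -1.0, 2.0])     # unknown system being estimated (assumption)
w_hat = np.zeros(3)                     # initial weight estimate
eta = 0.01                              # learning rate

for n in range(5000):
    x = rng.normal(size=3)              # input x(n)
    d = w_true @ x + 0.1 * rng.normal() # desired response, with noise
    e = d - w_hat @ x                   # error e(n)
    w_hat = w_hat + eta * x * e         # w(n+1) = w(n) + eta * x(n) * e(n)

print(w_hat)                            # converges in the mean towards w_true
```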

View File

@ -11,7 +11,7 @@
- Attention layer accesses all previous states and weighs them according to a learned measure of relevance
- Allows referring arbitrarily far back to relevant tokens
- Can be added to [[RNN]]s
- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a feedforward network
- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a [[Architectures|feedforward]] network
- Attention is useful in and of itself, not just with [[RNN]]s
- [[Transformers]] use attention without recurrent connections
- Process all tokens simultaneously
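A sketch of scaled dot-product self-attention, the recurrence-free form used in [[Transformers]] (the dimensions here are assumptions): every token weighs every other token by relevance, and all tokens are processed simultaneously.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, tokens, dim)
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5   # relevance of every token to every token
    weights = F.softmax(scores, dim=-1)                     # attention weights
    return weights @ v                                      # weighted sum of values

x = torch.randn(1, 5, 16)                      # 5 tokens, 16-dim embeddings
out = scaled_dot_product_attention(x, x, x)    # self-attention, no recurrence
```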

View File

@ -35,5 +35,5 @@
- Uses incorporated textual information to produce output
- Uses attention to draw information from the output of previous decoder layers before drawing from the encoders
- Both use [[attention]]
- Both use dense layers for additional processing of outputs
- Both use [[MLP|dense]] layers for additional processing of outputs
- Contain residual connections & layer norm steps
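A minimal sketch (dimensions are assumptions) of that shared structure: an attention step and a dense feed-forward step, each wrapped in a residual connection and layer norm.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        a, _ = self.attn(x, x, x)          # attention over the sequence
        x = self.norm1(x + a)              # residual connection + layer norm
        x = self.norm2(x + self.ff(x))     # dense layers + residual + norm
        return x

out = Block()(torch.randn(2, 10, 64))      # (batch, tokens, d_model)
```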

BIN img/feedforward.png (new file, 30 KiB, binary not shown)
BIN img/multilayerfeedforward.png (new file, 60 KiB, binary not shown)
BIN img/recurrent.png (new file, 30 KiB, binary not shown)
BIN img/recurrentwithhn.png (new file, 47 KiB, binary not shown)