diff --git a/AI/Neural Networks/Architectures.md b/AI/Neural Networks/Architectures.md
new file mode 100644
index 0000000..8703826
--- /dev/null
+++ b/AI/Neural Networks/Architectures.md
@@ -0,0 +1,23 @@
+# Single-Layer Feedforward
+- *Acyclic*
+- Count output layer, no computation at input
+
+![[feedforward.png]]
+
+# Multilayer Feedforward
+- Hidden layers
+    - Extract higher-order statistics
+    - Global perspective
+    - Helpful with large input layer
+- Fully connected
+    - Every neuron is connected to every neuron in adjacent layers
+- Below is a 10-4-2 network
+![[multilayerfeedforward.png]]
+
+# Recurrent
+- At least one feedback loop
+- Below has no self-feedback
+![[recurrent.png]]
+![[recurrentwithhn.png]]
+
+- Above has hidden neurons
\ No newline at end of file
diff --git a/AI/Neural Networks/CNN/CNN.md b/AI/Neural Networks/CNN/CNN.md
index 5a8f82a..ad0d735 100644
--- a/AI/Neural Networks/CNN/CNN.md
+++ b/AI/Neural Networks/CNN/CNN.md
@@ -14,14 +14,14 @@
 - Double digit % gain on ImageNet accuracy
 
 # Fully Connected
-Dense
+[[MLP|Dense]]
 - Move from convolutional operations towards vector output
 - Stochastic drop-out
-    - Sub-sample channels and only connect some to dense layers
+    - Sub-sample channels and only connect some to [[MLP|dense]] layers
 
 # As a Descriptor
 - Most powerful as a deeply learned feature extractor
-- Dense classifier at the end isn't fantastic
+- [[MLP|Dense]] classifier at the end isn't fantastic
 - Use SVM to classify prior to penultimate layer
 
 ![[cnn-descriptor.png]]
@@ -42,13 +42,13 @@
 ![[fine-tuning-freezing.png]]
 
 # Training
-- Validation & training loss
+- Validation & training [[Deep Learning#Loss Function|loss]]
 - Early
     - Under-fitting
     - Training not representative
 - Later
     - Overfitting
-- V.loss can help adjust learning rate
+- Validation [[Deep Learning#Loss Function|loss]] can help adjust learning rate
     - Or indicate when to stop training
 
 ![[under-over-fitting.png]]
\ No newline at end of file
diff --git a/AI/Neural Networks/CNN/Examples.md b/AI/Neural Networks/CNN/Examples.md
index b1b3e26..6b78f4f 100644
--- a/AI/Neural Networks/CNN/Examples.md
+++ b/AI/Neural Networks/CNN/Examples.md
@@ -29,13 +29,13 @@
 2015
 - [[Inception Layer]]s
-- Multiple Loss Functions
+- Multiple [[Deep Learning#Loss Function|Loss]] Functions
 
 ![[googlenet.png]]
 
 ## [[Inception Layer]]
 ![[googlenet-inception.png]]
 
-## Auxiliary Loss Functions
+## Auxiliary [[Deep Learning#Loss Function|Loss]] Functions
 - Two other SoftMax blocks
 - Help train really deep network
     - Vanishing gradient problem
diff --git a/AI/Neural Networks/CNN/FCN/FCN.md b/AI/Neural Networks/CNN/FCN/FCN.md
index 797cf4f..969920b 100644
--- a/AI/Neural Networks/CNN/FCN/FCN.md
+++ b/AI/Neural Networks/CNN/FCN/FCN.md
@@ -20,13 +20,13 @@ Contractive → [[UpConv]]
 - Rarely from scratch
 - Pre-trained weights
 - Replace final layers
-    - FC layers
+    - [[MLP|FC]] layers
     - White-noise initialised
 - Add [[upconv]] layer(s)
 - Fine-tune train
     - Freeze others
 - Annotated GT images
-- Can use summed per-pixel log loss
+- Can use summed per-pixel log [[Deep Learning#Loss Function|loss]]
 
 # Evaluation
 ![[fcn-eval.png]]
diff --git a/AI/Neural Networks/CNN/GAN/DC-GAN.md b/AI/Neural Networks/CNN/GAN/DC-GAN.md
index c0c4b07..e096b8d 100644
--- a/AI/Neural Networks/CNN/GAN/DC-GAN.md
+++ b/AI/Neural Networks/CNN/GAN/DC-GAN.md
@@ -12,11 +12,11 @@ Deep Convolutional [[GAN]]
 - Train using Gaussian random noise for code
 - Discriminator
     - Contractive
-    - Cross-entropy loss
+    - Cross-entropy [[Deep Learning#Loss Function|loss]]
     - Conv and leaky [[Activation Functions#ReLu|ReLu]] layers only
-    - Normalised output via sigmoid
+    - Normalised output via [[Activation Functions#Sigmoid|sigmoid]]
 
-## Loss
+## [[Deep Learning#Loss Function|Loss]]
 $$D(S,L)=-\sum_iL_ilog(S_i)$$
 - $S$
     - $(0.1, 0.9)^T$
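As a quick illustration of the loss above (not part of the patch): a minimal NumPy sketch of $D(S,L)=-\sum_i L_i \log(S_i)$. Only the SoftMax scores $(0.1, 0.9)^T$ come from the note; the one-hot label $L$ is an assumed example.

```python
import numpy as np

def cross_entropy(S: np.ndarray, L: np.ndarray) -> float:
    """D(S, L) = -sum_i L_i * log(S_i), with S the SoftMax scores and L a one-hot label."""
    return float(-np.sum(L * np.log(S)))

S = np.array([0.1, 0.9])    # discriminator SoftMax output, as in the note
L = np.array([0.0, 1.0])    # assumed one-hot ground truth (true class in position 2)
print(cross_entropy(S, L))  # -log(0.9) ≈ 0.105
```

A confident correct prediction costs little ($-\log 0.9 \approx 0.105$); had the true class been the first position, the same scores would cost $-\log 0.1 \approx 2.3$.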
diff --git a/AI/Neural Networks/CNN/GAN/GAN.md b/AI/Neural Networks/CNN/GAN/GAN.md
index ac77d59..93c3553 100644
--- a/AI/Neural Networks/CNN/GAN/GAN.md
+++ b/AI/Neural Networks/CNN/GAN/GAN.md
@@ -1,7 +1,7 @@
 # Fully Convolutional
 - Remove [[Max Pooling]]
     - Use strided [[upconv]]
-- Remove FC layers
+- Remove [[MLP|FC]] layers
     - Hurts convergence in non-classification
 - Normalisation tricks
     - Batch normalisation
diff --git a/AI/Neural Networks/CNN/Interpretation.md b/AI/Neural Networks/CNN/Interpretation.md
index 515dd34..081ce35 100644
--- a/AI/Neural Networks/CNN/Interpretation.md
+++ b/AI/Neural Networks/CNN/Interpretation.md
@@ -6,8 +6,8 @@
 ![[am.png]]
 
 - **Use trained network**
     - Don't update weights
-- Feedforward noise
-    - [[Back-Propagation|Back-propagate]] loss
+- [[Architectures|Feedforward]] noise
+    - [[Back-Propagation|Back-propagate]] [[Deep Learning#Loss Function|loss]]
     - Don't update weights
     - Update image
@@ -17,4 +17,4 @@
 - Prone to high frequency noise
 - Minimise
     - Total variation
-    - $x^*$ is the best solution to minimise loss
\ No newline at end of file
+    - $x^*$ is the best solution to minimise [[Deep Learning#Loss Function|loss]]
\ No newline at end of file
diff --git a/AI/Neural Networks/Deep Learning.md b/AI/Neural Networks/Deep Learning.md
index 61a3e5f..d3bdbd9 100644
--- a/AI/Neural Networks/Deep Learning.md
+++ b/AI/Neural Networks/Deep Learning.md
@@ -8,7 +8,7 @@ Objective Function
 ![[deep-loss-function.png]]
 
 - Test accuracy worse than train accuracy = overfitting
-- Dense = fully connected
+- [[MLP|Dense]] = [[MLP|fully connected]]
 - Automates feature engineering
 
 ![[ml-dl.png]]
diff --git a/AI/Neural Networks/MLP/MLP.md b/AI/Neural Networks/MLP/MLP.md
index 2d65cbd..21aedc6 100644
--- a/AI/Neural Networks/MLP/MLP.md
+++ b/AI/Neural Networks/MLP/MLP.md
@@ -1,4 +1,4 @@
-- Feed-forward
+- [[Architectures|Feedforward]]
 - Single hidden layer can learn any function
     - Universal approximation theorem
 - Each hidden layer can operate as a different feature extraction layer
@@ -8,7 +8,7 @@
 ![[mlp-arch.png]]
 
 # Universal Approximation Theory
-A finite feed-forward MLP with 1 hidden layer can in theory approximate any mathematical function
+A finite [[Architectures|feedforward]] MLP with 1 hidden layer can in theory approximate any continuous function
 - In practice not trainable with [[Back-Propagation|BP]]
 
 ![[activation-function.png]]
diff --git a/AI/Neural Networks/SLP/Least Mean Square.md b/AI/Neural Networks/SLP/Least Mean Square.md
index 5e7de49..9e45812 100644
--- a/AI/Neural Networks/SLP/Least Mean Square.md
+++ b/AI/Neural Networks/SLP/Least Mean Square.md
@@ -20,7 +20,7 @@
 $$\frac{\partial \mathfrak{E}(w)}{\partial w(n)}=-x(n)\cdot e(n)$$
 $$\hat{g}(n)=-x(n)\cdot e(n)$$
 $$\hat{w}(n+1)=\hat{w}(n)+\eta \cdot x(n) \cdot e(n)$$
-- Above is a feedback loop around weight vector, $\hat{w}$
+- Above is a [[Architectures|feedback]] loop around weight vector, $\hat{w}$
 - Behaves like low-pass filter
     - Pass low frequency components of error signal
 - Average time constant of filtering action inversely proportional to learning-rate
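For reference (not part of the patch): a short NumPy sketch of the LMS update $\hat{w}(n+1)=\hat{w}(n)+\eta \cdot x(n) \cdot e(n)$ above, assuming the usual error signal $e(n)=d(n)-\hat{w}^T(n)\,x(n)$ against a desired response $d(n)$; the data, true weights and step size are invented for illustration.

```python
import numpy as np

def lms(X: np.ndarray, d: np.ndarray, eta: float = 0.05) -> np.ndarray:
    """One pass of the LMS rule w(n+1) = w(n) + eta * x(n) * e(n)."""
    w = np.zeros(X.shape[1])           # weight estimate w-hat, started at zero
    for x_n, d_n in zip(X, d):
        e_n = d_n - w @ x_n            # instantaneous error signal e(n)
        w = w + eta * x_n * e_n        # feedback update around w-hat
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # input samples x(n)
true_w = np.array([0.5, -1.0, 2.0])           # assumed underlying weights
d = X @ true_w + 0.01 * rng.normal(size=500)  # noisy desired response d(n)
print(lms(X, d))                              # converges towards true_w
```

Each error nudges $\hat{w}$, and with a small $\eta$ the loop averages the error signal over many samples, which is the low-pass-filter behaviour the note describes.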
diff --git a/AI/Neural Networks/Transformers/Attention.md b/AI/Neural Networks/Transformers/Attention.md
index dbb611b..ba7c9ca 100644
--- a/AI/Neural Networks/Transformers/Attention.md
+++ b/AI/Neural Networks/Transformers/Attention.md
@@ -11,7 +11,7 @@
 - Attention layer accesses all previous states and weighs them according to a learned measure of relevance
     - Allows referring arbitrarily far back to relevant tokens
     - Can be added to [[RNN]]s
-- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a feedforward network
+- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a [[Architectures|feedforward]] network
     - Attention useful in and of itself, not just with [[RNN]]s
 - [[Transformers]] use attention without recurrent connections
     - Process all tokens simultaneously
diff --git a/AI/Neural Networks/Transformers/Transformers.md b/AI/Neural Networks/Transformers/Transformers.md
index 4cddb84..5c4e8ec 100644
--- a/AI/Neural Networks/Transformers/Transformers.md
+++ b/AI/Neural Networks/Transformers/Transformers.md
@@ -35,5 +35,5 @@
 - Uses incorporated textual information to produce output
 - Has attention to draw information from output of previous decoders before drawing from encoders
 - Both use [[attention]]
-- Both use dense layers for additional processing of outputs
+- Both use [[MLP|dense]] layers for additional processing of outputs
 - Contain residual connections & layer norm steps
\ No newline at end of file
diff --git a/img/feedforward.png b/img/feedforward.png
new file mode 100644
index 0000000..4b7f456
Binary files /dev/null and b/img/feedforward.png differ
diff --git a/img/multilayerfeedforward.png b/img/multilayerfeedforward.png
new file mode 100644
index 0000000..c3004ab
Binary files /dev/null and b/img/multilayerfeedforward.png differ
diff --git a/img/recurrent.png b/img/recurrent.png
new file mode 100644
index 0000000..7b0ed82
Binary files /dev/null and b/img/recurrent.png differ
diff --git a/img/recurrentwithhn.png b/img/recurrentwithhn.png
new file mode 100644
index 0000000..88a44d4
Binary files /dev/null and b/img/recurrentwithhn.png differ
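To accompany the attention notes above (not part of the patch): a minimal NumPy sketch of scaled dot-product self-attention, the recurrence-free form used by [[Transformers]], where every token's output is a relevance-weighted sum over all value vectors. This is the standard formulation rather than the decomposable attention of the 2016 work mentioned in Attention.md, it omits the learned Q/K/V projections, and the shapes are example values.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """softmax(Q K^T / sqrt(d_k)) V over token matrices of shape (tokens, d)."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # relevance of every token to every other
    return softmax(scores) @ V               # weighted sum of all value vectors

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))         # 4 tokens with 8-dimensional embeddings
out = attention(tokens, tokens, tokens)  # self-attention: all tokens processed at once
print(out.shape)                         # (4, 8)
```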