vault backup: 2023-05-26 18:29:17
Affected files: .obsidian/graph.json .obsidian/workspace-mobile.json .obsidian/workspace.json STEM/AI/Neural Networks/Activation Functions.md STEM/AI/Neural Networks/CNN/CNN.md STEM/AI/Neural Networks/CNN/Convolutional Layer.md STEM/AI/Neural Networks/CNN/Examples.md STEM/AI/Neural Networks/CNN/GAN/CycleGAN.md STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md STEM/AI/Neural Networks/CNN/GAN/GAN.md STEM/AI/Neural Networks/CNN/GAN/StackGAN.md STEM/AI/Neural Networks/CNN/GAN/cGAN.md STEM/AI/Neural Networks/CNN/Inception Layer.md STEM/AI/Neural Networks/CNN/Max Pooling.md STEM/AI/Neural Networks/CNN/Normalisation.md STEM/AI/Neural Networks/CV/Data Manipulations.md STEM/AI/Neural Networks/CV/Datasets.md STEM/AI/Neural Networks/CV/Filters.md STEM/AI/Neural Networks/CV/Layer Structure.md STEM/AI/Neural Networks/Weight Init.md STEM/img/alexnet.png STEM/img/cgan-example.png STEM/img/cgan.png STEM/img/cnn-cv-layer-arch.png STEM/img/cnn-descriptor.png STEM/img/cnn-normalisation.png STEM/img/code-vector-math-for-control-results.png STEM/img/cvmfc.png STEM/img/cyclegan-results.png STEM/img/cyclegan.png STEM/img/data-aug.png STEM/img/data-whitening.png STEM/img/dc-gan.png STEM/img/fine-tuning-freezing.png STEM/img/gabor.png STEM/img/gan-arch.png STEM/img/gan-arch2.png STEM/img/gan-results.png STEM/img/gan-training-discriminator.png STEM/img/gan-training-generator.png STEM/img/googlenet-auxilliary-loss.png STEM/img/googlenet-inception.png STEM/img/googlenet.png STEM/img/icv-pos-neg-examples.png STEM/img/icv-results.png STEM/img/inception-layer-arch.png STEM/img/inception-layer-effect.png STEM/img/lenet-1989.png STEM/img/lenet-1998.png STEM/img/max-pooling.png STEM/img/stackgan-results.png STEM/img/stackgan.png STEM/img/under-over-fitting.png STEM/img/vgg-arch.png STEM/img/vgg-spec.png STEM/img/word2vec.png

AI/Neural Networks/Activation Functions.md

# ReLu

Rectilinear

- For deep networks
- $y=\max(0,x)$
- CNNs
- Breaks associativity of successive convolutions
- Critical for learning complex functions
- Sometimes a small scalar for negative values
    - Leaky ReLu

![[relu.png]]

# SoftMax

- Output is a per-class vector of likelihoods
- Should be normalised into a probability vector

## AlexNet

$$f(x_i)=\frac{\exp(x_i)}{\sum_{j=1}^{1000}\exp(x_j)}$$
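A minimal NumPy sketch of these activations; function names are illustrative, not from the note:

```python
import numpy as np

def relu(x):
    # y = max(0, x), applied element-wise
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    # Small scalar (alpha) for negative inputs instead of a hard zero
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    # Subtract the max for numerical stability, then normalise the
    # exponentials into a probability vector that sums to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))  # per-class likelihoods, normalised
```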

AI/Neural Networks/CNN/CNN.md

## Before 2010s

- Data hungry
    - Need lots of training data
- Processing power
- Niche
    - No-one cared/knew about CNNs

## After

- ImageNet
    - ~14m images; 1,000 classes in the ILSVRC subset
- GPUs
    - General-purpose GPU computing
    - CUDA
- NIPS/ECCV 2012
    - Double-digit % gain on ImageNet accuracy

# Fully Connected

Dense

- Move from convolutional operations towards a vector output
- Stochastic drop-out
    - Sub-sample channels and only connect some to dense layers (see the sketch below)
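A sketch of stochastic drop-out at training time, in the common "inverted dropout" formulation; the keep probability is an assumption:

```python
import numpy as np

def dropout(x, keep_prob=0.5, training=True):
    # Randomly zero a subset of activations so only some units feed
    # the dense layer; scale survivors so expectations stay equal
    if not training:
        return x
    mask = np.random.rand(*x.shape) < keep_prob
    return x * mask / keep_prob
```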

# As a Descriptor

- Most powerful as a deeply learned feature extractor
- The dense classifier at the end isn't fantastic
    - Use an SVM instead, classifying on the penultimate layer's features (see the sketch below)

![[cnn-descriptor.png]]
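A sketch of the descriptor idea; the model choice (AlexNet) and the 4096-D layer are illustrative assumptions, not from the note:

```python
import torch
import torchvision.models as models

# Pre-trained AlexNet as a fixed feature extractor
net = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
net.eval()
# Keep everything up to the penultimate activation: drop the final
# 4096 -> 1000 classification layer
net.classifier = torch.nn.Sequential(*list(net.classifier.children())[:-1])

images = torch.randn(4, 3, 224, 224)  # stand-in batch
with torch.no_grad():
    feats = net(images)               # (4, 4096) descriptors
# Fit e.g. sklearn.svm.SVC on feats.numpy() instead of a dense head
```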

# Finetuning

- Observations
    - Most CNNs have similar weights in conv1
    - Most useful CNNs have several conv layers
        - Many weights
        - Lots of training data
    - Training data is hard to get
        - Labelling
- Reuse weights from another network (see the sketch below)
    - Freeze weights in the first 3-5 conv layers
        - Learning rate = 0
    - Randomly initialise remaining layers
    - Continue training from the existing weights

![[fine-tuning-freezing.png]]
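A minimal PyTorch sketch of the freezing recipe; the network (VGG-16), layer counts, and class count are illustrative assumptions:

```python
import torch
import torchvision.models as models

# Assumed starting point: an ImageNet-pre-trained VGG-16
net = models.vgg16(weights=models.VGG16_Weights.DEFAULT)

# Freeze the early conv layers: no gradients are computed, so their
# effective learning rate is 0 (first ~4 convs here; 3-5 is typical)
for layer in list(net.features.children())[:10]:
    for p in layer.parameters():
        p.requires_grad = False

# Randomly re-initialise the task-specific head (10 classes assumed)
net.classifier[-1] = torch.nn.Linear(4096, 10)

# Continue training from the existing weights elsewhere
optimiser = torch.optim.SGD(
    (p for p in net.parameters() if p.requires_grad), lr=1e-3)
```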

# Training

- Validation & training loss
    - Early
        - Under-fitting
        - Training not representative
    - Later
        - Over-fitting
    - Validation loss can help adjust the learning rate
        - Or indicate when to stop training, as sketched below

![[under-over-fitting.png]]
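A sketch of using validation loss as a stopping signal; the patience value and the two helpers are assumptions:

```python
best_val, patience, bad_epochs = float("inf"), 5, 0
max_epochs = 100

for epoch in range(max_epochs):
    train_one_epoch(model)      # assumed helper: one pass over training set
    val_loss = evaluate(model)  # assumed helper: loss on validation set

    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1         # validation loss rising: over-fitting
        if bad_epochs >= patience:
            break               # stop before over-fitting worsens
```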

AI/Neural Networks/CNN/Convolutional Layer.md

## Design Parameters

- Size of input image
    - 256 x 256 x 1
    - Towards the top end of what's supportable
- Padding
    - Thickness of the border of 0s
- Kernel size
    - 7 x 7 x 1 x n
        - n allows multiple filters per layer
    - Main design decision
    - 12 x 12 or 15 x 15 in early layers
    - Lower in later filters
    - Dataset-dependent
- Stride
    - Interval at which to sample
    - 1
        - Every subsequent pixel
        - Output same size as input
    - 2
        - Every other pixel
        - Output image is half the input size
- Size of computable output
    - 252 x 252 x 1 x n
    - Depends on padding and striding (formula below)
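The computable output size follows from these parameters. For input width $W$, kernel width $K$, padding $P$, and stride $S$:

$$W_{out}=\left\lfloor\frac{W-K+2P}{S}\right\rfloor+1$$

e.g. $W=256$, $K=7$, $S=1$ with $P=1$ gives $\lfloor(256-7+2)/1\rfloor+1=252$, matching the sizes above (which therefore assume one pixel of padding).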

AI/Neural Networks/CNN/Examples.md

# LeNet

- 1990s

![[lenet-1989.png]]

- 1989

![[lenet-1998.png]]

- 1998

# AlexNet

2012

- [[Activation Functions#ReLu|ReLu]]
- Normalisation

![[alexnet.png]]

# VGG

2015

- 16 layers over AlexNet's 8
- Addressing the vanishing gradient problem
    - Xavier
- Similar kernel size throughout
- Gradual filter increase

![[vgg-spec.png]]
![[vgg-arch.png]]

# GoogLeNet

2015

- [[Inception Layer]]s
- Multiple loss functions

![[googlenet.png]]

## [[Inception Layer]]

![[googlenet-inception.png]]

## Auxiliary Loss Functions

- Two other SoftMax blocks
- Help train a really deep network
    - Vanishing gradient problem

![[googlenet-auxilliary-loss.png]]
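A sketch of how the auxiliary losses combine with the main one during training; `main_loss`, `aux1_loss`, and `aux2_loss` are assumed cross-entropy values from the three SoftMax heads, with the 0.3 weighting reported for GoogLeNet:

```python
# Auxiliary heads inject gradient part-way up the network,
# countering vanishing gradients in a very deep model
loss = main_loss + 0.3 * aux1_loss + 0.3 * aux2_loss
loss.backward()  # gradients flow back from all three SoftMax blocks
```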

AI/Neural Networks/CNN/GAN/CycleGAN.md

Cycle-Consistent GAN

- G
    - $x \rightarrow y$
- F
    - $y \rightarrow x$
- Aims to bridge the gap across domains
    - Zebras-horses
    - Audi-BMW
- Learns a bidirectional mapping function
- Transitivity regularises training
    - $x \rightarrow y'$
    - $y' \rightarrow x''$
    - $x == x''$
    - Cycle consistency (see the loss sketch below)
- Requires two datasets
    - One for each domain
    - Not directly paired
        - Unlike edge map $\rightarrow$ bag

![[cyclegan.png]]
![[cyclegan-results.png]]
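A sketch of the cycle-consistency term, assuming generators `G` ($x \rightarrow y$) and `F` ($y \rightarrow x$) and the L1 penalty with $\lambda=10$ used in the CycleGAN paper:

```python
import torch

def cycle_loss(G, F, x, y, lam=10.0):
    # Forward cycle: x -> y' -> x'' should return to x
    x_cycled = F(G(x))
    # Backward cycle: y -> x' -> y'' should return to y
    y_cycled = G(F(y))
    return lam * (torch.mean(torch.abs(x_cycled - x)) +
                  torch.mean(torch.abs(y_cycled - y)))
```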

AI/Neural Networks/CNN/GAN/DC-GAN.md

Deep Convolutional GAN

![[dc-gan.png]]

- Generator
    - FCN
    - Decoder
        - Generates an image from a code
        - Low-dimensional
            - ~100-D
        - Reshape to a tensor
        - Upconv to an image
    - Train using Gaussian random noise for the code
- Discriminator
    - Contractive
    - Cross-entropy loss
    - Conv and leaky [[Activation Functions#ReLu|ReLu]] layers only
    - Normalised output via sigmoid

## Loss

$$D(S,L)=-\sum_iL_i\log(S_i)$$

- $S$
    - $(0.1, 0.9)^T$
    - Score generated by the discriminator
- $L$
    - $(1, 0)^T$
    - One-hot label vector
    - Step 1
        - Depends on choice of real/fake
    - Step 2
        - One-hot fake vector
- $\sum_i$
    - Sum over all images in the mini-batch

| Noise | Image |
| ----- | ----- |
| $z$   | $x$   |

- Generator wants
    - $D(G(z))=1$
        - To fool the discriminator
- Discriminator wants
    - $D(G(z))=0$
        - To correctly catch the generator
- Real data wants
    - $D(x)=1$

$$J^{(D)}=-\frac 1 2 \mathbb E_{x\sim p_{data}}\log D(x)-\frac 1 2 \mathbb E_z\log (1-D(G(z)))$$
$$J^{(G)}=-J^{(D)}$$

- First term for real images
- Second term for fake images
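A PyTorch sketch of these objectives via binary cross-entropy; it assumes `D` ends in a sigmoid returning shape `(N, 1)`, and the generator uses the common non-saturating heuristic rather than the literal $-J^{(D)}$:

```python
import torch
import torch.nn.functional as F

def d_loss(D, G, x, z):
    # First term: real images, discriminator wants D(x) = 1
    real = F.binary_cross_entropy(D(x), torch.ones(x.size(0), 1))
    # Second term: fakes, discriminator wants D(G(z)) = 0;
    # detach so generator weights get no gradient on this step
    fake = F.binary_cross_entropy(D(G(z).detach()),
                                  torch.zeros(z.size(0), 1))
    return 0.5 * (real + fake)

def g_loss(D, G, z):
    # Generator wants D(G(z)) = 1, i.e. to fool the discriminator
    return F.binary_cross_entropy(D(G(z)), torch.ones(z.size(0), 1))
```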

# Mode Collapse

- Generator finds an easy solution
    - Learns one image for most noise values that will fool the discriminator
- Mitigate with a minibatch discriminator
    - Match the G(z) distribution to x

# What is Learnt?

- Encoding texture/patch detail from the training set
    - Similar to FCN
- Reproducing texture at a high level
    - Cues triggered by the code vector
- Input random noise
    - Iteratively improves visual feasibility
- Different to FCN
    - Discriminator is a task-specific classifier
- Difficult to train over diverse footage
    - Mixing concepts doesn't work
    - Single category/class

AI/Neural Networks/CNN/GAN/GAN.md

# Fully Convolutional

- Remove max-pooling
    - Use strided upconv
- Remove FC layers
    - Hurts convergence in non-classification
- Normalisation tricks
    - Batch normalisation (formula below)
        - Batches of 0 mean and variance 1
    - Leaky ReLu
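For reference, batch normalisation standardises each mini-batch and then rescales with learned parameters:

$$\hat{x}=\frac{x-\mu_B}{\sqrt{\sigma_B^2+\epsilon}},\qquad y=\gamma\hat{x}+\beta$$

where $\mu_B$ and $\sigma_B^2$ are the mini-batch mean and variance, and $\gamma$, $\beta$ are learned.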

# Stages

## Generator, G

- Synthesises 'fake' images
    - From noise

## Discriminator, D

- A classifier
- Is the image fake or real?

![[gan-arch.png]]
![[gan-arch2.png]]

![[gan-results.png]]

# Training

![[gan-training-discriminator.png]]
![[gan-training-generator.png]]

# Code Vector Math for Control

![[cvmfc.png]]

- Do activation maximisation (AM) to derive the code for an image

![[code-vector-math-for-control-results.png]]

AI/Neural Networks/CNN/GAN/StackGAN.md

- Feed output from synthesis into an up-res network
- Generate a standard low-res image
    - Feed into a [[cGAN]]

![[stackgan.png]]
![[stackgan-results.png]]

AI/Neural Networks/CNN/GAN/cGAN.md

Conditional GAN

- An unconditional GAN is hard to control with AM
- Condition synthesis on a class label instead
    - Concatenate the unconditional code with a conditioning vector (see the sketch below)
        - Label
- No longer unsupervised
    - Everything labelled
        - Fake images and the dataset
    - **Requires pairing**
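A sketch of the conditioning step: concatenate the noise code with a one-hot label before the generator. The batch size, ~100-D code, and 10 classes are illustrative:

```python
import torch

z = torch.randn(16, 100)                 # unconditional code, ~100-D
labels = torch.randint(0, 10, (16,))     # class labels for the batch
one_hot = torch.nn.functional.one_hot(labels, num_classes=10).float()

g_input = torch.cat([z, one_hot], dim=1) # 110-D conditioned code
# fake = G(g_input)  # generator now synthesises the requested class
```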

![[cgan.png]]
![[cgan-example.png]]

# Image Conditioning Vector

![[icv-pos-neg-examples.png]]
![[icv-results.png]]

# Text Encoding

- word2vec

![[word2vec.png]]

AI/Neural Networks/CNN/Inception Layer.md

- Similar to a band-pass pyramid
- Replaces a fixed-scale window with a couple of different scales (sketch below)
    - Concatenate the results

![[inception-layer-effect.png]]
![[inception-layer-arch.png]]

- 1 x 1
    - Averages over channels
    - Bottleneck layer
        - Reduces computation
            - x 10
    - Shrinks the number of filters
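A minimal sketch of the parallel-scales idea; the channel counts are illustrative, not GoogLeNet's:

```python
import torch
import torch.nn as nn

class MiniInception(nn.Module):
    def __init__(self, c_in):
        super().__init__()
        # 1x1 bottleneck: mixes/shrinks channels, cutting computation
        self.b1 = nn.Conv2d(c_in, 16, kernel_size=1)
        # Two different window scales, padded to keep spatial size
        self.b3 = nn.Conv2d(c_in, 16, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(c_in, 16, kernel_size=5, padding=2)

    def forward(self, x):
        # Concatenate the per-scale results along the channel axis
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)
```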

AI/Neural Networks/CNN/Max Pooling.md

- Takes the maximum within a window and writes it to the output (sketch below)
- Downsamples the image
- More non-linearity
- Doesn't remove important information
    - The max value is the good bit
- No parameters
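A NumPy sketch of 2 x 2 max pooling with stride 2, the parameter-free case described above:

```python
import numpy as np

def max_pool(img, k=2):
    # Take the maximum within each k x k window (stride = k),
    # halving each spatial dimension for k = 2
    h, w = img.shape[0] // k * k, img.shape[1] // k * k
    blocks = img[:h, :w].reshape(h // k, k, w // k, k)
    return blocks.max(axis=(1, 3))
```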

![[max-pooling.png]]

## Design Parameters

- Size of input image
    - 252 x 252 x 1 x n
- Padding
- Kernel size
    - 3 x 3 x 1
    - Doesn't need to be odd
        - 2 x 2
- Stride
    - Typically n
        - For an n x n kernel size
    - Sometimes 4 x 4 in early layers
        - 16 times less data
        - Rapid downsample
- Size of computable output
    - 250 x 250 x 1 x n
    - Depends on padding and striding

AI/Neural Networks/CNN/Normalisation.md

- Keeps responses sensible layer by layer
- Apply a kernel to the same location of all channels
- Pixels in the window are divided by the sum of pixels within the volume across channels

![[cnn-normalisation.png]]
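This is local response normalisation; AlexNet's form, for reference, divides each response by a sum over neighbouring channels at the same spatial location:

$$b_{x,y}^i=a_{x,y}^i\bigg/\left(k+\alpha\sum_{j=\max(0,\,i-n/2)}^{\min(N-1,\,i+n/2)}\left(a_{x,y}^j\right)^2\right)^\beta$$

where $N$ is the number of channels and $k$, $\alpha$, $\beta$, $n$ are constants.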

AI/Neural Networks/CV/Data Manipulations.md

# Augmentation

- Mimics larger datasets
- Helps with over-fitting

![[data-aug.png]]

# Data Whitening

- Remove the average image of the dataset
    - Or the average RGB pixel from all images

![[data-whitening.png]]
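A NumPy sketch of both manipulations, using a random flip as one example augmentation:

```python
import numpy as np

def augment(img):
    # Mimic a larger dataset: random horizontal flip
    return img[:, ::-1] if np.random.rand() < 0.5 else img

def whiten(images):
    # Remove the average image of the dataset (images: (N, H, W, C))
    mean_image = images.mean(axis=0)
    return images - mean_image
    # ...or subtract the average RGB pixel instead:
    # images - images.mean(axis=(0, 1, 2))
```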

AI/Neural Networks/CV/Datasets.md

# MNIST

- 70,000 handwritten digits
- 28 x 28 images
- 10 classes (0 through 9)
- 99.83% achieved
    - Ciresan et al. 2011

# CIFAR-10

- 60,000 colour images
- 32 x 32 images
- 10 classes
    - Airplane
    - Automobile
    - Bird
    - Cat
    - Deer
    - Dog
    - Frog
    - Horse
    - Ship
    - Truck
- 90.7% achieved
    - Wan et al. 2013

AI/Neural Networks/CV/Filters.md

# Gabor

![[gabor.png]]
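For reference, a 2-D Gabor filter is a Gaussian envelope modulated by a sinusoid:

$$g(x,y)=\exp\left(-\frac{x'^2+\gamma^2y'^2}{2\sigma^2}\right)\cos\left(2\pi\frac{x'}{\lambda}+\psi\right)$$

where $x'=x\cos\theta+y\sin\theta$ and $y'=-x\sin\theta+y\cos\theta$; $\theta$ sets orientation, $\lambda$ the wavelength, $\psi$ the phase, $\sigma$ the envelope width, and $\gamma$ the aspect ratio.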

AI/Neural Networks/CV/Layer Structure.md

![[cnn-cv-layer-arch.png]]

AI/Neural Networks/Weight Init.md

- Randomly
    - Gaussian noise with mean = 0
- Small network
    - Fixed sigma is fine
        - 0.01
    - E.g. 8 layers
        - AlexNet
- Too large
    - Won't converge
- Too small
    - Gradient won't propagate back through many layers

## Xavier System

$$\sigma=\frac 1 {n_{in}+n_{out}}$$

or

$$\sigma=\sqrt{2/n}$$

- Where $n=\text{filter size}\times n_{out}$
- And $n_{in}$ and $n_{out}$ refer to the number of image channels in and out of the layer
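A NumPy sketch of the two recipes above; shapes and function names are illustrative:

```python
import numpy as np

def init_fixed(shape, sigma=0.01):
    # Small-network recipe: zero-mean Gaussian with a fixed sigma
    return np.random.normal(0.0, sigma, shape)

def init_xavier(n_in, n_out, shape):
    # Sigma scaled by channels in/out, per the note's first form
    sigma = 1.0 / (n_in + n_out)
    return np.random.normal(0.0, sigma, shape)

W = init_xavier(64, 128, (3, 3, 64, 128))  # e.g. a 3x3 conv kernel
```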