vault backup: 2023-05-26 18:29:17

Affected files:
.obsidian/graph.json
.obsidian/workspace-mobile.json
.obsidian/workspace.json
STEM/AI/Neural Networks/Activation Functions.md
STEM/AI/Neural Networks/CNN/CNN.md
STEM/AI/Neural Networks/CNN/Convolutional Layer.md
STEM/AI/Neural Networks/CNN/Examples.md
STEM/AI/Neural Networks/CNN/GAN/CycleGAN.md
STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md
STEM/AI/Neural Networks/CNN/GAN/GAN.md
STEM/AI/Neural Networks/CNN/GAN/StackGAN.md
STEM/AI/Neural Networks/CNN/GAN/cGAN.md
STEM/AI/Neural Networks/CNN/Inception Layer.md
STEM/AI/Neural Networks/CNN/Max Pooling.md
STEM/AI/Neural Networks/CNN/Normalisation.md
STEM/AI/Neural Networks/CV/Data Manipulations.md
STEM/AI/Neural Networks/CV/Datasets.md
STEM/AI/Neural Networks/CV/Filters.md
STEM/AI/Neural Networks/CV/Layer Structure.md
STEM/AI/Neural Networks/Weight Init.md
STEM/img/alexnet.png
STEM/img/cgan-example.png
STEM/img/cgan.png
STEM/img/cnn-cv-layer-arch.png
STEM/img/cnn-descriptor.png
STEM/img/cnn-normalisation.png
STEM/img/code-vector-math-for-control-results.png
STEM/img/cvmfc.png
STEM/img/cyclegan-results.png
STEM/img/cyclegan.png
STEM/img/data-aug.png
STEM/img/data-whitening.png
STEM/img/dc-gan.png
STEM/img/fine-tuning-freezing.png
STEM/img/gabor.png
STEM/img/gan-arch.png
STEM/img/gan-arch2.png
STEM/img/gan-results.png
STEM/img/gan-training-discriminator.png
STEM/img/gan-training-generator.png
STEM/img/googlenet-auxilliary-loss.png
STEM/img/googlenet-inception.png
STEM/img/googlenet.png
STEM/img/icv-pos-neg-examples.png
STEM/img/icv-results.png
STEM/img/inception-layer-arch.png
STEM/img/inception-layer-effect.png
STEM/img/lenet-1989.png
STEM/img/lenet-1998.png
STEM/img/max-pooling.png
STEM/img/stackgan-results.png
STEM/img/stackgan.png
STEM/img/under-over-fitting.png
STEM/img/vgg-arch.png
STEM/img/vgg-spec.png
STEM/img/word2vec.png
andy 2023-05-26 18:29:17 +01:00
parent 5a592c8c7c
commit 8f0b604256
53 changed files with 385 additions and 0 deletions

View File: STEM/AI/Neural Networks/Activation Functions.md

@ -52,5 +52,17 @@ $$\frac{dy}{dx}=
Rectified linear
- For deep networks
- $y=\max(0,x)$
- CNNs
- Breaks the associativity of successive convolutions, so they can't collapse into a single linear operation
- Critical for learning complex functions
- Sometimes a small scalar slope for negative inputs
- Leaky ReLU
![[relu.png]]
# SoftMax
- Output is a per-class vector of likelihoods
- Should be normalised into a probability vector
## AlexNet
$$f(x_i)=\frac{\exp(x_i)}{\sum_{j=1}^{1000}\exp(x_j)}$$
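A minimal NumPy sketch of this normalisation (the max-subtraction is a standard numerical-stability trick, not part of the formula above):

```python
import numpy as np

def softmax(x):
    # Shift by the max for numerical stability; the result is unchanged
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.random.randn(1000)        # per-class scores, e.g. AlexNet's 1000 classes
probs = softmax(logits)
assert np.isclose(probs.sum(), 1.0)   # a valid probability vector
```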

View File: STEM/AI/Neural Networks/CNN/CNN.md

@ -0,0 +1,54 @@
## Before 2010s
- Data hungry
- Need lots of training data
- Processing power
- Niche
- No-one cared/knew about CNNs
## After
- ImageNet
- ~14m images; 1,000 classes in the ILSVRC benchmark
- GPUs
- General-purpose GPU computing (GPGPU)
- CUDA
- NIPS/ECCV 2012
- Double digit % gain on ImageNet accuracy
# Fully Connected
Dense
- Move from convolutional operations towards vector output
- Stochastic dropout
- Sub-samples channels, connecting only some to the dense layers
# As a Descriptor
- Most powerful as a deeply learned feature extractor
- The dense classifier at the end isn't fantastic
- Use an SVM to classify the features from the penultimate layer instead
![[cnn-descriptor.png]]
# Finetuning
- Observations
- Most CNNs have similar weights in conv1
- Most useful CNNs have several conv layers
- Many weights
- Lots of training data
- Training data is hard to get
- Labelling
- Reuse weights from another network
- Freeze weights in first 3-5 conv layers
- Learning rate = 0
- Randomly initialise remaining layers
- Continue with existing weights
![[fine-tuning-freezing.png]]
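A hedged PyTorch-style sketch of the recipe above; the model, the number of frozen layers, and the new class count are illustrative assumptions:

```python
import torch.nn as nn
from torchvision import models

# Reuse weights from another network (here: ImageNet-pretrained AlexNet, assumed)
model = models.alexnet(weights="IMAGENET1K_V1")

# Freeze the early conv layers (equivalent to setting their learning rate to 0)
for layer in list(model.features.children())[:8]:   # the first 3 conv layers
    for p in layer.parameters():
        p.requires_grad = False

# Randomly re-initialise the final layer for the new task (10 classes assumed)
model.classifier[6] = nn.Linear(4096, 10)
```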
# Training
- Validation & training loss
- Early on
- Under-fitting
- Training data not representative
- Later on
- Over-fitting
- Validation loss can help adjust the learning rate
- Or indicate when to stop training
![[under-over-fitting.png]]

View File: STEM/AI/Neural Networks/CNN/Convolutional Layer.md

@ -0,0 +1,25 @@
## Design Parameters
- Size of input image
- 256 x 256 x 1
- Towards the top end of what's supportable
- Padding
- Thickness of border 0s
- Kernel size
- 7 x 7 x 1 x n
- n is the number of filters per layer
- Main design decision
- 12 x 12 or 15 x 15 in early layers
- Smaller in later layers
- Dataset-dependent
- Stride
- Interval at which to sample
- 1
- Every pixel
- Output same size as input
- 2
- Every other pixel
- Output is half the input size
- Size of computable output
- 252 x 252 x 1 x n
- Depends on padding and striding
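The standard output-size relation ties these together: for input width $W$, kernel size $K$, padding $P$ and stride $S$,
$$W_{out}=\left\lfloor\frac{W-K+2P}{S}\right\rfloor+1$$
e.g. $W=256$, $K=7$, $S=1$ gives $250$ with no padding, or $252$ with $P=1$.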

View File: STEM/AI/Neural Networks/CNN/Examples.md

@ -0,0 +1,43 @@
# LeNet
- 1990s
![[lenet-1989.png]]
- 1989
![[lenet-1998.png]]
- 1998
# AlexNet
2012
- [[Activation Functions#ReLu|ReLu]]
- Normalisation
![[alexnet.png]]
# VGG
2015
- 16 layers over AlexNet's 8
- Addresses the vanishing gradient problem
- Xavier initialisation
- Similar kernel size throughout
- Gradual filter increase
![[vgg-spec.png]]
![[vgg-arch.png]]
# GoogLeNet
2015
- [[Inception Layer]]s
- Multiple Loss Functions
![[googlenet.png]]
## [[Inception Layer]]
![[googlenet-inception.png]]
## Auxiliary Loss Functions
- Two other SoftMax blocks
- Help train really deep network
- Vanishing gradient problem
![[googlenet-auxilliary-loss.png]]

View File: STEM/AI/Neural Networks/CNN/GAN/CycleGAN.md

@ -0,0 +1,22 @@
Cycle Consistent GAN
- G
- $x \rightarrow y$
- F
- $y \rightarrow x$
- Aims to bridge gap across domains
- Zebras-horses
- Audi-BMW
- Learn bidirectional mapping function
- Transitivity regularises training
- $x \rightarrow y'$
- $y' \rightarrow x''$
- Require $x'' \approx x$
- Cycle consistency
- Requires two datasets
- One for each domain
- Not directly paired
- Unlike edge map $\rightarrow$ bag
![[cyclegan.png]]
![[cyclegan-results.png]]
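A minimal PyTorch sketch of the cycle-consistency term, assuming `G` and `F` are the two generators and an L1 penalty (the usual choice, assumed here):

```python
import torch

def cycle_consistency_loss(G, F, x, y):
    # x -> y' -> x'' and y -> x' -> y'' should both return to their start
    x_cycled = F(G(x))
    y_cycled = G(F(y))
    return (x_cycled - x).abs().mean() + (y_cycled - y).abs().mean()
```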

View File: STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md

@ -0,0 +1,69 @@
Deep Convolutional GAN
![[dc-gan.png]]
- Generator
- FCN
- Decoder
- Generate image from code
- Low-dimensional
- ~100-D
- Reshape to tensor
- Upconv to image
- Train using Gaussian random noise for code
- Discriminator
- Contractive
- Cross-entropy loss
- Conv and leaky [[Activation Functions#ReLu|ReLu]] layers only
- Normalised output via sigmoid
## Loss
$$D(S,L)=-\sum_i L_i \log(S_i)$$
- $S$
- $(0.1, 0.9)^T$
- Score generated by discriminator
- $L$
- $(1, 0)^T$
- One-hot label vector
- Step 1
- Depends on choice of real/fake
- Step 2
- One-hot fake vector
- $\sum_i$
- Sum over the label vector's classes (in practice averaged over the mini-batch)
| Noise | Image |
| ----- | ----- |
| $z$ | $x$ |
- Generator wants
- $D(G(z))=1$
- Wants to fool discriminator
- Discriminator wants
- $D(G(z))=0$
- Wants to correctly catch generator
- Real data wants
- $D(x)=1$
$$J^{(D)}=-\frac{1}{2}\mathbb{E}_{x\sim p_{\text{data}}}\log D(x)-\frac{1}{2}\mathbb{E}_z\log(1-D(G(z)))$$
$$J^{(G)}=-J^{(D)}$$
- First term for real images
- Second term for fake images
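A numeric NumPy sketch of $J^{(D)}$, assuming sigmoid discriminator outputs `d_real` $=D(x)$ and `d_fake` $=D(G(z))$ over a mini-batch (values are illustrative):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # First term: real images, pushed towards D(x) = 1
    # Second term: fake images, pushed towards D(G(z)) = 0
    return -0.5 * np.mean(np.log(d_real)) - 0.5 * np.mean(np.log(1.0 - d_fake))

d_real = np.array([0.9, 0.8])   # discriminator scores on real images
d_fake = np.array([0.2, 0.1])   # discriminator scores on generated images
j_d = discriminator_loss(d_real, d_fake)
j_g = -j_d                      # J^(G) = -J^(D), as above
```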
# Mode Collapse
- Generator finds an easy solution
- Learns one image, for most noise inputs, that fools the discriminator
- Mitigate with a minibatch discriminator
- Match the G(z) distribution to that of x
# What is Learnt?
- Encoding texture/patch detail from training set
- Similar to FCN
- Reproducing texture at high level
- Cues triggered by code vector
- Input random noise
- Iteratively improves visual plausibility
- Different to FCN
- Discriminator is a task-specific classifier
- Difficult to train over diverse footage
- Mixing concepts doesn't work
- Single category/class

View File: STEM/AI/Neural Networks/CNN/GAN/GAN.md

@ -0,0 +1,31 @@
# Fully Convolutional
- Remove max-pooling
- Use strided upconv
- Remove FC layers
- They hurt convergence in non-classification tasks
- Normalisation tricks
- Batch normalisation
- Normalise batches to zero mean, unit variance
- Leaky ReLU
# Stages
## Generator, G
- Synthesise 'fake' images
- From noise
## Discriminator, D
- A classifier
- Is the image fake or real?
![[gan-arch.png]]
![[gan-arch2.png]]
![[gan-results.png]]
# Training
![[gan-training-discriminator.png]]
![[gan-training-generator.png]]
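A hedged sketch of the alternating scheme the two figures describe; every name here (`G`, `D`, the optimisers, the output shapes) is an assumption for illustration:

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_D, opt_G, real, z):
    ones = torch.ones(real.size(0), 1)    # assumes D outputs shape (N, 1)
    zeros = torch.zeros(z.size(0), 1)

    # 1) Train discriminator: real -> 1, fake -> 0 (generator held fixed)
    opt_D.zero_grad()
    loss_D = (F.binary_cross_entropy(D(real), ones)
              + F.binary_cross_entropy(D(G(z).detach()), zeros))
    loss_D.backward()
    opt_D.step()

    # 2) Train generator: wants D(G(z)) = 1, i.e. to fool the discriminator
    opt_G.zero_grad()
    loss_G = F.binary_cross_entropy(D(G(z)), torch.ones(z.size(0), 1))
    loss_G.backward()
    opt_G.step()
```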
# Code Vector Math for Control
![[cvmfc.png]]
- Use activation maximisation (AM) to derive the code for an image
![[code-vector-math-for-control-results.png]]

View File: STEM/AI/Neural Networks/CNN/GAN/StackGAN.md

@ -0,0 +1,6 @@
- Generate a standard low-res image first
- Feed the synthesised output into an up-res network
- A [[cGAN]]
![[stackgan.png]]
![[stackgan-results.png]]

View File: STEM/AI/Neural Networks/CNN/GAN/cGAN.md

@ -0,0 +1,23 @@
Conditional GAN
- Unconditional GANs are hard to control with activation maximisation (AM)
- Condition synthesis on a class label
- Concatenate the unconditional code with a conditioning vector (see the sketch below)
- Label
- No longer unsupervised
- Everything labelled
- Fake images and dataset
- **Requires pairing**
![[cgan.png]]
![[cgan-example.png]]
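A minimal sketch of the concatenation step (noise and label sizes are assumptions):

```python
import torch

z = torch.randn(16, 100)                              # unconditional noise codes
labels = torch.eye(10)[torch.randint(0, 10, (16,))]   # one-hot conditioning vectors
z_conditioned = torch.cat([z, labels], dim=1)         # 110-D conditioned generator input
```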
# Image Conditioning Vector
![[icv-pos-neg-examples.png]]
![[icv-results.png]]
# Text Encoding
- word2vec
![[word2vec.png]]

View File: STEM/AI/Neural Networks/CNN/Inception Layer.md

@ -0,0 +1,14 @@
- Similar to a band-pass pyramid
- Uses several fixed-scale window sizes instead of one
- A couple of different scales
- Concatenates the results
![[inception-layer-effect.png]]
![[inception-layer-arch.png]]
- 1 x 1
- Averages over channels
- Bottleneck layer
- Reduces computation
- ~10x
- By shrinking the number of filters
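As a hedged worked example (channel counts assumed, not from the notes): a 5 x 5 convolution mapping 256 channels to 64 costs $5\times5\times256\times64\approx410\text{k}$ multiplies per output pixel; bottlenecking 256 to 32 channels with a 1 x 1 first, then applying the 5 x 5, costs $1\times1\times256\times32+5\times5\times32\times64\approx59\text{k}$, roughly a 7x saving.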

View File: STEM/AI/Neural Networks/CNN/Max Pooling.md

@ -0,0 +1,26 @@
- Takes the maximum within a window and writes the result to the output
- Downsamples image
- More non-linearity
- Doesn't remove important information
- Max value is the good bit
- No parameters
![[max-pooling.png]]
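A minimal NumPy sketch of 2 x 2 max pooling with stride 2 (a single-channel input is assumed):

```python
import numpy as np

def max_pool_2x2(img):
    # Trim to even dimensions, then take the max over each 2x2 window
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(img))   # [[ 5.  7.] [13. 15.]]: each entry is its window's max
```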
## Design Parameters
- Size of input image
- 252 x 252 x 1 x n
- Padding
- Kernel size
- 3 x 3 x 1
- Doesn't need to be odd
- 2 x 2
- Stride
- Typically n
- For n x n kernel size
- Sometimes 4 x 4 in early layers
- 16 times less data
- Rapid downsample
- Size of computable output
- 250 x 250 x 1 x n
- Depends on padding and striding

View File: STEM/AI/Neural Networks/CNN/Normalisation.md

@ -0,0 +1,5 @@
- Keeps responses sensible layer by layer
- Apply the kernel to the same location in all channels
- Each pixel in the window is divided by the sum of pixels within the volume across channels
![[cnn-normalisation.png]]
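A minimal NumPy sketch of the cross-channel division described above; the `(H, W, C)` layout and the epsilon guard are assumptions, as the note doesn't pin down an exact scheme:

```python
import numpy as np

def channel_normalise(volume, eps=1e-8):
    # Divide each pixel by the sum over channels at the same location
    return volume / (volume.sum(axis=-1, keepdims=True) + eps)

vol = np.random.rand(8, 8, 3)   # assumed (H, W, C) activation volume
out = channel_normalise(vol)
```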

View File: STEM/AI/Neural Networks/CV/Data Manipulations.md

@ -0,0 +1,11 @@
# Augmentation
- Mimic larger datasets
- Help with over-fitting
![[data-aug.png]]
# Data Whitening
- Subtract the average image of the dataset
- Or the average RGB pixel over all images
![[data-whitening.png]]
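A minimal NumPy sketch of both options, assuming a dataset tensor of shape `(N, H, W, 3)`:

```python
import numpy as np

data = np.random.rand(100, 32, 32, 3)   # stand-in image dataset

mean_image = data.mean(axis=0)          # per-pixel average image of the dataset
centred_a = data - mean_image           # option 1: remove the average image

mean_rgb = data.mean(axis=(0, 1, 2))    # average RGB pixel over all images
centred_b = data - mean_rgb             # option 2: remove the average RGB pixel
```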

View File: STEM/AI/Neural Networks/CV/Datasets.md

@ -0,0 +1,23 @@
# MNIST
- 70,000 handwritten digits (derived from NIST data)
- 28x28 images
- 10 classes (0 through 9)
- Achieved 99.83%
- Ciresan et al. 2011
# CIFAR-10
- 60,000 colour images
- 32x32 images
- 10 classes
- Airplane
- Automobile
- Bird
- Cat
- Deer
- Dog
- Frog
- Horse
- Ship
- Truck
- Achieved 90.7%
- Wan et al. 2013

View File: STEM/AI/Neural Networks/CV/Filters.md

@ -0,0 +1,2 @@
# Gabor
![[gabor.png]]

View File: STEM/AI/Neural Networks/CV/Layer Structure.md

@ -0,0 +1 @@
![[cnn-cv-layer-arch.png]]

View File: STEM/AI/Neural Networks/Weight Init.md

@ -0,0 +1,18 @@
- Randomly
- Gaussian noise with mean = 0
- For a small network a fixed sigma is fine
- 0.01
- E.g. 8 layers
- AlexNet
- Sigma too large
- Won't converge
- Sigma too small
- Gradients won't propagate back through many layers
## Xavier System
$$\sigma=\frac{1}{n_{in}+n_{out}}$$
or
$$\sigma=\sqrt{\frac{2}{n}}$$
- Where $n=\text{filter size}\times n_{out}$
- And $n_{in}$ and $n_{out}$ refer to the number of image channels into and out of the layer
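A NumPy sketch of the schemes above for one conv layer; the kernel size and channel counts are assumed, and "filter size" is read as $k\times k$:

```python
import numpy as np

k, n_in, n_out = 3, 64, 128   # assumed kernel size and channel counts

# Fixed small sigma: fine for a small network (e.g. 8 layers, AlexNet)
w_fixed = np.random.normal(0.0, 0.01, size=(k, k, n_in, n_out))

# Xavier-style sigma from the channel counts, per the first formula
sigma = 1.0 / (n_in + n_out)
w_xavier = np.random.normal(0.0, sigma, size=(k, k, n_in, n_out))

# Alternative form: sigma = sqrt(2/n) with n = filter size * n_out
n = k * k * n_out
w_alt = np.random.normal(0.0, np.sqrt(2.0 / n), size=(k, k, n_in, n_out))
```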
