vault backup: 2023-05-26 18:29:17
Affected files: .obsidian/graph.json .obsidian/workspace-mobile.json .obsidian/workspace.json STEM/AI/Neural Networks/Activation Functions.md STEM/AI/Neural Networks/CNN/CNN.md STEM/AI/Neural Networks/CNN/Convolutional Layer.md STEM/AI/Neural Networks/CNN/Examples.md STEM/AI/Neural Networks/CNN/GAN/CycleGAN.md STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md STEM/AI/Neural Networks/CNN/GAN/GAN.md STEM/AI/Neural Networks/CNN/GAN/StackGAN.md STEM/AI/Neural Networks/CNN/GAN/cGAN.md STEM/AI/Neural Networks/CNN/Inception Layer.md STEM/AI/Neural Networks/CNN/Max Pooling.md STEM/AI/Neural Networks/CNN/Normalisation.md STEM/AI/Neural Networks/CV/Data Manipulations.md STEM/AI/Neural Networks/CV/Datasets.md STEM/AI/Neural Networks/CV/Filters.md STEM/AI/Neural Networks/CV/Layer Structure.md STEM/AI/Neural Networks/Weight Init.md STEM/img/alexnet.png STEM/img/cgan-example.png STEM/img/cgan.png STEM/img/cnn-cv-layer-arch.png STEM/img/cnn-descriptor.png STEM/img/cnn-normalisation.png STEM/img/code-vector-math-for-control-results.png STEM/img/cvmfc.png STEM/img/cyclegan-results.png STEM/img/cyclegan.png STEM/img/data-aug.png STEM/img/data-whitening.png STEM/img/dc-gan.png STEM/img/fine-tuning-freezing.png STEM/img/gabor.png STEM/img/gan-arch.png STEM/img/gan-arch2.png STEM/img/gan-results.png STEM/img/gan-training-discriminator.png STEM/img/gan-training-generator.png STEM/img/googlenet-auxilliary-loss.png STEM/img/googlenet-inception.png STEM/img/googlenet.png STEM/img/icv-pos-neg-examples.png STEM/img/icv-results.png STEM/img/inception-layer-arch.png STEM/img/inception-layer-effect.png STEM/img/lenet-1989.png STEM/img/lenet-1998.png STEM/img/max-pooling.png STEM/img/stackgan-results.png STEM/img/stackgan.png STEM/img/under-over-fitting.png STEM/img/vgg-arch.png STEM/img/vgg-spec.png STEM/img/word2vec.png
@ -52,5 +52,17 @@ $$\frac{dy}{dx}=
Rectilinear
- For deep networks
- $y=\max(0,x)$
- CNNs
    - Breaks associativity of successive convolutions (without it they collapse into a single linear filter)
        - Critical for learning complex functions
- Sometimes a small scalar slope for negative inputs
    - Leaky ReLu

![[relu.png]]

# SoftMax
- Output is per-class vector of likelihoods
    - Should be normalised into probability vector

## AlexNet
$$f(x_i)=\frac{\exp(x_i)}{\sum_{j=1}^{1000}\exp(x_j)}$$
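A minimal pure-Python sketch of the activations in this hunk: ReLU, the leaky variant and SoftMax. The function names, the 0.01 negative slope and the example scores are illustrative assumptions. Subtracting the maximum score inside `softmax` is a common numerical-stability step and is not part of the notes.

```python
import math


def relu(x):
    """y = max(0, x)."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """ReLU with a small scalar slope for negative inputs (slope is an assumed value)."""
    return x if x > 0 else slope * x


def softmax(scores):
    """Normalise a per-class score vector into a probability vector."""
    shift = max(scores)  # stability trick, does not change the result
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # hypothetical 3-class scores
```

AlexNet's version sums over 1000 classes; the same function works for any vector length.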
@ -0,0 +1,54 @@
## Before 2010s
- Data hungry
    - Need lots of training data
- Processing power
- Niche
    - No-one cared/knew about CNNs
## After
- ImageNet
    - ~14m images in the full set; 1000 classes in the ILSVRC challenge subset
- GPUs
    - General-purpose GPUs
    - CUDA
- NIPS/ECCV 2012
    - Double digit % gain on ImageNet accuracy

# Fully Connected
Dense
- Move from convolutional operations towards vector output
- Stochastic drop-out
    - Sub-sample channels and only connect some to dense layers

# As a Descriptor
- Most powerful as a deeply learned feature extractor
- Dense classifier at the end isn't fantastic
    - Use an SVM to classify on features taken prior to the penultimate layer

![[cnn-descriptor.png]]

# Finetuning
- Observations
    - Most CNNs have similar weights in conv1
    - Most useful CNNs have several conv layers
        - Many weights
        - Lots of training data
    - Training data is hard to get
        - Labelling
- Reuse weights from other network
- Freeze weights in first 3-5 conv layers
    - Learning rate = 0
- Randomly initialise remaining layers
- Continue with existing weights

![[fine-tuning-freezing.png]]
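
Freezing by setting the learning rate to 0 can be sketched with a per-layer learning rate in a plain SGD update. This is a toy illustration only: the layer names, weights and gradients are made up, and real frameworks usually expose freezing through their own mechanisms.

```python
def sgd_step(weights, grads, learning_rates):
    """One plain SGD update with a separate learning rate per layer."""
    return {
        name: [w - learning_rates[name] * g for w, g in zip(weights[name], grads[name])]
        for name in weights
    }


# Hypothetical three-layer network: two reused conv layers and a new dense layer.
weights = {"conv1": [0.5, -0.2], "conv2": [0.1, 0.3], "fc": [0.0, 0.0]}
grads = {"conv1": [1.0, 1.0], "conv2": [1.0, -1.0], "fc": [0.5, -0.5]}

# Learning rate 0 freezes the reused conv layers; only "fc" is trained.
learning_rates = {"conv1": 0.0, "conv2": 0.0, "fc": 0.1}

updated = sgd_step(weights, grads, learning_rates)
```

After the step, `conv1` and `conv2` are unchanged while `fc` has moved against its gradient.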
# Training
- Validation & training loss
- Early
    - Under-fitting
    - Training not representative
- Later
    - Overfitting
- Validation loss can help adjust learning rate
    - Or indicate when to stop training

![[under-over-fitting.png]]
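Using validation loss to decide when to stop can be sketched as a patience rule: stop once the loss has failed to improve for a few consecutive epochs and keep the best epoch. The `patience` parameter and the loss values below are assumptions for illustration, not figures from the notes.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the best epoch seen before patience runs out."""
    best = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss is rising: likely over-fitting
    return best_epoch


# Hypothetical validation curve: falls (under-fitting), then rises (over-fitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_at = early_stopping_epoch(val_losses)
```

The same signal could instead trigger a learning-rate reduction rather than a stop.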
25
AI/Neural Networks/CNN/Convolutional Layer.md
Normal file
@ -0,0 +1,25 @@
## Design Parameters
- Size of input image
    - 256 x 256 x 1
    - Towards top end of supportable
- Padding
    - Thickness of the border of zeros
- Kernel size
    - 7 x 7 x 1 x n
        - n is the number of filters in the layer
    - Main design decision
    - 12 x 12/15 x 15 in early layers
    - Smaller in later layers
    - Dataset-dependent
- Stride
    - Interval to sample
    - 1
        - Every subsequent pixel
        - Same size out as in (given enough padding)
    - 2
        - Every other subsequent pixel
        - Out image is half input size
- Size of computable output
    - 252 x 252 x 1 x n (e.g. 7 x 7 kernel, stride 1, 1 pixel of padding)
    - Depends on padding and striding
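The dependence of output size on padding and stride follows the usual formula floor((W + 2P - K) / S) + 1 per spatial dimension. A small sketch, with the function name chosen here for illustration:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output width (or height) of a convolution along one spatial dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


no_padding = conv_output_size(256, 7)                  # 250
one_pixel = conv_output_size(256, 7, padding=1)        # 252
same_size = conv_output_size(256, 7, padding=3)        # 256
halved = conv_output_size(256, 7, stride=2, padding=3)  # 128
```

For a 256 x 256 input and a 7 x 7 kernel, 3 pixels of padding keep the size at stride 1, and the same padding at stride 2 halves it.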
43
AI/Neural Networks/CNN/Examples.md
Normal file
@ -0,0 +1,43 @@
# LeNet
- 1990s
![[lenet-1989.png]]
- 1989
![[lenet-1998.png]]
- 1998

# AlexNet
2012

- [[Activation Functions#ReLu|ReLu]]
- Normalisation

![[alexnet.png]]

# VGG
2015

- 16 layers over AlexNet's 8
- Looking at vanishing gradient problem
    - Xavier
- Similar kernel size throughout
- Gradual increase in number of filters

![[vgg-spec.png]]
![[vgg-arch.png]]

# GoogLeNet
2015

- [[Inception Layer]]s
- Multiple Loss Functions

![[googlenet.png]]

## [[Inception Layer]]
![[googlenet-inception.png]]
## Auxiliary Loss Functions
- Two other SoftMax blocks
- Help train really deep network
    - Vanishing gradient problem

![[googlenet-auxilliary-loss.png]]
22
AI/Neural Networks/CNN/GAN/CycleGAN.md
Normal file
@ -0,0 +1,22 @@
Cycle Consistent GAN

- G
    - $x \rightarrow y$
- F
    - $y \rightarrow x$
- Aims to bridge gap across domains
    - Zebras-horses
    - Audi-BMW
- Learn bidirectional mapping function
- Transitivity regularises training
    - $x \rightarrow y'$
    - $y' \rightarrow x''$
    - $x \approx x''$
    - Cycle consistency
- Requires two datasets
    - One for each domain
    - Not directly paired
    - Unlike paired translation, e.g. edge map $\rightarrow$ bag

![[cyclegan.png]]
![[cyclegan-results.png]]
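The cycle-consistency idea (map $x$ to $y'$ with G, back to $x''$ with F, and penalise the difference) can be sketched as an L1 reconstruction penalty. The toy mappings below stand in for trained networks and are purely hypothetical. The symmetric $y \rightarrow x' \rightarrow y''$ term is included as in the usual CycleGAN formulation, which goes slightly beyond what the notes list.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, x_batch, y_batch):
    """L1 penalty on x -> G -> F -> x'' and y -> F -> G -> y''."""
    forward = sum(l1_distance(F(G(x)), x) for x in x_batch) / len(x_batch)
    backward = sum(l1_distance(G(F(y)), y) for y in y_batch) / len(y_batch)
    return forward + backward


# Toy stand-ins for the two generators (assumptions, not real networks).
def G(x):
    return [2.0 * v for v in x]


def F(y):
    return [0.5 * v for v in y]


def F_bad(y):
    return [0.4 * v for v in y]


x_batch = [[1.0, 2.0], [3.0, 4.0]]  # unpaired samples from domain X
y_batch = [[2.0, 4.0]]              # unpaired samples from domain Y

consistent = cycle_consistency_loss(G, F, x_batch, y_batch)
inconsistent = cycle_consistency_loss(G, F_bad, x_batch, y_batch)
```

When F exactly inverts G the loss is zero; any mismatch in the round trip makes it positive, which is what regularises training.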
69
AI/Neural Networks/CNN/GAN/DC-GAN.md
Normal file
@ -0,0 +1,69 @@
Deep Convolutional GAN
![[dc-gan.png]]

- Generator
    - FCN
    - Decoder
    - Generate image from code
        - Low-dimensional
        - ~100-D
    - Reshape to tensor
    - Upconv to image
    - Train using Gaussian random noise for code
- Discriminator
    - Contractive
    - Cross-entropy loss
    - Conv and leaky [[Activation Functions#ReLu|ReLu]] layers only
    - Normalised output via sigmoid

## Loss
$$D(S,L)=-\sum_iL_i\log(S_i)$$
- $S$
    - $(0.1, 0.9)^T$
    - Score generated by discriminator
- $L$
    - $(1, 0)^T$
    - One-hot label vector
    - Step 1
        - Depends on choice of real/fake
    - Step 2
        - One-hot fake vector
- $\sum_i$
    - Sum over all images in mini-batch
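
A worked instance of the cross-entropy above, using the example vectors $S=(0.1, 0.9)^T$ and $L=(1, 0)^T$ from the notes. The function name and the second, "confident and correct" score vector are assumptions added for contrast.

```python
import math


def cross_entropy(scores, labels):
    """D(S, L) = -sum_i L_i * log(S_i) for one score/label pair."""
    return -sum(l * math.log(s) for s, l in zip(scores, labels))


# Discriminator puts only 0.1 on the correct class: large loss.
loss_wrong = cross_entropy([0.1, 0.9], [1, 0])

# Hypothetical opposite case, 0.9 on the correct class: small loss.
loss_right = cross_entropy([0.9, 0.1], [1, 0])
```

Only the term for the labelled class survives the sum, so the loss is $-\log(0.1)$ in the first case and $-\log(0.9)$ in the second.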

| Noise | Image |
| ----- | ----- |
| $z$   | $x$   |

- Generator wants
    - $D(G(z))=1$
    - Wants to fool discriminator
- Discriminator wants
    - $D(G(z))=0$
    - Wants to correctly catch generator
- For real data, discriminator wants
    - $D(x)=1$

$$J^{(D)}=-\frac 1 2 \mathbb E_{x\sim p_{data}}\log D(x)-\frac 1 2 \mathbb E_z\log (1-D(G(z)))$$
$$J^{(G)}=-J^{(D)}$$
- First term for real images
- Second term for fake images
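
The two objectives can be evaluated on a mini-batch by replacing each expectation with a batch mean. The sketch below takes discriminator outputs directly as lists of probabilities; the function names and the example values are assumptions for illustration.

```python
import math


def discriminator_loss(d_real, d_fake):
    """J^(D): d_real holds D(x) for real images, d_fake holds D(G(z)) for fakes."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -0.5 * real_term - 0.5 * fake_term


def generator_loss(d_real, d_fake):
    """J^(G) = -J^(D), the zero-sum form given above."""
    return -discriminator_loss(d_real, d_fake)


# Hypothetical outputs: a discriminator that is mostly right...
confident = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
# ...and one that cannot tell real from fake (outputs 0.5 everywhere).
fooled = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

A fully fooled discriminator gives $J^{(D)}=\log 2$, and the loss falls towards 0 as it becomes more accurate.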

# Mode Collapse
- Generator gives easy solution
    - Learns one image for most noise that will fool discriminator
- Mitigate with minibatch discrimination
    - Match $G(z)$ distribution to $x$

# What is Learnt?
- Encoding texture/patch detail from training set
- Similar to FCN
    - Reproducing texture at high level
    - Cues triggered by code vector
        - Input random noise
- Iteratively improves visual feasibility
- Different to FCN
    - Discriminator is a task specific classifier
    - Difficult to train over diverse footage
        - Mixing concepts doesn't work
        - Single category/class
31
AI/Neural Networks/CNN/GAN/GAN.md
Normal file
@ -0,0 +1,31 @@
# Fully Convolutional
- Remove max-pooling
    - Use strided upconv
- Remove FC layers
    - Hurts convergence in non-classification
- Normalisation tricks
    - Batch normalisation
        - Batches of 0 mean and variance 1
    - Leaky ReLu
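
Normalising a batch to zero mean and unit variance can be sketched for a single feature as below. This is only the standardisation step: the learnable scale and shift that batch normalisation layers usually add are omitted, and the `eps` constant and input values are assumptions.

```python
import math


def batch_normalise(values, eps=1e-5):
    """Standardise one feature across a mini-batch to ~0 mean and ~1 variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + eps) for v in values]


normalised = batch_normalise([2.0, 4.0, 6.0, 8.0])  # hypothetical activations

new_mean = sum(normalised) / len(normalised)
new_var = sum((v - new_mean) ** 2 for v in normalised) / len(normalised)
```

The small `eps` guards against division by zero, so the resulting variance is very slightly below 1.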

# Stages
## Generator, G
- Synthesise 'fake' images
    - From noise
## Discriminator, D
- Discriminator is a classifier
    - Is image fake or real

![[gan-arch.png]]
![[gan-arch2.png]]

![[gan-results.png]]

# Training
![[gan-training-discriminator.png]]
![[gan-training-generator.png]]

# Code Vector Math for Control
![[cvmfc.png]]
- Do AM to derive code for an image
![[code-vector-math-for-control-results.png]]
6
AI/Neural Networks/CNN/GAN/StackGAN.md
Normal file
@ -0,0 +1,6 @@
- Feed output from synthesis into up-res network
- Generate standard low-res image
    - Feed into [[cGAN]]

![[stackgan.png]]
![[stackgan-results.png]]
23
AI/Neural Networks/CNN/GAN/cGAN.md
Normal file
@ -0,0 +1,23 @@
Conditional GAN

- Hard to control with AM
    - Unconditional GAN
- Condition synthesis on a class label
    - Concatenate unconditional code with conditioning vector
        - Label
- No longer unsupervised
    - Everything labelled
    - Fake images and dataset
- **Requires pairing**
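
Conditioning on a class label by concatenation can be sketched as appending a one-hot label vector to the noise code before it enters the generator. The helper names, the 3-D noise vector and the 4-class setup are hypothetical.

```python
def one_hot(label, num_classes):
    """One-hot conditioning vector for an integer class label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def conditional_code(noise, label, num_classes):
    """Concatenate the unconditional code with the conditioning vector."""
    return noise + one_hot(label, num_classes)


z = [0.3, -1.2, 0.8]  # stand-in for a ~100-D Gaussian noise code
code = conditional_code(z, label=2, num_classes=4)
```

The generator input grows from 3 to 7 dimensions here; the noise part is unchanged and the label part selects the class to synthesise.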

![[cgan.png]]
![[cgan-example.png]]

# Image Conditioning Vector
![[icv-pos-neg-examples.png]]
![[icv-results.png]]

# Text Encoding
- word2vec

![[word2vec.png]]
14
AI/Neural Networks/CNN/Inception Layer.md
Normal file
@ -0,0 +1,14 @@
- Similar to band-pass pyramid
- Replaces a single fixed-scale window size
    - Couple of different scales
- Concatenate results

![[inception-layer-effect.png]]
![[inception-layer-arch.png]]

- 1 x 1
    - Weighted combination across channels
    - Bottleneck layer
    - Reduces computation
        - Roughly x 10
    - Shrinks number of filters
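The roughly 10x saving from a 1 x 1 bottleneck can be checked by counting multiplications. The layer sizes below (28 x 28 feature map, 192 input channels, 32 output filters of 5 x 5, bottleneck of 16 channels) are an assumed textbook-style example, not figures from the notes, and the spatial size is assumed to be preserved by padding.

```python
def conv_multiplies(height, width, in_channels, out_channels, kernel):
    """Multiplications for a conv layer whose output keeps the input's spatial size."""
    return height * width * out_channels * kernel * kernel * in_channels


H = W = 28  # assumed feature-map size

# 5 x 5 convolution applied directly to all 192 channels.
direct = conv_multiplies(H, W, 192, 32, 5)

# 1 x 1 bottleneck down to 16 channels, then the 5 x 5 convolution.
bottleneck = conv_multiplies(H, W, 192, 16, 1) + conv_multiplies(H, W, 16, 32, 5)

saving = direct / bottleneck
```

With these sizes the direct layer needs about 120 million multiplications and the bottlenecked pair about 12 million.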
26
AI/Neural Networks/CNN/Max Pooling.md
Normal file
@ -0,0 +1,26 @@
- Takes the maximum within each window and writes it to the output
- Downsamples image
- More non-linearity
- Doesn't remove important information
    - Max value is the good bit
- No learnable parameters

![[max-pooling.png]]

## Design Parameters
- Size of input image
    - 252 x 252 x 1 x n
- Padding
- Kernel size
    - 3 x 3 x 1
    - Doesn't need to be odd
        - 2 x 2
- Stride
    - Typically n
        - For n x n kernel size
    - Sometimes 4 x 4 in early layers
        - 16 times less data
        - Rapid downsample
- Size of computable output
    - 250 x 250 x 1 x n (3 x 3 window, stride 1, no padding)
    - Depends on padding and striding
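A small sketch of max pooling on a nested-list image, defaulting to a stride equal to the window size as described above. The function name and the 4 x 4 example image are assumptions for illustration; padding is not handled.

```python
def max_pool_2d(image, size=2, stride=None):
    """Max pooling over a 2-D list; stride defaults to the window size."""
    stride = stride or size
    rows = (len(image) - size) // stride + 1
    cols = (len(image[0]) - size) // stride + 1
    return [
        [
            max(
                image[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


image = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
pooled = max_pool_2d(image)  # 2 x 2 window, stride 2
```

Each 2 x 2 block is reduced to its largest value, so the 4 x 4 image becomes 2 x 2 with a quarter of the data.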
5
AI/Neural Networks/CNN/Normalisation.md
Normal file
@ -0,0 +1,5 @@
- Keeps activations in a sensible range layer by layer
- Apply kernel to same location of all channels
    - Pixels in window divided by sum of pixels within volume across channels

![[cnn-normalisation.png]]
11
AI/Neural Networks/CV/Data Manipulations.md
Normal file
@ -0,0 +1,11 @@
# Augmentation
- Mimic larger datasets
- Help with over-fitting

![[data-aug.png]]

# Data Whitening
- Subtract the average image of the dataset
- Or subtract the average RGB pixel from every image

![[data-whitening.png]]
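Removing the average image can be sketched on flattened images stored as lists. The two tiny 3-pixel "images" and the function names are hypothetical; the per-channel variant would average one RGB value over the whole dataset instead.

```python
def mean_image(images):
    """Per-pixel average over a dataset of equally sized, flattened images."""
    count = len(images)
    return [sum(pixels) / count for pixels in zip(*images)]


def subtract_mean_image(images):
    """Centre every image by removing the dataset's average image."""
    mean = mean_image(images)
    return [[p - m for p, m in zip(image, mean)] for image in images]


dataset = [[0.0, 2.0, 4.0], [2.0, 4.0, 8.0]]  # toy dataset
centred = subtract_mean_image(dataset)
```

After centring, the average image of the dataset is all zeros.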
23
AI/Neural Networks/CV/Datasets.md
Normal file
@ -0,0 +1,23 @@
# MNIST
- 70,000 handwritten digits, derived from NIST data
    - 28x28 images
- 10 classes (0 through 9)
- Achieved 99.83%
    - Ciresan et al. 2011

# CIFAR-10
- 60,000 colour images
    - 32x32 images
- 10 classes
    - Airplane
    - Automobile
    - Bird
    - Cat
    - Deer
    - Dog
    - Frog
    - Horse
    - Ship
    - Truck
- Achieved 90.7%
    - Wan et al. 2013
2
AI/Neural Networks/CV/Filters.md
Normal file
@ -0,0 +1,2 @@
# Gabor
![[gabor.png]]
1
AI/Neural Networks/CV/Layer Structure.md
Normal file
@ -0,0 +1 @@
![[cnn-cv-layer-arch.png]]
18
AI/Neural Networks/Weight Init.md
Normal file
@ -0,0 +1,18 @@
- Randomly
- Gaussian noise with mean = 0
- Small network
    - Fixed sigma is fine
        - 0.01
    - E.g. 8 layers
        - AlexNet
- Sigma too large
    - Won't converge
- Sigma too small
    - Gradient won't propagate back many layers

## Xavier System
$$\sigma=\sqrt{\frac 2 {n_{in}+n_{out}}}$$
or
$$\sigma=\sqrt{2/n}$$
* Where $n=\text{filter size}\times n_{out}$
* And $n_{in}$ and $n_{out}$ refer to number of image channels in and out of the layer
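The second form, $\sigma=\sqrt{2/n}$ with $n=\text{filter size}\times n_{out}$, can be compared against the fixed $\sigma=0.01$ with a short calculation. The layer shapes are assumed examples, and "filter size" is taken here to mean the number of weights in one filter window (e.g. 9 for 3 x 3).

```python
import math


def scaled_sigma(filter_size, n_out):
    """sigma = sqrt(2 / n) with n = filter size * n_out."""
    return math.sqrt(2.0 / (filter_size * n_out))


FIXED_SIGMA = 0.01  # the fixed value quoted for small networks

# Hypothetical layers: 3 x 3 filters with 64 and then 512 output channels.
sigma_small_layer = scaled_sigma(3 * 3, 64)
sigma_large_layer = scaled_sigma(3 * 3, 512)
```

The scaled value shrinks as the layer gets wider, which is the point of tying sigma to layer size rather than fixing it for every layer.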
BIN img/alexnet.png (Normal file, 48 KiB)
BIN img/cgan-example.png (Normal file, 76 KiB)
BIN img/cgan.png (Normal file, 21 KiB)
BIN img/cnn-cv-layer-arch.png (Normal file, 251 KiB)
BIN img/cnn-descriptor.png (Normal file, 41 KiB)
BIN img/cnn-normalisation.png (Normal file, 4.7 KiB)
BIN img/code-vector-math-for-control-results.png (Normal file, 181 KiB)
BIN img/cvmfc.png (Normal file, 18 KiB)
BIN img/cyclegan-results.png (Normal file, 322 KiB)
BIN img/cyclegan.png (Normal file, 41 KiB)
BIN img/data-aug.png (Normal file, 146 KiB)
BIN img/data-whitening.png (Normal file, 248 KiB)
BIN img/dc-gan.png (Normal file, 62 KiB)
BIN img/fine-tuning-freezing.png (Normal file, 68 KiB)
BIN img/gabor.png (Normal file, 65 KiB)
BIN img/gan-arch.png (Normal file, 36 KiB)
BIN img/gan-arch2.png (Normal file, 61 KiB)
BIN img/gan-results.png (Normal file, 639 KiB)
BIN img/gan-training-discriminator.png (Normal file, 187 KiB)
BIN img/gan-training-generator.png (Normal file, 129 KiB)
BIN img/googlenet-auxilliary-loss.png (Normal file, 100 KiB)
BIN img/googlenet-inception.png (Normal file, 150 KiB)
BIN img/googlenet.png (Normal file, 117 KiB)
BIN img/icv-pos-neg-examples.png (Normal file, 182 KiB)
BIN img/icv-results.png (Normal file, 390 KiB)
BIN img/inception-layer-arch.png (Normal file, 87 KiB)
BIN img/inception-layer-effect.png (Normal file, 135 KiB)
BIN img/lenet-1989.png (Normal file, 94 KiB)
BIN img/lenet-1998.png (Normal file, 102 KiB)
BIN img/max-pooling.png (Normal file, 37 KiB)
BIN img/stackgan-results.png (Normal file, 247 KiB)
BIN img/stackgan.png (Normal file, 127 KiB)
BIN img/under-over-fitting.png (Normal file, 178 KiB)
BIN img/vgg-arch.png (Normal file, 41 KiB)
BIN img/vgg-spec.png (Normal file, 87 KiB)
BIN img/word2vec.png (Normal file, 223 KiB)