vault backup: 2023-05-31 22:21:56
Affected files: .obsidian/global-search.json .obsidian/workspace.json Health/Alexithymia.md Health/BWS.md STEM/AI/Neural Networks/Activation Functions.md STEM/AI/Neural Networks/Architectures.md STEM/AI/Neural Networks/CNN/CNN.md STEM/AI/Neural Networks/MLP/Back-Propagation.md STEM/AI/Neural Networks/Transformers/Attention.md STEM/CS/Calling Conventions.md STEM/CS/Languages/Assembly.md
parent bfdc107e5d, commit 4cc2e79866
STEM/AI/Neural Networks/Activation Functions.md
@@ -11,7 +11,7 @@
 - Bipolar
     - -1 <-> +1
 
-![[threshold-activation.png]]
+![threshold-activation](../../img/threshold-activation.png)
 
 # Sigmoid
 - Logistic function
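Not from the vault itself: a minimal C sketch of the bipolar threshold activation in the hunk above, mapping any pre-activation value to -1 or +1. Function and variable names are illustrative.

```c
#include <stdio.h>

/* Bipolar threshold (sign) activation: maps any pre-activation v to -1 or +1. */
static double bipolar_threshold(double v) {
    return v >= 0.0 ? 1.0 : -1.0;
}

int main(void) {
    double v[] = {-2.5, -0.1, 0.0, 0.7, 3.0};
    for (int i = 0; i < 5; i++)
        printf("v=% .2f -> %+.0f\n", v[i], bipolar_threshold(v[i]));
    return 0;
}
```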
@@ -26,7 +26,8 @@ $$\frac d {dx} \sigma(x)=
 \right]
 =\sigma(x)\cdot(1-\sigma(x))$$
 
-![[sigmoid.png]]
+![sigmoid](../../img/sigmoid.png)
 
 ### Derivative
 
 $$y_j(n)=\varphi_j(v_j(n))=
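A small C check of the derivative identity in this hunk, $\frac{d}{dx}\sigma(x)=\sigma(x)\cdot(1-\sigma(x))$, comparing the analytic form against a central finite difference. The helper names are not from the note.

```c
#include <math.h>
#include <stdio.h>

/* Logistic sigmoid and its derivative sigma(x) * (1 - sigma(x)). */
static double sigmoid(double x)       { return 1.0 / (1.0 + exp(-x)); }
static double sigmoid_deriv(double x) { double s = sigmoid(x); return s * (1.0 - s); }

int main(void) {
    /* Compare the closed form against a central finite difference. */
    const double h = 1e-6;
    for (double x = -4.0; x <= 4.0; x += 2.0) {
        double numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h);
        printf("x=% .1f  analytic=%.6f  numeric=%.6f\n", x, sigmoid_deriv(x), numeric);
    }
    return 0;
}
```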
@@ -58,7 +59,7 @@ Rectilinear
 - Sometimes small scalar for negative
     - Leaky ReLU
 
-![[relu.png]]
+![relu](../../img/relu.png)
 
 # SoftMax
 - Output is per-class vector of likelihoods
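Illustrative C versions of the activations in this hunk: ReLU, Leaky ReLU with a small scalar `alpha` for negative inputs, and SoftMax producing a per-class vector of likelihoods. The max-subtraction in the SoftMax is a standard numerical-stability trick, not something the note mentions.

```c
#include <math.h>
#include <stdio.h>

static double relu(double x)                     { return x > 0.0 ? x : 0.0; }
static double leaky_relu(double x, double alpha) { return x > 0.0 ? x : alpha * x; } /* alpha: small scalar for negatives */

/* Numerically stable SoftMax: subtract the max before exponentiating,
   then normalise so the outputs form a per-class vector of likelihoods. */
static void softmax(const double *logits, double *out, int n) {
    double max = logits[0], sum = 0.0;
    for (int i = 1; i < n; i++) if (logits[i] > max) max = logits[i];
    for (int i = 0; i < n; i++) { out[i] = exp(logits[i] - max); sum += out[i]; }
    for (int i = 0; i < n; i++) out[i] /= sum;
}

int main(void) {
    double logits[3] = {2.0, 1.0, -1.0}, probs[3];
    softmax(logits, probs, 3);
    printf("relu(-0.5)=%.2f  leaky_relu(-0.5)=%.3f\n", relu(-0.5), leaky_relu(-0.5, 0.01));
    printf("softmax: %.3f %.3f %.3f\n", probs[0], probs[1], probs[2]);
    return 0;
}
```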
STEM/AI/Neural Networks/Architectures.md
@@ -2,7 +2,7 @@
 - *Acyclic*
 - Count the output layer; no computation at the input layer
 
-![[feedforward.png]]
+![feedforward](../../img/feedforward.png)
 
 # Multilayer Feedforward
 - Hidden layers
@@ -12,12 +12,12 @@
 - Fully connected
     - Every neuron is connected to every neuron in the adjacent layers
 - Below is a 10-4-2 network
-![[multilayerfeedforward.png]]
+![multilayerfeedforward](../../img/multilayerfeedforward.png)
 
 # Recurrent
 - At least one feedback loop
 - Below has no self-feedback
-![[recurrent.png]]
-![[recurrentwithhn.png]]
+![recurrent](../../img/recurrent.png)
+![recurrentwithhn](../../img/recurrentwithhn.png)
 
 - Above has hidden neurons
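A sketch, not from the note, of a forward pass through the 10-4-2 fully connected network mentioned in the hunk above. The layer sizes follow the note; the sigmoid activation and the placeholder weights are assumptions made so the example runs on its own.

```c
#include <math.h>
#include <stdio.h>

/* One dense (fully connected) layer: out[j] = sigmoid( b[j] + sum_i w[j][i] * in[i] ).
   Every output neuron is connected to every neuron in the previous layer. */
static void dense_forward(const double *in, int n_in,
                          const double *w, const double *b,
                          double *out, int n_out) {
    for (int j = 0; j < n_out; j++) {
        double v = b[j];
        for (int i = 0; i < n_in; i++)
            v += w[j * n_in + i] * in[i];
        out[j] = 1.0 / (1.0 + exp(-v));   /* sigmoid activation */
    }
}

int main(void) {
    /* 10-4-2 network: 10 inputs, a hidden layer of 4, an output layer of 2.
       The weights are placeholders; a trained network would learn them. */
    double x[10], h[4], y[2];
    double w1[4 * 10], b1[4] = {0}, w2[2 * 4], b2[2] = {0};
    for (int i = 0; i < 10; i++)     x[i]  = 0.1 * i;
    for (int i = 0; i < 4 * 10; i++) w1[i] = 0.05;
    for (int i = 0; i < 2 * 4; i++)  w2[i] = 0.1;

    dense_forward(x, 10, w1, b1, h, 4);   /* input  -> hidden */
    dense_forward(h, 4, w2, b2, y, 2);    /* hidden -> output */
    printf("y = [%.4f, %.4f]\n", y[0], y[1]);
    return 0;
}
```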
STEM/AI/Neural Networks/CNN/CNN.md
@@ -5,13 +5,13 @@
 - Niche
 - No-one cared/knew about CNNs
 ## After
-- [[Datasets#ImageNet|ImageNet]]
+- [ImageNet](../CV/Datasets.md#ImageNet)
     - 16m images, 1000 classes
 - GPUs
     - General-purpose GPUs
     - CUDA
 - NIPS/ECCV 2012
-    - Double digit % gain on [[Datasets#ImageNet|ImageNet]] accuracy
+    - Double digit % gain on [ImageNet](../CV/Datasets.md#ImageNet) accuracy
 
 # Fully Connected
 [[MLP|Dense]]
STEM/AI/Neural Networks/MLP/Back-Propagation.md
@@ -79,7 +79,7 @@ $$\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$$
 2. Error WRT output $y$
 3. Output $y$ WRT pre-activation function sum
 4. Pre-activation function sum WRT weight
-    - Other [[Weight Init|weights]] constant, goes to zero
+    - Other [weights](../Weight%20Init.md) constant, goes to zero
     - Leaves just $y_i$
 - Collect 3 boxed terms as $\delta_j$
     - Local gradient
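A worked C example of the update rule in the hunk header, $\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$. It assumes a sigmoid output neuron, so the local gradient collapses to $e_j\,y_j(1-y_j)$ with $e_j=d_j-y_j$; the numbers are made up for illustration.

```c
#include <stdio.h>

int main(void) {
    double eta  = 0.1;     /* learning rate */
    double d_j  = 1.0;     /* target */
    double y_j  = 0.73;    /* neuron output (after sigmoid) */
    double y_i  = 0.42;    /* output of presynaptic neuron i */
    double w_ji = 0.30;    /* current weight from i to j */

    double e_j     = d_j - y_j;                 /* error WRT output */
    double delta_j = e_j * y_j * (1.0 - y_j);   /* local gradient: error term times sigmoid derivative */
    double dw_ji   = eta * delta_j * y_i;       /* pre-activation sum WRT weight leaves just y_i */

    printf("delta_j = %.5f, updated w_ji = %.5f\n", delta_j, w_ji + dw_ji);
    return 0;
}
```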
STEM/AI/Neural Networks/Transformers/Attention.md
@@ -10,16 +10,16 @@
 - [LSTM](../RNN/LSTM.md) tends to preserve far-back [knowledge](../Neural%20Networks.md#Knowledge) poorly
 - Attention layer accesses all previous states and weighs them according to a learned measure of relevance
 - Allows referring to relevant tokens arbitrarily far back
-- Can be added to [[RNN]]s
-- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a [[Architectures|feedforward]] network
-- Attention is useful in and of itself, not just with [[RNN]]s
-- [[Transformers]] use attention without recurrent connections
+- Can be added to [RNNs](../RNN/RNN.md)
+- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a [feedforward](../Architectures.md) network
+- Attention is useful in and of itself, not just with [RNNs](../RNN/RNN.md)
+- [Transformers](Transformers.md) use attention without recurrent connections
     - Process all tokens simultaneously
     - Calculate attention weights in successive layers
 
 # Scaled Dot-Product
 - Calculate attention weights between all tokens at once
-- Learn 3 [[Weight Init|weight]] matrices
+- Learn 3 [weight](../Weight%20Init.md) matrices
     - Query
         - $W_Q$
     - Key
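A single-query C sketch of scaled dot-product attention as described in this hunk: scores are dot products scaled by $1/\sqrt{d_k}$, a softmax turns them into weights, and the output is the weighted sum of values. In a real layer the query, keys and values would come from multiplying token embeddings by the learned $W_Q$, $W_K$, $W_V$ matrices; here they are hard-coded so the example stays self-contained.

```c
#include <math.h>
#include <stdio.h>

#define N_TOK 3   /* number of tokens attended over */
#define D_K   4   /* key/query/value dimension */

int main(void) {
    double q[D_K]        = {1.0, 0.0, 1.0, 0.0};
    double k[N_TOK][D_K] = {{1.0, 0.0, 1.0, 0.0},
                            {0.0, 1.0, 0.0, 1.0},
                            {1.0, 1.0, 0.0, 0.0}};
    double v[N_TOK][D_K] = {{1.0,  2.0,  3.0,  4.0},
                            {5.0,  6.0,  7.0,  8.0},
                            {9.0, 10.0, 11.0, 12.0}};
    double scores[N_TOK], weights[N_TOK], out[D_K] = {0};
    double max = -1e300, sum = 0.0;

    /* Attention scores between the query and every key at once, scaled by 1/sqrt(d_k). */
    for (int t = 0; t < N_TOK; t++) {
        scores[t] = 0.0;
        for (int i = 0; i < D_K; i++) scores[t] += q[i] * k[t][i];
        scores[t] /= sqrt((double)D_K);
        if (scores[t] > max) max = scores[t];
    }
    /* Softmax over the scores, then a weighted sum of the value vectors. */
    for (int t = 0; t < N_TOK; t++) { weights[t] = exp(scores[t] - max); sum += weights[t]; }
    for (int t = 0; t < N_TOK; t++) {
        weights[t] /= sum;
        for (int i = 0; i < D_K; i++) out[i] += weights[t] * v[t][i];
    }

    printf("weights: %.3f %.3f %.3f\n", weights[0], weights[1], weights[2]);
    printf("output : %.3f %.3f %.3f %.3f\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```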
STEM/CS/Calling Conventions.md
@@ -5,15 +5,15 @@
 - Also known as: callee-saved registers or non-volatile registers
 - How the task of preparing the stack for, and restoring after, a function call is divided between the caller and the callee
 
-Subtle differences between [[compilers]] can make it difficult to interface code from different [[compilers]]
+Subtle differences between [compilers](Compilers.md) can make it difficult to interface code from different [compilers](Compilers.md)
 
-Calling conventions, type representations, and name mangling are all part of what is known as an [application binary interface](https://en.wikipedia.org/wiki/Application_binary_interface) ([[ABI]])
+Calling conventions, type representations, and name mangling are all part of what is known as an [application binary interface](https://en.wikipedia.org/wiki/Application_binary_interface) ([ABI](ABI.md))
 
 # cdecl
 C declaration
 
-- Originally from Microsoft's C [[compilers|compiler]]
-- Used by many C [[compilers]] for x86
+- Originally from Microsoft's C [compiler](Compilers.md)
+- Used by many C [compilers](Compilers.md) for x86
 - Subroutine arguments passed on the stack
 - Function arguments pushed right-to-left
     - Last pushed first
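A hedged illustration of cdecl: a plain C function plus a comment sketching how a 32-bit x86 compiler typically lowers the call, with arguments pushed on the stack right-to-left and the caller removing them afterwards. The assembly in the comment is illustrative, not actual compiler output.

```c
#include <stdio.h>

/* Under cdecl on 32-bit x86, a call like add3(1, 2, 3) is roughly lowered to:
 *
 *     push 3          ; arguments pushed right-to-left,
 *     push 2          ; so the rightmost one goes first
 *     push 1
 *     call add3       ; pushes the return address; result comes back in EAX
 *     add  esp, 12    ; caller removes the arguments afterwards
 *
 * Exact instructions vary by compiler and optimisation level. */
static int add3(int a, int b, int c) {
    return a + b + c;
}

int main(void) {
    printf("%d\n", add3(1, 2, 3));
    return 0;
}
```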
STEM/CS/Languages/Assembly.md
@@ -1,11 +1,11 @@
 [Uni of Virginia - x86 Assembly Guide](https://www.cs.virginia.edu/~evans/cs216/guides/x86.html)
 
 ## x86 32-bit
-![[x86registers.png]]
+![x86registers](../../img/x86registers.png)
 
 ## Stack
 - push, pop, call, ret
 
-![[stack.png]]
+![stack](../../img/stack.png)
 - Growing upwards