vault backup: 2023-05-31 22:21:56
Affected files: .obsidian/global-search.json .obsidian/workspace.json Health/Alexithymia.md Health/BWS.md STEM/AI/Neural Networks/Activation Functions.md STEM/AI/Neural Networks/Architectures.md STEM/AI/Neural Networks/CNN/CNN.md STEM/AI/Neural Networks/MLP/Back-Propagation.md STEM/AI/Neural Networks/Transformers/Attention.md STEM/CS/Calling Conventions.md STEM/CS/Languages/Assembly.md
parent bfdc107e5d
commit 4cc2e79866
@ -11,7 +11,7 @@
- Bipolar
    - -1 <-> +1

![[threshold-activation.png]]
![threshold-activation](../../img/threshold-activation.png)

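A minimal sketch of the bipolar threshold activation above, mapping inputs to -1 or +1 (NumPy and the function name are illustrative choices, not from the note):

```python
import numpy as np

def bipolar_threshold(x):
    """Bipolar threshold activation: -1 for negative inputs, +1 otherwise."""
    return np.where(x < 0, -1.0, 1.0)

print(bipolar_threshold(np.array([-2.0, 0.0, 3.5])))  # [-1.  1.  1.]
```
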
# Sigmoid
- Logistic function
@ -26,7 +26,8 @@
$$\frac d {dx} \sigma(x)=\frac d {dx}\left[\frac 1 {1+e^{-x}}\right]=\sigma(x)\cdot(1-\sigma(x))$$

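A quick numerical check of the identity above; a sketch only, the helper names are illustrative:

```python
import numpy as np

def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Uses the identity d/dx sigma(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare against a central finite difference at a few points
x = np.array([-2.0, 0.0, 1.5])
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(np.allclose(sigmoid_derivative(x), numeric))  # True
```
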
![[sigmoid.png]]
![sigmoid](../../img/sigmoid.png)

### Derivative

$$y_j(n)=\varphi_j(v_j(n))=
@ -58,7 +59,7 @@ Rectilinear
- Sometimes a small scalar for negative inputs
    - Leaky ReLU

![[relu.png]]
![relu](../../img/relu.png)

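A minimal sketch of ReLU and Leaky ReLU as described above (the 0.01 negative slope is an illustrative default, not from the note):

```python
import numpy as np

def relu(x):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: a small scalar multiple of negative inputs instead of zero."""
    return np.where(x < 0, alpha * x, x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))        # [0. 0. 0. 2.]
print(leaky_relu(x))  # [-0.03 -0.005 0. 2.]
```
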
# SoftMax
- Output is per-class vector of likelihoods

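A short sketch of SoftMax turning raw scores into a per-class vector of likelihoods; subtracting the max is a standard numerical-stability trick, not something stated in the note:

```python
import numpy as np

def softmax(logits):
    """Per-class vector of likelihoods that sums to 1."""
    shifted = logits - np.max(logits)  # numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))        # roughly [0.66 0.24 0.10]
print(softmax(scores).sum())  # 1.0
```
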
@ -2,7 +2,7 @@
- *Acyclic*
- Count the output layer but not the input layer, as no computation happens at the input

![[feedforward.png]]
![feedforward](../../img/feedforward.png)

# Multilayer Feedforward
- Hidden layers
@ -12,12 +12,12 @@
- Fully connected
    - Every neuron is connected to every neuron in the adjacent layers
- Below is a 10-4-2 network

![[multilayerfeedforward.png]]
![multilayerfeedforward](../../img/multilayerfeedforward.png)

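A minimal sketch of a forward pass through a fully connected 10-4-2 network like the one pictured; the random weights and the choice of sigmoid activation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 10 inputs -> 4 hidden neurons -> 2 outputs,
# every neuron connected to every neuron in the adjacent layer
W1, b1 = rng.standard_normal((4, 10)), np.zeros(4)
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)

def forward(x):
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)

print(forward(rng.standard_normal(10)))  # two output activations
```
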
# Recurrent
- At least one feedback loop
- Below has no self-feedback

![[recurrent.png]]
![[recurrentwithhn.png]]
![recurrent](../../img/recurrent.png)
![recurrentwithhn](../../img/recurrentwithhn.png)

- Above has hidden neurons

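A toy sketch of the feedback loop: each step feeds the previous hidden state back in alongside the new input (the shapes and tanh non-linearity are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
W_in = rng.standard_normal((3, 5))  # input -> hidden
W_h = rng.standard_normal((3, 3))   # hidden -> hidden: the feedback loop

h = np.zeros(3)
for x in rng.standard_normal((4, 5)):  # a sequence of 4 inputs
    h = np.tanh(W_in @ x + W_h @ h)    # previous state fed back in
    print(h)
```
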
@ -5,13 +5,13 @@
- Niche
    - No-one cared/knew about CNNs

## After
- [[Datasets#ImageNet|ImageNet]]
- [ImageNet](../CV/Datasets.md#ImageNet)
    - 16m images, 1000 classes
- GPUs
    - General-purpose GPUs
    - CUDA
- NIPS/ECCV 2012
    - Double digit % gain on [[Datasets#ImageNet|ImageNet]] accuracy
    - Double digit % gain on [ImageNet](../CV/Datasets.md#ImageNet) accuracy

# Fully Connected
[[MLP|Dense]]

@ -79,7 +79,7 @@ $$\Delta w_{ji}(n)=\eta\cdot\delta_j(n)\cdot y_i(n)$$
2. Error WRT output $y$
3. Output $y$ WRT pre-activation function sum
4. Pre-activation function sum WRT weight
    - Other [[Weight Init|weights]] constant, goes to zero
    - Other [weights](../Weight%20Init.md) constant, goes to zero
    - Leaves just $y_i$
- Collect 3 boxed terms as delta $j$
    - Local gradient

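One way to write out the chain of factors listed above, in standard back-propagation notation with error signal $e_j(n)$ and pre-activation sum $v_j(n)$ (the exact symbols are an assumption, not quoted from the hidden part of the note):

$$\frac{\partial E}{\partial w_{ji}(n)}=\underbrace{\frac{\partial E}{\partial e_j(n)}\cdot\frac{\partial e_j(n)}{\partial y_j(n)}\cdot\frac{\partial y_j(n)}{\partial v_j(n)}}_{-\delta_j(n)}\cdot\frac{\partial v_j(n)}{\partial w_{ji}(n)}=-\delta_j(n)\cdot y_i(n)$$

Gradient descent then gives $\Delta w_{ji}(n)=-\eta\frac{\partial E}{\partial w_{ji}(n)}=\eta\cdot\delta_j(n)\cdot y_i(n)$, matching the update rule above.
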
@ -10,16 +10,16 @@
- [LSTM](../RNN/LSTM.md) tends to preserve [knowledge](../Neural%20Networks.md#Knowledge) from far back poorly
- An attention layer accesses all previous states and weighs them according to a learned measure of relevance
    - Allows referring arbitrarily far back to relevant tokens
- Can be added to [[RNN]]s
- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a [[Architectures|feedforward]] network
- Attention useful in and of itself, not just with [[RNN]]s
- [[Transformers]] use attention without recurrent connections
- Can be added to [RNNs](../RNN/RNN.md)
- In 2016, a new type of highly parallelisable _decomposable attention_ was successfully combined with a [feedforward](../Architectures.md) network
- Attention useful in and of itself, not just with [RNNs](../RNN/RNN.md)
- [Transformers](Transformers.md) use attention without recurrent connections
    - Process all tokens simultaneously
    - Calculate attention weights in successive layers

# Scaled Dot-Product
- Calculate attention weights between all tokens at once
- Learn 3 [[Weight Init|weight]] matrices
- Learn 3 [weight](../Weight%20Init.md) matrices
    - Query
        - $W_Q$
    - Key

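A small NumPy sketch of scaled dot-product attention over learned $W_Q$, $W_K$, $W_V$ projections; the dimensions and random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 8, 4
tokens = rng.standard_normal((5, d_model))  # 5 token embeddings

# The 3 learned weight matrices: query, key, value
W_Q = rng.standard_normal((d_model, d_k))
W_K = rng.standard_normal((d_model, d_k))
W_V = rng.standard_normal((d_model, d_k))

Q, K, V = tokens @ W_Q, tokens @ W_K, tokens @ W_V

scores = Q @ K.T / np.sqrt(d_k)  # attention weights between all tokens at once
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
output = weights @ V
print(output.shape)  # (5, 4)
```
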
@ -5,15 +5,15 @@
- Also known as: callee-saved registers or non-volatile registers
- How the task of preparing the stack for, and restoring after, a function call is divided between the caller and the callee

Subtle differences between [[compilers]] can make it difficult to interface code from different [[compilers]]
Subtle differences between [Compilers](Compilers.md) can make it difficult to interface code from different [compilers](Compilers.md)

Calling conventions, type representations, and name mangling are all part of what is known as an [application binary interface](https://en.wikipedia.org/wiki/Application_binary_interface) ([[ABI]])
Calling conventions, type representations, and name mangling are all part of what is known as an [application binary interface](https://en.wikipedia.org/wiki/Application_binary_interface) ([ABI](ABI.md))

# cdecl
C declaration

- Originally from Microsoft's C [[compilers|compiler]]
- Used by many C [[compilers]] for x86
- Originally from Microsoft's C [compiler](Compilers.md)
- Used by many C [compilers](Compilers.md) for x86
- Subroutine arguments passed on the stack
- Function arguments pushed right-to-left
    - Last pushed first

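As a hedged illustration: Python's `ctypes.CDLL` loads a shared library assuming the standard C (cdecl) calling convention, so a call into libc looks like this (assumes a Linux system where `libc.so.6` is present):

```python
import ctypes

# CDLL assumes the standard C (cdecl) calling convention
# for every function in the loaded library
libc = ctypes.CDLL("libc.so.6")

libc.printf(b"%d plus %d is %d\n", 2, 3, 2 + 3)
```
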
@ -1,11 +1,11 @@
[Uni of Virginia - x86 Assembly Guide](https://www.cs.virginia.edu/~evans/cs216/guides/x86.html)

## x86 32-bit
![[x86registers.png]]
![x86registers](../../img/x86registers.png)

## Stack
- push, pop, call, ret

![[stack.png]]
![stack](../../img/stack.png)
- Growing upwards