63 lines
1.7 KiB
Markdown
63 lines
1.7 KiB
Markdown
|
# Supervised
|
||
|
- Dataset with inputs manually annotated for desired output
|
||
|
- Desired output = supervisory signal
|
||
|
- Manually annotated = ground truth
|
||
|
- Annotated correct categories
|
||
|
|
||
|
## Split data
|
||
|
- Training set
|
||
|
- Test set
|
||
|
***Don't test on training data***
|
||
|
|
||
|
## Top-K Accuracy
|
||
|
- Whether correct answer appears in the top-k results
|
||
|
|
||
|
## Confusion Matrix
|
||
|
Samples described by ***feature vector***
|
||
|
Dataset forms a matrix
|
||
|
![](../img/confusion-matrix.png)
|
||
|
|
||
|
# Un-Supervised
|
||
|
- No example outputs given, learns how to categorise
|
||
|
- No teacher or critic
|
||
|
|
||
|
## Harder
|
||
|
- Must identify relevant distinguishing features
|
||
|
- Must decide on number of categories
|
||
|
|
||
|
# Reinforcement Learning
|
||
|
- No teacher - critic instead
|
||
|
- Continued interaction with the environment
|
||
|
- Minimise a scalar performance index
|
||
|
|
||
|
![](../img/reinforcement-learning.png)
|
||
|
|
||
|
- Critic
|
||
|
- Converts primary reinforcement to heuristic reinforcement
|
||
|
- Both scalar inputs
|
||
|
- Delayed reinforcement
|
||
|
- System observes temporal sequence of stimuli
|
||
|
- Results in generation of heuristic reinforcement signal
|
||
|
- Minimise cost-to-go function
|
||
|
- Expectation of cumulative cost of actions taken over sequence of steps
|
||
|
- Instead of just immediate cost
|
||
|
- Earlier actions may have been good
|
||
|
- Identify and feedback to environment
|
||
|
- Closely related to dynamic programming
|
||
|
|
||
|
## Difficulties
|
||
|
- No teacher to provide desired response
|
||
|
- Must solve temporal credit assignment problem
|
||
|
- Need to know which actions were the good ones
|
||
|
|
||
|
# Fitting
|
||
|
- Over-fitting
|
||
|
- Classifier too specific to training set
|
||
|
- Can't adequately generalise
|
||
|
- Under-fitting
|
||
|
- Too general, not inferred enough detail
|
||
|
- Learns non-discriminative or non-desired pattern
|
||
|
|
||
|
# ROC
|
||
|
Receiver Operator Characteristic Curve
|
||
|
![](../img/receiver-operator-curve.png)
|