\begin_layout Title
\size giant
Training Neural Networks with Backpropagation
\begin_layout Author
Andy Pack
2021-03-28 18:37:32 +01:00
\begin_layout Section*
Executive Summary
\begin_layout Standard
Summary here
\begin_layout Section
\begin_layout Standard
Artificial neural networks have been the object of research and investigation
since the 1940s with
\noun on
McCulloch
\noun default
and
\noun on
Pitts
\noun default
' model of the artificial neuron
\begin_inset CommandInset citation
LatexCommand cite
key "McCulloch1943"
literal "false"
\emph on
Threshold Logic Unit
\emph default
Throughout the century, the development of the single and multi-layer perceptrons
(SLP/MLP) alongside the backpropagation algorithm
\begin_inset CommandInset citation
LatexCommand cite
key "Rumelhart1986"
literal "false"
advanced the study of artificial intelligence.
Throughout the 2010s, convolutional neural networks have proved critical
in the field of computer vision and image recognition
\begin_inset CommandInset citation
LatexCommand cite
key "alexnet"
literal "false"
This work investigates the ability of a shallow multi-layer perceptron to
classify breast tumours as either benign or malignant.
The architecture and parameters were varied in order to evaluate how this
affects performance.
\begin_layout Standard
Investigations were carried out in
\noun on
\noun default
using the
\noun on
\noun default
package to construct, train and evaluate neural networks.
The networks were trained using a supervised learning curriculum of labelled
data taken from a standard
\noun on
\noun default
\begin_inset CommandInset citation
LatexCommand cite
key "matlab-dataset"
literal "false"
from the
\noun on
Deep Learning Toolbox
\noun default
.
Section
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:exp1"
plural "false"
caps "false"
noprefix "false"
investigates the effect of varying the number of hidden nodes on test accuracy
along with the number of epochs that the MLPs are trained for.
Section
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:exp2"
plural "false"
caps "false"
noprefix "false"
builds on the previous experiment by using reasonable parameter values
to investigate performance when using an ensemble of models to classify
in conjunction.
The effect of varying the number of nodes and epochs throughout the ensemble
was considered in order to determine whether combining multiple models
could produce a better accuracy than any individual model.
Section
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:exp3"
plural "false"
caps "false"
noprefix "false"
investigates the effect of altering how the networks learn by changing
the optimisation algorithm.
Two algorithms in addition to the one previously used are considered and compared
using the same test apparatus as section
\begin_inset CommandInset ref
LatexCommand ref
reference "sec:exp2"
plural "false"
caps "false"
noprefix "false"
\begin_layout Section
Hidden Nodes & Epochs
\begin_inset CommandInset label
LatexCommand label
name "sec:exp1"
\begin_layout Standard
This section investigates the effect of varying the number of nodes in the
single hidden layer of a shallow multi-layer perceptron.
This is compared to the effect of training the model with different numbers
of epochs.
Throughout the experiment, stochastic gradient descent with momentum is
used as the optimiser, variations in both momentum and learning rate are
\begin_layout Subsection
\begin_layout Standard
\begin_inset Caption Standard
\begin_layout Plain Layout
Varied hidden node performance results over varied training lengths for
\begin_inset Formula $\eta=0.01$
\begin_inset Formula $p=0$
\begin_inset CommandInset label
LatexCommand label
name "fig:exp1-test1"
\begin_layout Standard
Figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:exp1-test1"
plural "false"
caps "false"
noprefix "false"
visualises the performance of networks with up to 256 hidden nodes over training
periods up to 200 epochs in length.
In general, the error rate can be seen to decrease when the models are
trained for longer.
Increasing the number of nodes decreases the error rate and increases the
rate at which it falls, up to a limit.
64, 128 and 256 hidden nodes lie close together as the increases in performance
Between 0 and 25 epochs, the error rate for any number of nodes descends
little below 0.35.
The number of epochs to overcome this plateau is different for each number
of nodes.
\begin_layout Standard
The standard deviation of the results from figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:exp1-test1"
plural "false"
caps "false"
noprefix "false"
can be seen in figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:exp1-test1-std"
plural "false"
caps "false"
noprefix "false"
As the network starts training, the standard deviation decreases to a minimum
at
\begin_inset Formula $10-20$
epochs before increasing to a peak at 64 epochs.
As the number of hidden nodes increases, the standard deviation decreases.
The initial drop also becomes sharper and the peak at 64 epochs becomes higher.
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename /mnt/files/dev/py/shallow-training/graphs/exp1-test1-test-train-error-rate-std.png
lyxscale 50
width 60col%
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
Standard deviation of results from figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:exp1-test1"
plural "false"
caps "false"
noprefix "false"
\begin_inset Formula $\eta=0.01$
\begin_inset Formula $p=0$
\begin_inset CommandInset label
LatexCommand label
name "fig:exp1-test1-std"
\begin_layout Plain Layout
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename /mnt/files/dev/py/shallow-training/graphs/exp1-test2-2-error-rate-curves.png
lyxscale 50
width 50col%
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
Varied hidden node performance results over varied training lengths for
\begin_inset Formula $\eta=0.1$
\begin_inset Formula $p=0$
\begin_inset CommandInset label
LatexCommand label
name "fig:exp1-test2-2"
\begin_layout Section
Ensemble Classification
\begin_inset CommandInset label
LatexCommand label
name "sec:exp2"
A horizontal ensemble of
\begin_inset Formula $m$
models was constructed with majority vote in order to investigate whether
this could improve performance over that of any single model.
In order to introduce variation between models of the ensemble, a range
of hidden nodes and epochs could be defined.
When selecting parameters throughout the ensemble, the models are evenly
distributed throughout these ranges.
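\begin_layout Standard
As a minimal sketch of the two ideas above (spreading hyperparameters evenly over a range and combining member predictions by majority vote), the following Python fragment may help. The helper names `spread` and `majority_vote` are hypothetical and are not taken from the experiment code. The handling of a single model (midpoint of the range) and of tied votes (first label seen wins) are assumptions.

```python
from collections import Counter


def spread(low, high, m):
    """Return m integer values evenly spaced over [low, high]."""
    if m == 1:
        # Assumption: a single model takes the midpoint of the range.
        return [round((low + high) / 2)]
    step = (high - low) / (m - 1)
    return [round(low + i * step) for i in range(m)]


def majority_vote(predictions):
    """Combine per-model labels for one sample.

    Assumption: on a tie, the label encountered first is returned.
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


# Hypothetical ensemble of m = 5 models over nodes 1-400 and epochs 5-100.
nodes = spread(1, 400, 5)
epochs = spread(5, 100, 5)

# Each model i would be built with nodes[i] hidden nodes and trained for
# epochs[i] epochs; the ensemble label for a sample is the majority vote.
label = majority_vote(["benign", "malignant", "benign"])
```

Here the first and last models sit on the ends of each range and the remaining models are placed at equal intervals between them, rounded to the nearest integer.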
\emph on
Agreement
\emph default
,
\begin_inset Formula $a$
, is defined as the proportion of models under the meta-classifier that
correctly predict a sample's class when the ensemble correctly classifies.
It could also be considered the confidence of the meta-classifier; for a
single horizontal model,
\begin_inset Formula $a_{m=1}=1$
As error rates are presented, this is inverted by
\begin_inset Formula $1-a$
to give
\emph on
disagreement
\emph default
,
\begin_inset Formula $d$
, the proportion of incorrect models when the ensemble classifies correctly.
For comparison, the average individual accuracies for both test and training
data are presented.
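\begin_layout Standard
The definitions of
\begin_inset Formula $a$
and
\begin_inset Formula $d$
above can be illustrated with a short Python sketch. The function name `agreement`, the toy predictions and the choice to average the per-sample proportion over all correctly classified samples are assumptions made for illustration; they are not taken from the experiment code.

```python
def agreement(model_predictions, ensemble_predictions, labels):
    """Mean fraction of member models that are correct, taken over the
    samples that the ensemble classified correctly.

    model_predictions[i][j] is the label given by model i to sample j.
    """
    fractions = []
    for j, (vote, truth) in enumerate(zip(ensemble_predictions, labels)):
        if vote != truth:
            continue  # only samples the ensemble got right are counted
        correct = sum(1 for preds in model_predictions if preds[j] == truth)
        fractions.append(correct / len(model_predictions))
    return sum(fractions) / len(fractions) if fractions else float("nan")


# Toy data: 3 models, 4 samples (0 = benign, 1 = malignant).
labels = [0, 1, 1, 0]
model_predictions = [
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
ensemble_predictions = [0, 1, 1, 1]  # majority vote per sample

a = agreement(model_predictions, ensemble_predictions, labels)
d = 1 - a  # disagreement, as used when error rates are presented
```

In this toy case the ensemble is correct on the first three samples, where 3, 2 and 2 of the 3 models are correct, so
\begin_inset Formula $a=7/9$
and
\begin_inset Formula $d=2/9$
. With a single model the function returns 1 whenever that model is correct, in line with
\begin_inset Formula $a_{m=1}=1$
.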
\begin_inset Caption Standard
\begin_layout Plain Layout
Ensemble classifier performance results for
\begin_inset Formula $\eta=0.03$
\begin_inset Formula $p=0.01$
, nodes = 1 - 400, epochs = 5 - 100
\begin_inset CommandInset label
LatexCommand label
name "fig:exp2-test8"
An experiment with a fixed epoch value throughout the ensemble is presented
in figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:exp2-test10"
plural "false"
caps "false"
noprefix "false"
Nodes between 1 and 400 were selected for the classifiers with a learning
rate,
\begin_inset Formula $\eta=0.15$
and momentum,
\begin_inset Formula $p=0.01$
The ensemble accuracy is fairly constant across the number of horizontal
models, with 3 models being the least accurate and having a higher standard
deviation.
An ensemble of 3 horizontal models also shows a significant spike in disagreement
and individual error rates, which gradually decrease as the number of models
\begin_inset Float figure
wide false
sideways false
status open
\begin_layout Plain Layout
\align center
\begin_inset Graphics
filename ../graphs/exp2-test10-error-rate-curves.png
lyxscale 50
width 50col%
\begin_layout Plain Layout
\begin_inset Caption Standard
\begin_layout Plain Layout
Ensemble classifier performance results for
\begin_inset Formula $\eta=0.15$
\begin_inset Formula $p=0.01$
, nodes =
\begin_inset Formula $1-400$
, epochs = 20
\begin_inset CommandInset label
LatexCommand label
name "fig:exp2-test10"
From the data of figure
\begin_inset CommandInset ref
LatexCommand ref
reference "fig:exp2-test10"
plural "false"
caps "false"
noprefix "false"
, 3 horizontal models were shown to be the worst performing configuration
with lower ensemble accuracy and higher disagreement.
This is likely due to the larger proportion of the vote that a single model
constitutes in a small ensemble.
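\begin_layout Standard
One way to see the effect of a single model's share of the vote is to list the per-sample disagreement values that are possible while the ensemble is still correct. The sketch below assumes a binary problem decided by strict majority; the function name `possible_disagreement` is hypothetical.

```python
from fractions import Fraction


def possible_disagreement(m):
    """Per-sample disagreement values available when a strict majority
    of m models is correct (binary classes assumed)."""
    max_wrong = (m - 1) // 2  # most dissenters that still lose the vote
    return [Fraction(k, m) for k in range(max_wrong + 1)]


small = possible_disagreement(3)  # 0 or 1/3
large = possible_disagreement(9)  # 0, 1/9, 2/9, 1/3, 4/9
```

With 3 models a single dissenting model immediately contributes a disagreement of one third for that sample, whereas with 9 models the same single dissenter contributes one ninth.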
\begin_layout Section
Optimiser Comparisons
\begin_inset CommandInset label
LatexCommand label
name "sec:exp3"
Throughout the previous experiments the stochastic gradient descent optimiser
was used to update the networks' weights, but there are many different
optimisation algorithms.
This section will present investigations into two other optimisation algorithms
and discuss the differences between them using the horizontal ensemble
classification of the previous section.
Prior to these investigations, however, stochastic gradient descent and
the two other subject algorithms will be described.
\begin_layout Subsubsection
Stochastic Gradient Descent
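\begin_layout Standard
As a rough illustration of stochastic gradient descent with momentum, the following Python sketch applies one common form of the update to a list of weights. The parameter names `eta` and `p` follow the learning rate and momentum symbols used in the figure captions. The function name `sgd_momentum_step` and the velocity form of the update are assumptions, as the optimiser implementation used in the experiments is not shown here.

```python
def sgd_momentum_step(weights, grads, velocity, eta=0.01, p=0.0):
    """Apply one update: v <- p * v - eta * g, then w <- w + v.

    With p = 0 this reduces to plain gradient descent, w <- w - eta * g.
    """
    new_velocity = [p * v - eta * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy use: minimise f(w) = w^2, whose gradient is 2w, starting from w = 1.
w, v = [1.0], [0.0]
for _ in range(100):
    w, v = sgd_momentum_step(w, [2 * w[0]], v, eta=0.1, p=0.9)
```

In a network the gradients would come from backpropagation over a mini-batch rather than from a closed-form expression as in this toy problem.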
\begin_layout Subsubsection
\begin_layout Subsubsection
\begin_layout Section
\begin_inset Newpage newpage
\begin_inset CommandInset label
LatexCommand label
name "sec:bibliography"
\begin_inset CommandInset bibtex
LatexCommand bibtex
btprint "btPrintCited"
bibfiles "references"
options "bibtotoc"
Source Code
\begin_inset CommandInset include
LatexCommand lstinputlisting
filename "../nncw.py"
lstparams "caption={Formatted Jupyter notebook containing experiment code},label={notebook-code}"