Two Layer Artificial Neural Networks (ANNs)

Dr.BenjaminClark, United States, Teacher
Published Date: 21-07-2017
www.ThesisScientist.com

Non Symbolic Representations
• Decision trees can be easily read
  – A disjunction of conjunctions (logic)
  – We call this a symbolic representation
• Non-symbolic representations
  – More numerical in nature, more difficult to read
• Artificial Neural Networks (ANNs)
  – A non-symbolic representation scheme
  – They embed a giant mathematical function
    • To take inputs and compute an output which is interpreted as a categorisation
  – Often shortened to “Neural Networks”
    • Don't confuse them with real neural networks (in heads)

Function Learning
• Map categorisation learning to a numerical problem
  – Each category is given a number
  – Or a range of real-valued numbers (e.g., 0.5 - 0.9)
• Function learning examples
  – Input = 1, 2, 3, 4; Output = 1, 4, 9, 16
  – Here the concept to learn is squaring integers
  – Input = [1,2,3], [2,3,4], [3,4,5], [4,5,6]
  – Output = 1, 5, 11, 19
  – Here the concept is: [a,b,c] -> a*c - b
    • The calculation is more complicated than in the first example
• Neural networks:
  – Calculation is much more complicated in general
  – But it is still just a numerical calculation

Complicated Example: Categorising Vehicles
• Input to function: pixel data from vehicle images
  – Output: numbers: 1 for a car; 2 for a bus; 3 for a tank
[Figure: four vehicle images as INPUT, with OUTPUT = 3, OUTPUT = 2, OUTPUT = 1 and OUTPUT = 1]

So, what functions can we use?
• Biological motivation:
  – The brain does categorisation tasks like this easily
  – The brain is made up of networks of neurons
• Naturally occurring neural networks
  – Each neuron is connected to many others
    • Input to one neuron is the output from many others
    • Neuron “fires” if a weighted sum S of inputs > threshold
• Artificial neural networks
  – Similar hierarchy with neurons firing
  – Don't take the analogy too far
    • Human brains: 100,000,000,000 neurons
    • ANNs: < 1000 usually
    • ANNs are a gross simplification of real neural networks

General Idea
[Figure: a network with an INPUT LAYER, HIDDEN LAYERS and an OUTPUT LAYER. Numbers are input, values propagate through the network, and numbers are output. Each unit's value is calculated using all the input unit values, and the category with the largest output value (here Cat A, out of Cat A, Cat B and Cat C) is chosen.]

Representation of Information
• If ANNs can correctly identify vehicles
  – They then contain some notion of “car”, “bus”, etc.
• The categorisation is produced by the units (nodes)
  – Exactly how the input reals are turned into outputs
• But, in practice:
  – Each unit does the same calculation
    • But it is based on the weighted sum of inputs to the unit
  – So, the weights in the weighted sum
    • Are where the information is really stored
  – We draw weights on to the ANN diagrams (see later)
• “Black Box” representation:
  – Useful knowledge about the learned concept is difficult to extract

ANN learning problem
• Given a categorisation to learn (expressed numerically)
  – And training examples represented numerically
    • With the correct categorisation for each example
• Learn a neural network using the examples
  – Which produces the correct output for unseen examples
• Boils down to
  (a) Choosing the correct network architecture
    • Number of hidden layers, number of units, etc.
  (b) Choosing (the same) function for each unit
  (c) Training the weights between units to work correctly

Special Cases
• Generally, can have many hidden layers
  – In practice, usually only one or two
• Next lecture:
  – Look at ANNs with one hidden layer
  – Multi-layer ANNs
• This lecture:
  – Look at ANNs with no hidden layer
  – Two layer ANNs
  – Perceptrons

Perceptrons
• Multiple input nodes
• Single output node
  – Takes a weighted sum of the inputs, call this S
  – Unit function calculates the output for the network
• Useful to study because
  – We can use perceptrons to build larger networks
• Perceptrons have limited representational abilities
  – We will look at concepts they can't learn later

Unit Functions
[Figure: plots of the step function and the sigma function]
• Linear Functions
  – Simply output the weighted sum
• Threshold Functions
  – Output low values
    • Until the weighted sum gets over a threshold
    • Then output high values
    • Equivalent of “firing” of neurons
• Step function:
  – Output +1 if S > Threshold T
  – Output -1 otherwise
• Sigma function:
  – Similar to step function but differentiable (next lecture)

Example Perceptron
• Categorisation of 2x2 pixel black & white images
  – Into “bright” and “dark”
• Representation of this rule:
  – If it contains 2, 3 or 4 white pixels, it is “bright”
  – If it contains 0 or 1 white pixels, it is “dark”
• Perceptron architecture:
  – Four input units, one for each pixel
  – One output unit: +1 for bright, -1 for dark

Example Perceptron
• Example calculation: x1 = -1, x2 = 1, x3 = 1, x4 = -1
  – S = 0.25(-1) + 0.25(1) + 0.25(1) + 0.25(-1) = 0
• 0 > -0.1 (the threshold), so the output from the ANN is +1
  – So the image is categorised as “bright”

Learning in Perceptrons
• Need to learn
  – Both the weights between input and output units
  – And the value for the threshold
• Make calculations easier by
  – Thinking of the threshold as a weight from a special input
    unit where the output from the unit is always 1
• Exactly the same result
  – But we only have to worry about learning weights

New Representation for Perceptrons
[Figure: the threshold function has become a special input unit that always produces 1]

Learning Algorithm
• Weights are set randomly initially
• For each training example E
  – Calculate the observed output from the ANN, o(E)
  – If the target output t(E) is different to o(E)
    • Then tweak all the weights so that o(E) gets closer to t(E)
    • Tweaking is done by the perceptron training rule (next slide)
• This routine is done for every example E
• Don't necessarily stop when all examples have been used
  – Repeat the cycle again (an 'epoch')
  – Until the ANN produces the correct output
    • For all the examples in the training set (or good enough)

Perceptron Training Rule
• When t(E) is different to o(E)
  – Add on Δ_i to weight w_i
  – Where Δ_i = η(t(E) - o(E))x_i
  – Do this for every weight in the network
• Interpretation:
  – (t(E) - o(E)) will either be +2 or -2, because t(E) and o(E) cannot be the same sign when they differ
  – So we can think of the addition of Δ_i as the movement of the weight in a direction
    • Which will improve the network's performance with respect to E
  – Multiplication by x_i
    • Moves it more if the input is bigger

The Learning Rate
• η is called the learning rate
  – Usually set to something small (e.g., 0.1)
• To control the movement of the weights
  – Not to move too far for one example
  – Which may over-compensate for another example
• If a large movement is actually necessary for the weights to correctly categorise E
  – This will occur over time with multiple epochs

Worked Example
• Return to the “bright” and “dark” example
• Use a learning rate of η = 0.1
• Suppose we have set random weights:
[Figure: perceptron diagram with the initial weights]

Worked Example
• Use this training example, E, to update weights:
[Figure: the training example image E]
• Here, x1 = -1, x2 = 1, x3 = 1, x4 = -1 as before
• Propagate this information
  through the network:
  – S = (-0.5 × 1) + (0.7 × -1) + (-0.2 × +1) + (0.1 × +1) + (0.9 × -1) = -2.2
• Hence the network outputs o(E) = -1
• But this should have been “bright” = +1
  – So t(E) = +1
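
The worked example and the learning algorithm above can be sketched in Python. This is a minimal sketch, not code from the lecture. The function names (weighted_sum, step, update, train) are hypothetical. The threshold is assumed to be folded into weights[0] through the special input unit that always produces 1, so the step function compares S with 0. The training set of all 16 possible 2x2 images is also an assumption made for illustration.

```python
from itertools import product


def weighted_sum(weights, inputs):
    """Weighted sum S. inputs[0] is the special input unit that always produces 1."""
    return sum(w * x for w, x in zip(weights, inputs))


def step(s):
    """Step unit function: +1 if S > 0, otherwise -1 (threshold assumed to be in weights[0])."""
    return 1 if s > 0 else -1


def update(weights, inputs, target, eta=0.1):
    """Perceptron training rule: w_i <- w_i + eta * (t(E) - o(E)) * x_i for every weight."""
    observed = step(weighted_sum(weights, inputs))
    return [w + eta * (target - observed) * x for w, x in zip(weights, inputs)]


def train(weights, examples, eta=0.1, max_epochs=100):
    """Repeat epochs until every example is categorised correctly or max_epochs is reached."""
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            if step(weighted_sum(weights, inputs)) != target:
                weights = update(weights, inputs, target, eta)
                mistakes += 1
        if mistakes == 0:
            break
    return weights


# Weights read off the worked example: special-unit weight first, then w1..w4.
weights = [-0.5, 0.7, -0.2, 0.1, 0.9]
# Training example E: special input 1, then x1 = -1, x2 = 1, x3 = 1, x4 = -1.
example = [1, -1, 1, 1, -1]
target = 1  # "bright"

s_before = weighted_sum(weights, example)
new_weights = update(weights, example, target)
s_after = weighted_sum(new_weights, example)
print(round(s_before, 2), [round(w, 2) for w in new_weights], round(s_after, 2))

# Assumed training set: all 16 images, "bright" (+1) if 2 or more pixels are white (+1).
examples = [([1, *pixels], 1 if pixels.count(1) >= 2 else -1)
            for pixels in product([-1, 1], repeat=4)]
trained = train(weights, examples)
print(all(step(weighted_sum(trained, x)) == t for x, t in examples))
```

Under these assumptions, one update with η = 0.1 adds 0.2 × x_i to each weight, giving the weights -0.3, 0.5, 0.0, 0.3, 0.7 and moving S for E from -2.2 to -1.2. The example is still categorised as “dark”, but S is closer to the target, which is why the cycle is repeated over further epochs.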