Multi-Layer Artificial Neural Networks (ANNs)

Multi-Layer Networks Built from Perceptron Units

• Perceptrons are not able to learn certain concepts
  – They can only learn linearly separable functions
• But they can be the basis for larger structures
  – Which can learn more sophisticated concepts
  – Say that these networks have "perceptron units"

Problem With Perceptron Units

• The learning rule relies on differential calculus
  – Finding minima by differentiating, etc.
• Step functions aren't differentiable
  – They are not continuous at the threshold
• An alternative threshold function is sought
  – Must be differentiable
  – Must be similar to the step function
    · i.e., exhibit a threshold so that units can "fire" or not fire
• Sigmoid units are used for backpropagation
  – There are other alternatives that are often used

Sigmoid Units

• Take in a weighted sum of inputs, S, and output:
  – σ(S) = 1/(1 + e^(−S))
• Advantages:
  – Looks very similar to the step function
  – Is differentiable
  – The derivative is easily expressible in terms of σ itself:
    · σ'(S) = σ(S)(1 − σ(S))

Example ANN with Sigmoid Units

• Feed-forward network
  – Feed inputs in on the left, propagate numbers forward
• Suppose we have this ANN
  – With weights set arbitrarily
  – [Figure not reproduced: three input units feeding hidden units H1 and H2, which feed output units O1 and O2; weights into H1 = (0.2, −0.1, 0.4), into H2 = (0.7, −1.2, 1.2), into O1 = (1.1, 0.1), into O2 = (3.1, 1.17), as used in the calculations below]

Propagation of Example

• Suppose the input to the ANN is (10, 30, 20)
• First calculate the weighted sums into the hidden layer:
  – S_H1 = (0.2 × 10) + (−0.1 × 30) + (0.4 × 20) = 2 − 3 + 8 = 7
  – S_H2 = (0.7 × 10) + (−1.2 × 30) + (1.2 × 20) = 7 − 36 + 24 = −5
• Next calculate the output from the hidden layer:
  – Using: σ(S) = 1/(1 + e^(−S))
  – σ(S_H1) = 1/(1 + e^(−7)) = 1/(1 + 0.000912) = 0.999
  – σ(S_H2) = 1/(1 + e^(5)) = 1/(1 + 148.4) = 0.0067
  – So, H1 has fired, H2 has not
• Next calculate the weighted sums into the output layer:
  – S_O1 = (1.1 × 0.999) + (0.1 × 0.0067) = 1.0996
  – S_O2 = (3.1 × 0.999) + (1.17 × 0.0067) = 3.1047
• Finally, calculate the output from the ANN:
  – σ(S_O1) = 1/(1 + e^(−1.0996)) = 1/(1 + 0.333) = 0.750
  – σ(S_O2) = 1/(1 + e^(−3.1047)) = 1/(1 + 0.045) = 0.957
• The output from O2 is greater than the output from O1
  – So, the ANN predicts the category associated with O2
  – For the example input (10, 30, 20)
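As a quick check on the arithmetic, here is a minimal Python sketch of the forward pass using the weights from the example ANN (the variable and function names are my own, not from the slides):

```python
import math

def sigmoid(s):
    """Sigmoid threshold function: 1 / (1 + e^(-S))."""
    return 1.0 / (1.0 + math.exp(-s))

# Weights read off the example ANN above.
w_ih = [[0.2, -0.1, 0.4],   # weights into H1
        [0.7, -1.2, 1.2]]   # weights into H2
w_ho = [[1.1, 0.1],         # weights into O1 (from H1, H2)
        [3.1, 1.17]]        # weights into O2 (from H1, H2)

def forward(x):
    # Weighted sum then sigmoid at the hidden layer, then again at the output.
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_ih]
    o = [sigmoid(sum(w * hi for w, hi in zip(ws, h))) for ws in w_ho]
    return h, o

h, o = forward([10, 30, 20])
print(h)  # approx [0.999, 0.0067]: H1 fires, H2 does not
print(o)  # approx [0.750, 0.957]: O2 > O1, so predict O2's category
```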
Backpropagation Learning Algorithm

• Same task as with perceptrons
  – Learn a multi-layer ANN to correctly categorise unseen examples
  – We'll concentrate on ANNs with one hidden layer
• Overview of the routine
  – Fix the architecture and the sigmoid units within it
    · i.e., the number of units in the hidden layer; the way the input units represent an example; the way the output units categorise examples
  – Randomly assign weights to the whole network
    · Use small values (between −0.5 and 0.5)
  – Use each example in the training set to retrain the weights
  – Have multiple epochs (iterations through the training set)
    · Until some termination condition is met (not necessarily 100% accuracy)

Weight Training Calculations (Overview)

• Use the notation w_ij to specify:
  – The weight between unit i and unit j
• Look at the calculation with respect to a single example E
• We are going to calculate a value Δ_ij for each w_ij
  – And add Δ_ij on to w_ij
• Do this by calculating error terms for each unit
  – The error terms for the output units are found first
  – This information is then used to calculate the error terms for the hidden units
• So, the error is propagated back through the ANN

Propagate E through the Network

• Feed E through the network (as in the example above)
• Record the target and observed values for example E
  – i.e., determine the weighted sums from the hidden units and do the sigmoid calculation
  – Let t_i(E) be the target value for output unit i
  – Let o_i(E) be the observed value for output unit i
• Note that for categorisation learning tasks:
  – Each t_i(E) will be 0, except for a single t_j(E), which will be 1
  – But o_i(E) will be a real-valued number between 0 and 1
• Also record the outputs from the hidden units
  – Let h_i(E) be the output from hidden unit i

Error Terms for Each Unit

• The error term for output unit k is calculated as:
  – δ_Ok = o_k(E)(1 − o_k(E))(t_k(E) − o_k(E))
• The error term for hidden unit k is:
  – δ_Hk = h_k(E)(1 − h_k(E)) Σ_j (w_kj × δ_Oj)
• In English:
  – For hidden unit k, add together all the errors for the output units, each multiplied by the appropriate weight
  – Then multiply this sum by h_k(E)(1 − h_k(E))

Final Calculations

• Choose a learning rate, η (= 0.1 again, perhaps)
• For each weight w_ij between input unit i and hidden unit j:
  – Calculate: Δ_ij = η × δ_Hj × x_i
  – Where x_i is the input to input unit i for E
• For each weight w_ij between hidden unit i and output unit j:
  – Calculate: Δ_ij = η × δ_Oj × h_i(E)
  – Where h_i(E) is the output from hidden unit i for E
• Finally, add each Δ_ij on to w_ij

Worked Backpropagation Example

• Start with the previous ANN
• We will retrain the weights
  – In the light of example E = (10, 30, 20)
  – Stipulate that E should have been categorised as O1
  – We will use a learning rate of η = 0.1

Previous Calculations

• We need the calculations from when we propagated E through the ANN:
  – t_1(E) = 1 and t_2(E) = 0, from the categorisation
  – o_1(E) = 0.750 and o_2(E) = 0.957

Error Values for Output Units

• t_1(E) = 1 and t_2(E) = 0, from the categorisation
• o_1(E) = 0.750 and o_2(E) = 0.957
• So:
  – δ_O1 = o_1(E)(1 − o_1(E))(t_1(E) − o_1(E)) = 0.750 × 0.25 × 0.25 = 0.0469
  – δ_O2 = o_2(E)(1 − o_2(E))(t_2(E) − o_2(E)) = 0.957 × 0.043 × (0 − 0.957) = −0.0394

Error Values for Hidden Units

• δ_O1 = 0.0469 and δ_O2 = −0.0394
• h_1(E) = 0.999 and h_2(E) = 0.0067
• So, for H1, we add together:
  – (w_11 × δ_O1) + (w_12 × δ_O2) = (1.1 × 0.0469) + (3.1 × −0.0394) = −0.0706
  – And multiply by h_1(E)(1 − h_1(E)) to give us:
    · δ_H1 = −0.0706 × (0.999 × (1 − 0.999)) = −0.0000705
• For H2, we add together:
  – (w_21 × δ_O1) + (w_22 × δ_O2) = (0.1 × 0.0469) + (1.17 × −0.0394) = −0.0414
  – And multiply by h_2(E)(1 − h_2(E)) to give us:
    · δ_H2 = −0.0414 × (0.0067 × (1 − 0.0067)) = −0.000276

Calculation of Weight Changes

• For the weights between the input and hidden layers, apply Δ_ij = η × δ_Hj × x_i to each weight in turn
  – [Table of values not reproduced; the sketch after these slides prints them]
• For the weights between the hidden and output layers, apply Δ_ij = η × δ_Oj × h_i(E)
  – [Table of values not reproduced]
• The weight changes are not very large
  – Small differences in weights can make big differences in the calculations
  – So it might be a good idea to increase η
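Putting the error terms and weight updates together, here is a minimal Python sketch of one full backpropagation step for the worked example (variable names are my own; the printed values match the slides up to their rounding of intermediate results):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

eta  = 0.1                   # learning rate
x    = [10, 30, 20]          # example E
t    = [1, 0]                # targets: E should be categorised as O1
w_ih = [[0.2, -0.1, 0.4],    # weights into H1
        [0.7, -1.2, 1.2]]    # weights into H2
w_ho = [[1.1, 0.1],          # weights into O1 (from H1, H2)
        [3.1, 1.17]]         # weights into O2 (from H1, H2)

# 1. Propagate E through the network, recording h_i(E) and o_i(E).
h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_ih]
o = [sigmoid(sum(w * hi for w, hi in zip(ws, h))) for ws in w_ho]

# 2. Error terms for the output units: delta_Ok = o_k(1 - o_k)(t_k - o_k).
d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]

# 3. Error terms for the hidden units:
#    delta_Hk = h_k(1 - h_k) * sum_j (w_kj * delta_Oj).
d_hid = [hk * (1 - hk) * sum(w_ho[j][k] * d_out[j] for j in range(2))
         for k, hk in enumerate(h)]

# 4. Weight changes: Delta_ij = eta * delta_j * (input arriving on the weight).
dw_ih = [[eta * d_hid[j] * x[i] for i in range(3)] for j in range(2)]
dw_ho = [[eta * d_out[j] * h[i] for i in range(2)] for j in range(2)]

# 5. Finally, add each Delta_ij on to w_ij.
for j in range(2):
    for i in range(3):
        w_ih[j][i] += dw_ih[j][i]
for j in range(2):
    for i in range(2):
        w_ho[j][i] += dw_ho[j][i]

print("output error terms:", d_out)   # approx [0.0469, -0.0393]
print("hidden error terms:", d_hid)   # approx [-6.4e-05, -2.7e-04]
print("input->hidden weight changes:", dw_ih)
print("hidden->output weight changes:", dw_ho)
```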
Calculation of Network Error

• We could calculate the network error as:
  – The proportion of mis-categorised examples
• But there are multiple output units, with numerical outputs
  – So we use a more sophisticated measure:
    · Error = Σ_E Σ_k (t_k(E) − o_k(E))²
• This is not as complicated as it looks:
  – Square the difference between target and observed
    · Squaring ensures we get a positive number
  – Add up all the squared differences
    · For every output unit and every example in the training set

Problems with Local Minima

• Backpropagation is a gradient descent search
  – Where the height of the hills is determined by the error
  – But there are many dimensions to the space
    · One for each weight in the network
• Therefore backpropagation
  – Can find its way into local minima
• One partial solution:
  – Random re-start: learn lots of networks
    · Starting with different random weight settings
  – Then take the best network
  – Or set up a "committee" of networks to categorise examples
• Another partial solution: momentum
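The slides end by naming momentum without defining it. The standard idea is to add a fraction of the previous weight change to each new one; here is a minimal sketch, where the coefficient alpha and its value of 0.9 are illustrative assumptions rather than anything given on the slides:

```python
def weight_change_with_momentum(eta, delta_j, input_i, prev_change, alpha=0.9):
    """Current gradient step plus a fraction (alpha) of the previous weight
    change. The accumulated 'velocity' can carry the search through small
    local minima and speeds progress along consistently sloping regions
    of the error surface."""
    return eta * delta_j * input_i + alpha * prev_change

# Usage: remember each weight's last change across updates.
change = weight_change_with_momentum(0.1, -0.0000705, 10, prev_change=0.0)
```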