



Two Layer Artificial Neural Networks (ANNs)

Non-Symbolic Representations

- Decision trees can be easily read
  - A disjunction of conjunctions (logic)
  - We call this a symbolic representation
- Non-symbolic representations
  - More numerical in nature, more difficult to read
- Artificial Neural Networks (ANNs)
  - A non-symbolic representation scheme
  - They embed a giant mathematical function
    - which takes inputs and computes an output that is interpreted as a categorisation
  - Often shortened to "Neural Networks"
    - Don't confuse them with real neural networks (in heads)

Function Learning

- Map categorisation learning to a numerical problem
  - Each category is given a number
  - Or a range of real-valued numbers (e.g., 0.5 to 0.9)
- Function learning examples
  - Input = 1, 2, 3, 4; Output = 1, 4, 9, 16
    - Here the concept to learn is squaring integers
  - Input = (1,2,3), (2,3,4), (3,4,5), (4,5,6); Output = 1, 5, 11, 19
    - Here the concept is: (a, b, c) -> a*c - b
    - The calculation is more complicated than in the first example
- Neural networks:
  - The calculation is much more complicated in general
  - But it is still just a numerical calculation

Complicated Example: Categorising Vehicles

- Input to the function: pixel data from vehicle images
- Output: numbers: 1 for a car, 2 for a bus, 3 for a tank

[Figure: four vehicle images as INPUT, with OUTPUT = 3, 2, 1 and 1 respectively.]

So, What Functions Can We Use?

- Biological motivation:
  - The brain does categorisation tasks like this easily
  - The brain is made up of networks of neurons
- Naturally occurring neural networks
  - Each neuron is connected to many others
    - Input to one neuron is the output from many others
    - A neuron "fires" if a weighted sum S of its inputs >= a threshold
- Artificial neural networks
  - A similar hierarchy, with units that fire
  - Don't take the analogy too far
    - Human brains: 100,000,000,000 neurons
    - ANNs: usually around 1,000 units
    - ANNs are a gross simplification of real neural networks

General Idea

- Numbers are input at the input layer, values propagate through the hidden layers, and numbers are output at the output layer
- The value of each unit is calculated using all the input unit values feeding into it
- The category chosen is the one with the largest output value
- (A small code sketch of this propagation appears after the next slide)

[Diagram: INPUT LAYER -> HIDDEN LAYERS -> OUTPUT LAYER -> CATEGORY. Values propagate through the network; Cat A has the largest output value and is chosen over Cat B and Cat C.]

Representation of Information

- If ANNs can correctly identify vehicles
  - Then they contain some notion of "car", "bus", etc.
- The categorisation is produced by the units (nodes)
  - Exactly how the input reals are turned into outputs
- But, in practice:
  - Each unit does the same calculation
    - based on the weighted sum of inputs to the unit
  - So the weights in the weighted sum are where the information is really stored
  - We draw weights onto the ANN diagrams (see later)
- "Black box" representation:
  - Useful knowledge about the learned concept is difficult to extract
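The propagation described in the General Idea slide can be written as a small numerical sketch. This is only an illustration: the layer sizes, weights and input values below are invented for the example, and each unit is assumed to compute a plain weighted sum of its inputs.

```python
# Minimal sketch of values propagating through a small network.
# All weights and inputs are invented for illustration; a real network
# would learn its weights from training examples.

def propagate(values, weight_matrix):
    """Each unit outputs the weighted sum of all the values feeding into it."""
    return [sum(w * v for w, v in zip(weights, values))
            for weights in weight_matrix]

# One weight list per unit in the next layer (3 inputs -> 2 hidden -> 3 outputs).
hidden_weights = [[0.5, -0.2, 0.1],
                  [0.3,  0.8, -0.4]]
output_weights = [[1.0, -0.5],    # Cat A
                  [0.2,  0.6],    # Cat B
                  [-0.7, 0.4]]    # Cat C

inputs = [2.7, 3.0, -1.3]                       # numbers input
hidden = propagate(inputs, hidden_weights)       # values propagate
outputs = propagate(hidden, output_weights)      # numbers output

categories = ["Cat A", "Cat B", "Cat C"]
chosen = categories[outputs.index(max(outputs))]  # largest output value wins
print(outputs, "->", chosen)
```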
ANN Learning Problem

- Given a categorisation to learn (expressed numerically)
  - And training examples represented numerically
    - with the correct categorisation for each example
- Learn a neural network using the examples
  - which produces the correct output for unseen examples
- Boils down to:
  (a) Choosing the correct network architecture
      - number of hidden layers, number of units, etc.
  (b) Choosing (the same) unit function for each unit
  (c) Training the weights between units to work correctly

Special Cases

- Generally, a network can have many hidden layers
  - In practice, usually only one or two
- Next lecture:
  - ANNs with one hidden layer
  - Multi-layer ANNs
- This lecture:
  - ANNs with no hidden layer
  - Two layer ANNs
  - Perceptrons

Perceptrons

- Multiple input nodes
- Single output node
  - Takes a weighted sum of the inputs, call this S
  - The unit function calculates the output for the network
- Useful to study because
  - We can use perceptrons to build larger networks
- Perceptrons have limited representational abilities
  - We will look at concepts they can't learn later

Unit Functions

- Linear functions
  - Simply output the weighted sum
- Threshold functions
  - Output low values until the weighted sum gets over a threshold
  - Then output high values
  - The equivalent of the "firing" of neurons
- Step function:
  - Output +1 if S >= threshold T
  - Output -1 otherwise
- Sigma function:
  - Similar to the step function, but differentiable (next lecture)

[Figure: graphs of the step function and the sigma function.]

Example Perceptron

- Categorisation of 2x2 pixel black & white images
  - into "bright" and "dark"
- Representation of this rule:
  - If it contains 2, 3 or 4 white pixels, it is "bright"
  - If it contains 0 or 1 white pixels, it is "dark"
- Perceptron architecture:
  - Four input units, one for each pixel
  - One output unit: +1 for "bright", -1 for "dark"

Example Perceptron

- Example calculation: x1 = -1, x2 = 1, x3 = 1, x4 = -1
  - With each weight set to 0.25 and threshold T = -0.1:
  - S = 0.25(-1) + 0.25(1) + 0.25(1) + 0.25(-1) = 0
  - 0 >= -0.1, so the output from the ANN is +1
  - So the image is categorised as "bright"

Learning in Perceptrons

- Need to learn
  - Both the weights between the input and output units
  - And the value for the threshold
- Make the calculations easier by
  - Thinking of the threshold as a weight from a special input unit whose output is always 1
- Exactly the same result
  - But we only have to worry about learning weights

New Representation for Perceptrons

[Figure: the perceptron redrawn with a special input unit that always produces 1; the threshold function has become a weight on this unit.]

Learning Algorithm

- Weights are set randomly initially
- For each training example E
  - Calculate the observed output from the ANN, o(E)
  - If the target output t(E) is different from o(E)
    - Then tweak all the weights so that o(E) gets closer to t(E)
    - Tweaking is done by the perceptron training rule (next slide)
  - This routine is done for every example E
- Don't necessarily stop when all the examples have been used
  - Repeat the cycle again (an "epoch")
  - Until the ANN produces the correct output
    - for all the examples in the training set (or good enough)

Perceptron Training Rule

- When t(E) is different from o(E)
  - Add Δi to weight wi
  - where Δi = η(t(E) - o(E))xi
  - Do this for every weight in the network
- Interpretation (a code sketch follows this slide):
  - (t(E) - o(E)) will be either +2 or -2, since t(E) and o(E) cannot have the same sign when they differ
  - So we can think of the addition of Δi as a movement of the weight in a direction
    - which will improve the network's performance with respect to E
  - Multiplication by xi moves it more if the input is bigger
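As referenced on the Perceptron Training Rule slide, here is a minimal sketch of the step unit function and the training rule in Python. The names (step, output, train_on_example) are choices made for this illustration only; the always-1 special input is prepended so that the threshold is learned as weight w0, as on the Learning in Perceptrons slide.

```python
# Sketch of a perceptron with a step unit function and the training rule
#   delta_i = eta * (t(E) - o(E)) * x_i

def step(s, threshold=0.0):
    """Step unit function: +1 if the weighted sum reaches the threshold, else -1."""
    return 1 if s >= threshold else -1

def output(weights, example):
    """o(E): weighted sum of inputs (with x0 = 1 for the threshold weight), then step."""
    x = [1] + example                      # special input unit, always 1
    s = sum(w * xi for w, xi in zip(weights, x))
    return step(s)

def train_on_example(weights, example, target, eta=0.1):
    """One application of the perceptron training rule for example E."""
    o = output(weights, example)
    if o != target:                        # only tweak when o(E) differs from t(E)
        x = [1] + example
        weights = [w + eta * (target - o) * xi for w, xi in zip(weights, x)]
    return weights
```

Calling train_on_example for every example in the training set, and repeating that cycle over several epochs, gives the learning algorithm from the previous slide.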
The Learning Rate

- η is called the learning rate
  - Usually set to something small (e.g., 0.1)
- This controls the movement of the weights
  - Not moving too far for one example
    - which may over-compensate for another example
- If a large movement is actually necessary for the weights to correctly categorise E
  - This will occur over time, with multiple epochs

Worked Example

- Return to the "bright" and "dark" example
- Use a learning rate of η = 0.1
- Suppose we have set random weights:

[Figure: the perceptron with weights w0 = -0.5 (threshold weight), w1 = 0.7, w2 = -0.2, w3 = 0.1, w4 = 0.9.]

Worked Example

- Use this training example, E, to update the weights:
  - Here, x1 = -1, x2 = 1, x3 = 1, x4 = -1 as before
- Propagate this information through the network:
  - S = (-0.5 × 1) + (0.7 × -1) + (-0.2 × 1) + (0.1 × 1) + (0.9 × -1) = -2.2
- Hence the network outputs o(E) = -1
- But this should have been "bright" = +1
  - So t(E) = +1

Calculating the Error Values

- Δ0 = η(t(E) - o(E))x0 = 0.1 × (1 - (-1)) × (1) = 0.1 × (2) = 0.2
- Δ1 = η(t(E) - o(E))x1 = 0.1 × (1 - (-1)) × (-1) = 0.1 × (-2) = -0.2
- Δ2 = η(t(E) - o(E))x2 = 0.1 × (1 - (-1)) × (1) = 0.1 × (2) = 0.2
- Δ3 = η(t(E) - o(E))x3 = 0.1 × (1 - (-1)) × (1) = 0.1 × (2) = 0.2
- Δ4 = η(t(E) - o(E))x4 = 0.1 × (1 - (-1)) × (-1) = 0.1 × (-2) = -0.2

Calculating the New Weights

- w'0 = -0.5 + Δ0 = -0.5 + 0.2 = -0.3
- w'1 = 0.7 + Δ1 = 0.7 - 0.2 = 0.5
- w'2 = -0.2 + Δ2 = -0.2 + 0.2 = 0
- w'3 = 0.1 + Δ3 = 0.1 + 0.2 = 0.3
- w'4 = 0.9 + Δ4 = 0.9 - 0.2 = 0.7

New Look Perceptron

- Calculate for the example, E, again:
  - S = (-0.3 × 1) + (0.5 × -1) + (0 × 1) + (0.3 × 1) + (0.7 × -1) = -1.2
- It still gets the wrong categorisation
  - But the value is closer to zero (from -2.2 to -1.2)
  - In a few epochs' time, this example will be correctly categorised

Learning Abilities of Perceptrons

- Perceptrons are a very simple network
- Computational learning theory
  - The study of which concepts can and can't be learned
    - by particular learning techniques (representation, method)
- Minsky and Papert's influential book
  - Showed the limitations of perceptrons
    - They cannot learn some simple Boolean functions
  - Caused a "winter" of research for ANNs in AI
    - People thought it represented a fundamental limitation
    - But perceptrons are the simplest network
  - ANNs were later revived by neuroscientists, etc.

Boolean Functions

- Take in two inputs (-1 or +1)
- Produce one output (-1 or +1)
- In other contexts, 0 and 1 are used instead
- Example: AND function
  - Produces +1 only if both inputs are +1
- Example: OR function
  - Produces +1 if either input is +1
- Related to the logical connectives from first-order logic (F.O.L.)

Boolean Functions as Perceptrons

- AND and OR can each be represented as a perceptron
- Problem: the XOR Boolean function
  - Produces +1 only if its inputs are different
  - Cannot be represented as a perceptron
    - Because it is not linearly separable

Linearly Separable Boolean Functions

- Linearly separable:
  - A (dotted) line can be used to separate the +1 and -1 cases
- Think of the line as representing the threshold
  - The angle of the line is determined by the two weights in the perceptron
  - The y-axis crossing is determined by the threshold

Linearly Separable Functions

- The result extends to functions taking many inputs
  - and outputting +1 and -1
- It also extends to higher dimensions for the outputs
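To tie the last few slides together, here is a small sketch that applies the perceptron training rule to the AND and XOR Boolean functions. The helper names, starting weights and epoch limit are assumptions made for this illustration; the behaviour it demonstrates (training converges for AND, but never for XOR) is the linear-separability point made above.

```python
# Sketch: the perceptron training rule can learn AND, but no perceptron can
# represent XOR, because XOR is not linearly separable.

def output(weights, x):
    s = sum(w * xi for w, xi in zip(weights, [1] + x))   # x0 = 1 is the threshold input
    return 1 if s >= 0 else -1

def train(examples, eta=0.1, epochs=100):
    weights = [0.0, 0.0, 0.0]                 # threshold weight plus one weight per input
    for _ in range(epochs):
        errors = 0
        for x, target in examples:
            o = output(weights, x)
            if o != target:                   # perceptron training rule
                weights = [w + eta * (target - o) * xi
                           for w, xi in zip(weights, [1] + x)]
                errors += 1
        if errors == 0:                       # an epoch with every example correct
            return weights
    return None                               # never converged within the epoch limit

inputs = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
AND = [(x, 1 if x == [1, 1] else -1) for x in inputs]
XOR = [(x, 1 if x[0] != x[1] else -1) for x in inputs]

print("AND:", train(AND))   # finds a weight setting (linearly separable)
print("XOR:", train(XOR))   # None: no weight setting works (not linearly separable)
```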