Learning and Memory in Cognitive Systems

Published: 21-07-2017
Intelligent Control and Cognitive Systems brings you...

Learning and Memory in Cognitive Systems
Joanna J. Bryson, University of Bath, United Kingdom

Sensing vs Perception
• First week: Sensing – what information comes in.
• This week: Perception – what you think is going on.
• Perception includes expectations.
• Expectations are necessary for disambiguating noisy and impoverished sensory information.

"Expectations": Bayes' Theorem

posterior ∝ likelihood × prior
P(Y | X) ∝ P(X | Y) × P(Y)

• Given you've seen X, you can figure out whether Y is likely true, based on what you already know about the probability of experiencing: X independently, Y independently, and X when you see Y.

One Application...
• Y – potential action
• X – sensing
• priors = memory
• priors + sense = perception

Expectations
• For all cognitive systems, some priors are hard-coded: body shape, sensing array, even neural connectivity.
• These are derived from the experience of evolution, or from a designer.
• Other expectations are derived from an individual's own experience – learning.

Learning
• Learning requires:
  • A representation.
  • A means of acting on current evidence.
  • A means of incorporating feedback concerning the outcome of the guess.
• AI learning calls incorporating feedback "error correction".

Learning is NOT Memorization (Yann LeCun, NYU)
• Rote learning is easy: just memorize all the training examples and their corresponding outputs. When a new input comes in, compare it to all the memorized samples, and produce the output associated with the matching sample.
• PROBLEM: in general, new inputs are different from training samples.
• The ability to produce correct outputs or behavior on previously unseen inputs is called GENERALIZATION. Rote learning is memorization without generalization.
• The big question of Learning Theory (and practice): how to get good generalization with a limited number of examples.
(Y. LeCun, Machine Learning and Pattern Recognition, p. 10/29)
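The rote-learning scheme LeCun describes can be sketched as a nearest-neighbour lookup: store every training pair, then answer a query with the output of the single closest memorised input. The function names and the tiny 1-D data set here are illustrative, not from the lecture.

```python
# A minimal sketch of rote learning: the "model" is just the stored
# examples, and prediction is a 1-nearest-neighbour lookup.

def train(examples):
    """Rote learning: memorise all (input, output) pairs verbatim."""
    return list(examples)

def predict(model, x):
    """Return the output of the memorised input nearest to x."""
    nearest_x, nearest_y = min(model, key=lambda pair: abs(pair[0] - x))
    return nearest_y

model = train([(0.0, "off"), (1.0, "on"), (2.0, "on")])
print(predict(model, 0.1))  # near a training input, so plausibly correct
print(predict(model, 0.9))  # unseen input: the answer depends entirely
                            # on which memorised sample happens to be nearest
```

The second query shows the point of the slide: the lookup always produces *some* answer, but nothing in the scheme makes that answer correct on unseen inputs – that would require generalization.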
Learning Outcomes
• Objective is to do the right thing at the right time (to be intelligent).
• Doing the right thing often requires predicting likely possible sensory conditions, so you can disambiguate situations that would otherwise be perceptually aliased.

Two Kinds of Supervised Learning (what we'll use as an example today)
• Regression: also known as "curve fitting" or "function approximation". Learn a continuous input-output mapping from a limited number of examples (possibly noisy).
• Classification: outputs are discrete variables (category labels). Learn a decision boundary that separates one class from the other. Generally, a "confidence" is also desired (how sure are we that the input belongs to the chosen category). Includes kernel methods (not covered here).
(Y. LeCun, Machine Learning and Pattern Recognition, p. 8/29)

Unsupervised Learning (c.f. Lecture 5, "what the brain seems to be doing")
• Unsupervised learning comes down to this: if the input looks like the training samples, output a small number; if it doesn't, output a large number.
• This is a horrendously ill-posed problem in high dimension. To do it right, we must guess/discover the hidden structure of the inputs. Methods differ by their assumptions about the nature of the data.
• A special case – Density Estimation: find a function f such that f(X) approximates the probability density of X, p(X), as well as possible.
• Clustering: discover "clumps" of points.
• Embedding: discover a low-dimensional manifold or surface near which the data lives.
• Compression/Quantization: discover a function that for each input computes a compact "code" from which the input can be reconstructed.
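The clustering case above can be illustrated with a minimal 1-D k-means sketch: alternately assign each point to its nearest centre, then move each centre to the mean of its group. The data, the initial centres, and the function name are made up for illustration; real problems are high-dimensional.

```python
# A minimal 1-D k-means sketch: discover "clumps" of points with no labels.

def kmeans_1d(points, centres, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        groups = [[] for _ in centres]
        for p in points:
            i = min(range(len(centres)), key=lambda j: abs(p - centres[j]))
            groups[i].append(p)
        # Update step: each centre moves to the mean of its group
        # (an empty group keeps its old centre).
        centres = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centres)]
    return centres

data = [0.1, 0.2, 0.3, 9.8, 9.9, 10.0]
print(kmeans_1d(data, centres=[0.0, 5.0]))  # two clumps, near 0.2 and 9.9
```

Note how this matches the slide's framing: no desired outputs are ever supplied; the structure (two clumps) is discovered from the inputs alone, under the assumption that "nearby points belong together".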
(Y. LeCun, Machine Learning and Pattern Recognition, p. 9/29)

"Regression" (via Chris Bishop)

Polynomial Curve Fitting
• Representation: just a polynomial equation,
  y(x, w) = w0 + w1·x + w2·x² + ... + wM·x^M

Example Application to Action Selection
• Input: what you sensed.
• Output: where to drive your motor.

Sum-of-Squares Error Function
• E(w) = ½ Σn (y(xn, w) − tn)²
• Use data to fix the world model currently held in the representation.

Error Functions
• Based on some parameter w (for "weight" – more on why it's called that later).
• Objective is to minimise the error function.
• Take its derivative with respect to w.
• Go down (take the second derivative if necessary).
• A linear function gives a nice U-shaped error, ∴ you can tell when you're done: the derivative = 0.

Theory vs Practice
• If we assume that the noise in the signal is Normally distributed (with fixed variance), then least squares is equivalent to probabilistic methods (per CM20220).
• Least squares is a lot easier to implement & lighter-weight to run.
• To the extent the assumption doesn't hold, the quality of the results degrades – may be OK.

Why Representations Matter
• The green line is the model used to generate the data (in combination with noise); the red line is the model learned from observing that data.
[Figure (Bishop): fits with 0th, 1st, 3rd and 9th order polynomials]
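For the linear case, the derivative-equals-zero recipe above has a closed form: setting ∂E/∂w to zero for a model y = w0 + w1·x yields the usual slope and intercept formulas. This is a minimal sketch; the function name and the noise-free data are made up for illustration.

```python
# Minimise the sum-of-squares error E(w0, w1) = sum_n (yn - (w0 + w1*xn))^2
# by setting both partial derivatives to zero (closed-form least squares).

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # dE/dw1 = 0  =>  w1 = cov(x, y) / var(x)
    w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    # dE/dw0 = 0  =>  the fitted line passes through the means
    w0 = mean_y - w1 * mean_x
    return w0, w1

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 1 + 2x, no noise
print(fit_line(xs, ys))     # recovers (1.0, 2.0)
```

Because the model is linear in w, E is the U-shaped (quadratic) bowl the slide mentions, so the zero-derivative point is the unique global minimum; higher-order polynomial fits solve the analogous equations in more unknowns.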