Learning and Memory
in Cognitive Systems
Joanna J. Bryson
University of Bath, United Kingdom

Sensing vs Perception
First week: Sensing – what information is available.
This week: Perception – what you think is there.
Perception includes expectations.
Necessary for disambiguating noisy and impoverished sensory information.

“expectations”
Given you’ve seen X, you can figure out whether Y is likely true based on what you already know about the probability of experiencing: X independently, Y independently, and X when you see Y.
Y – potential action
X – sensing
priors = memory
priors + sense = perception
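The relationship between X and Y above is Bayes’ rule. A minimal sketch in Python; the scenario and every probability are invented for illustration:

```python
# Bayes' rule: combine a prior (memory) with evidence (sensing)
# to get a posterior (perception). All numbers are illustrative.

def posterior(p_y, p_x, p_x_given_y):
    """P(Y|X) = P(X|Y) * P(Y) / P(X)."""
    return p_x_given_y * p_y / p_x

# Y: "there is an obstacle ahead" (relevant to a potential action)
# X: "the range sensor returned a short reading"
p_y = 0.1            # prior: obstacles are fairly rare
p_x = 0.15           # overall rate of short readings, noise included
p_x_given_y = 0.9    # short readings are likely when an obstacle exists

print(posterior(p_y, p_x, p_x_given_y))  # 0.6: the evidence raised the prior
```

Sensing alone said little; combined with the prior (memory), it yields a usable perception.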
For all cognitive systems, some priors are hard-coded (body shape, the sensing array), derived from the experience of evolution or from a designer.
Other expectations are derived from an individual’s own experience – learning.

Learning
A means of acting on current evidence.
A means of incorporating feedback concerning the outcome of the guess.
AI learning calls incorporating feedback “error correction”.

Yann LeCun (NYU)
Rote learning is easy: just memorize all the training examples and their outputs; when a new input comes in, compare it to all the memorized samples, and produce the output associated with the matching sample.
PROBLEM: in general, new inputs are different from training samples.
Rote learning is memorization without generalization.
The big question of Learning Theory (and practice): how to get good generalization with a limited number of examples.
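Rote learning as described can be sketched as a nearest-neighbour lookup; the toy 1-D data and labels below are invented for illustration:

```python
# Rote learning: "training" just memorizes every (input, output) pair;
# prediction answers with the stored output of the closest memorized
# input. Toy 1-D data; all values illustrative.

def train(examples):
    """Training is pure memorization: keep the (input, output) pairs."""
    return list(examples)

def predict(memory, x):
    """Return the output of the closest memorized input."""
    nearest_x, nearest_y = min(memory, key=lambda pair: abs(pair[0] - x))
    return nearest_y

memory = train([(0.0, "off"), (1.0, "on"), (2.0, "on")])
print(predict(memory, 0.9))   # nearest stored input is 1.0 -> "on"
print(predict(memory, 0.2))   # nearest stored input is 0.0 -> "off"
```

The lookup only echoes stored samples, which is exactly the problem the slide names: there is no real generalization beyond proximity to what was memorized.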
Y. LeCun: Machine Learning and Pattern Recognition – p. 10/29

Learning Outcomes
Objective is to do the right thing at the right time (to be intelligent).
Doing the right thing often requires predicting likely possible sensory conditions, so you can disambiguate situations that would otherwise be perceptually aliased.

Two Kinds of Supervised Learning
What we’ll use as an example.
Regression: also known as “curve fitting” or “function approximation”. Learn a continuous input-output mapping from a limited number of examples (possibly noisy).
Classification: outputs are discrete variables (category labels). Learn a decision boundary that separates one class from the other. Generally, a “confidence” is also desired (how sure are we that the input belongs to the chosen category).
Includes kernel methods (not
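A minimal sketch of classification with a confidence, assuming a toy 1-D problem, a boundary placed midway between the class means, and a logistic squashing for confidence. All of these choices and data are illustrative, not LeCun’s own method:

```python
import math

# A minimal 1-D classifier: put the decision boundary midway between
# the two class means, and report a confidence that grows with
# distance from the boundary. All data are illustrative.

class_a = [1.0, 1.2, 0.8]   # e.g. "short" sensor readings
class_b = [3.0, 3.3, 2.9]   # e.g. "long" sensor readings

mean_a = sum(class_a) / len(class_a)
mean_b = sum(class_b) / len(class_b)
boundary = (mean_a + mean_b) / 2.0   # learned decision boundary

def classify(x):
    """Return (label, confidence in the chosen label)."""
    p_b = 1.0 / (1.0 + math.exp(-(x - boundary)))  # confidence in class B
    return ("B", p_b) if p_b >= 0.5 else ("A", 1.0 - p_b)

print(classify(0.5))   # far below the boundary: class A, high confidence
print(classify(2.0))   # near the boundary: low confidence either way
```

Inputs near the learned boundary get a confidence near 0.5, which is exactly the “how sure are we” quantity the slide asks for.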
Y. LeCun: Machine Learning and Pattern Recognition – p. 8/29

c.f. Lecture 5, “what the brain seems to be doing”
Unsupervised learning comes down to this: if the input looks like the training samples,
A Special Case: Density Estimation. Find a
Compression/Quantization: discover a function that for each input computes a compact “code”.

via Chris Bishop
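The compression/quantization idea above can be sketched with a hand-picked codebook; the entries and test inputs are invented for illustration:

```python
# Quantization as a compact "code": represent each input by the index
# of its nearest codebook entry. Codebook values are illustrative.

codebook = [0.0, 1.0, 2.5]

def encode(x):
    """The compact code: index of the nearest codebook entry."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))

def decode(code):
    """Reconstruct an approximation of the input from its code."""
    return codebook[code]

print(encode(0.9))          # 1: nearest entry is 1.0
print(decode(encode(2.3)))  # 2.5: small reconstruction error
```

A real quantizer would learn the codebook from data (e.g. k-means); here it is fixed by hand to keep the sketch short.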
Based on some parameter w (for weight – more on why it’s called that later).
Objective is to minimise the error function.
Take its derivative with respect to w.
Go down (take the second derivative if necessary).
Linear functions give a nice U-shaped error function, ∴ you can tell when you’re done: derivative = 0.

Theory vs Practice
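As a concrete instance, a minimal least-squares line fit by gradient descent on w; the data, learning rate, and stopping threshold are all assumptions for illustration:

```python
# Least-squares fitting by gradient descent: nudge the weight w
# downhill on the squared-error surface until the derivative is ~0.
# Data, learning rate and stopping threshold are all illustrative.

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 2.1, 3.9, 6.1]        # roughly y = 2x plus noise

def error(w):
    """Sum of squared errors of the line y = w*x on the data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))

def d_error(w):
    """Derivative of the squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys))

w = 0.0                          # initial guess
learning_rate = 0.01
while abs(d_error(w)) > 1e-6:    # derivative ~ 0 means we are done
    w -= learning_rate * d_error(w)

print(round(w, 3))               # close to the underlying slope of 2
```

Because the error is quadratic in w (the U-shaped function above), there is a single minimum and the derivative reaching zero really does signal that we are done.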
If we assume that noise in the signal is Normally distributed (with fixed variance), then least squares is equivalent to probabilistic methods (per CM20220).
Least squares is a lot easier to implement and lighter-weight to run.
To the extent the assumption doesn’t hold, the quality of results degrades – but may still be OK.

Why Representations
Green line is the model used to generate the data (in combination with noise).
Red line is the model learned from observing that data.