Introduction to Machine Learning
Barnabás Póczos & Aarti Singh

Credits
Many of the pictures, results, and other materials are taken from:
Definition and Motivation
History of Deep architectures
Deep Belief networks
Deep architectures
Definition: Deep architectures are composed of multiple levels of non-linear operations, such as neural nets with many hidden layers.
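To make the definition concrete, here is a minimal sketch of such a stack of non-linear operations in NumPy (not from the slides; the layer sizes and the tanh non-linearity are illustrative assumptions):

    import numpy as np

    def forward(x, layers):
        # Each layer is a (W, b) pair; tanh provides the non-linearity.
        for W, b in layers:
            x = np.tanh(W @ x + b)
        return x

    rng = np.random.default_rng(0)
    sizes = [32, 64, 64, 64, 10]   # input, three hidden layers, output
    layers = [(rng.standard_normal((m, n)) * 0.1, np.zeros(m))
              for n, m in zip(sizes[:-1], sizes[1:])]
    y = forward(rng.standard_normal(32), layers)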
Goal of Deep architectures
Goal: Deep learning methods aim at learning feature hierarchies, where features at higher levels of the hierarchy are formed from lower-level features.
(Figure from Yoshua Bengio: a feature hierarchy rising from a low-level representation through edges and local shapes to object parts.)

Neurobiological Motivation
Most current learning algorithms are shallow architectures (1-3 levels)
(SVM, kNN, MoG, KDE, Parzen kernel regression, PCA, Perceptron, …).
The mammalian brain is organized in a deep architecture (Serre, Kreiman, Kouh, Cadieu, Knoblich, & Poggio, 2007).
(E.g., the visual system has 5 to 10 levels.)
Deep Learning History
Inspired by the architectural depth of the brain, researchers had wanted for decades to train deep multi-layer neural networks.
No successful attempts were reported before 2006 …
Researchers reported positive experimental results with typically
two or three levels (i.e. one or two hidden layers), but training
deeper networks consistently yielded poorer results.
Exception: convolutional neural networks, LeCun 1998
SVM: Vapnik and his co-workers developed the Support Vector
Machine (1993). It is a shallow architecture.
Digression: In the 1990s, many researchers abandoned neural networks with multiple adaptive hidden layers because SVMs worked better, and there were no successful attempts to train deep networks.
Breakthrough in 2006
Deep Belief Networks (DBN)
Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2007). Greedy Layer-Wise Training of Deep Networks. Advances in Neural Information Processing Systems 19.
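The core idea of these papers is greedy layer-wise pretraining: train one layer at a time, unsupervised, on the representation produced by the layers below, then fine-tune the whole stack. The sketch below illustrates that idea using simple tied-weight autoencoders in place of the RBMs of a DBN; all hyperparameters are illustrative assumptions, not values from the papers:

    import numpy as np

    def train_autoencoder(X, hidden, epochs=50, lr=0.01, seed=0):
        # Tied-weight autoencoder trained with plain gradient descent.
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((hidden, X.shape[1])) * 0.1
        for _ in range(epochs):
            H = np.tanh(X @ W.T)            # encode
            R = H @ W                       # decode (tied weights)
            E = R - X                       # reconstruction error
            dH = (E @ W.T) * (1 - H**2)     # backprop through tanh
            gW = dH.T @ X + H.T @ E         # gradient w.r.t. the tied W
            W -= lr * gW / len(X)
        return W

    def greedy_pretrain(X, layer_sizes):
        # Train each layer on the features produced by the layers below it.
        weights = []
        for h in layer_sizes:
            W = train_autoencoder(X, h)
            weights.append(W)
            X = np.tanh(X @ W.T)            # feed the new features upward
        return weights                      # ready for supervised fine-tuning

    X = np.random.default_rng(1).standard_normal((256, 64))
    stack = greedy_pretrain(X, [32, 16, 8])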
Theoretical Advantages of Deep Architectures
Some functions cannot be efficiently represented (in terms of number of tunable elements) by architectures that are too shallow.
Deep architectures might be able to represent some functions otherwise not efficiently representable.
Functions that can be compactly represented by a depth-k architecture might require an exponential number of computational elements to be represented by a depth k − 1 architecture.
The consequences are:
Computational: we don't need exponentially many elements in the layers.
Statistical: poor generalization may be expected when using an insufficiently deep architecture for representing some functions.
Example: the polynomial circuit.
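As a concrete illustration of the depth-k versus depth k − 1 gap (a standard textbook example; the exact circuit from the original figure is not recoverable here), consider the polynomial

    (x_1 + x_2)(x_3 + x_4) ··· (x_{2n-1} + x_{2n}).

As a circuit of sum and product gates of depth 3 it needs only 2n − 1 gates (n sums and n − 1 products). Flattening it into an equivalent depth-2 sum-of-products circuit expands it into 2^n monomials, one per choice of a variable from each factor, i.e. exponentially many computational elements.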
Deep Convolutional Networks
Deep supervised neural networks are generally too difficult to train.
One notable exception: convolutional neural networks (CNN).
Convolutional nets were inspired by the visual system's structure.
They typically have five, six or seven layers, a depth at which fully-connected neural networks are almost impossible to train properly when initialized randomly.
Compared to standard feedforward neural networks with similarly sized layers, CNNs have far fewer connections and parameters, and so they are easier to train, while their theoretically best performance is likely to be only slightly worse.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner: Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11):2278-2324, November 1998.
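The parameter savings come from weight sharing: one small kernel is reused at every spatial position, so the parameter count is independent of image size. A minimal NumPy sketch (the 32x32 input and 5x5 kernel are chosen to match the LeNet 5 numbers below):

    import numpy as np

    def conv2d_valid(img, kernel, bias):
        # Slide one shared kernel over the image ("valid" convolution):
        # every output unit reuses the same kernel weights and bias.
        kh, kw = kernel.shape
        H, W = img.shape
        out = np.empty((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel) + bias
        return out

    img = np.random.default_rng(0).standard_normal((32, 32))
    fmap = conv2d_valid(img, np.ones((5, 5)) / 25.0, 0.0)   # 28x28 feature map
    # Shared parameters per feature map: 5*5 + 1 = 26; for 6 maps: 156.
    # A fully connected layer producing the same outputs would need
    # (32*32 + 1) * (28*28) * 6 parameters instead.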
LeNet 5, LeCun 1998
Input: 32x32 pixel image; the largest character is 20x20.
(All important information should be in the center of the receptive field of the highest-level feature detectors.)
Cx: Convolutional layer
Sx: Subsample layer
Fx: Fully connected layer
Black and white pixel values are normalized, e.g. White = -0.1, Black = 1.175, so that the mean of the pixels is 0 and their standard deviation is 1.
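This normalization is just an affine rescaling of raw pixel values in [0, 1] (0 = white, 1 = black); the constants below are derived from the two endpoints given above:

    def normalize_pixel(v):
        # v in [0, 1]: 0 (white) -> -0.1, 1 (black) -> 1.175,
        # chosen so the inputs have roughly mean 0 and std 1.
        return 1.275 * v - 0.1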
LeNet 5, Layer C1
C1: convolutional layer with 6 feature maps of size 28x28 (maps C1_k, k = 1, …, 6).
Each unit of C1 has a 5x5 receptive field in the input layer.
(5·5 + 1)·6 = 156 parameters to learn.
If it were fully connected, there would be (32·32 + 1)·(28·28)·6 parameters.

LeNet 5, Layer S2
S2: subsampling layer with 6 feature maps of size 14x14.
2x2 non-overlapping receptive fields in C1.
Layer S2: 6·2 = 12 trainable parameters.
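Each subsampling map has only two trainable parameters because a unit averages its 2x2 input patch, scales the average by one shared coefficient, and adds one shared bias before a squashing non-linearity. A sketch of one such map (plain tanh is used here for brevity; the original LeNet 5 uses a scaled sigmoid):

    import numpy as np

    def subsample(fmap, coeff, bias):
        # Non-overlapping 2x2 average, then one shared coefficient and bias:
        # 2 trainable parameters per feature map.
        H, W = fmap.shape
        pooled = fmap.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))
        return np.tanh(coeff * pooled + bias)

    s2_map = subsample(np.ones((28, 28)), coeff=0.5, bias=0.0)  # 14x14 output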
LeNet 5, Layer C3
C3: convolutional layer with 16 feature maps of size 10x10.
Each unit in C3 is connected to several 5x5 receptive fields at identical locations in S2.
Layer C3: 1516 trainable parameters.
LeNet 5, Layer S4
S4: subsampling layer with 16 feature maps of size 5x5.
Each unit in S4 is connected to the corresponding 2x2 receptive field in C3.
Layer S4: 16·2 = 32 trainable parameters.
LeNet 5, Layer C5
C5: convolutional layer with 120 feature maps of size 1x1.
Each unit in C5 is connected to all 16 of the 5x5 receptive fields in S4.
Layer C5: 120·(16·25 + 1) = 48120 trainable parameters and connections.
LeNet 5, Layers F6 and Output
Layer F6: 84 fully connected units; 84·(120 + 1) = 10164 trainable parameters and connections.
Output layer: 10 RBF units (one for each digit).
84 = 7x12: the F6 output vector is compared against a stylized 7x12 bitmap image of each digit class.
Weight update: Backpropagation
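The per-layer parameter counts quoted above can be verified with a few lines of arithmetic. The C3 count relies on the connection table of LeCun et al. (1998), in which 6 of the 16 maps see 3 of the S2 maps, 9 see 4, and 1 sees all 6:

    # C1: 6 maps, each with a 5x5 kernel and a bias
    c1 = 6 * (5 * 5 + 1)                                          # 156
    # S2: one coefficient and one bias per map
    s2 = 6 * 2                                                    # 12
    # C3: connection table from LeCun et al. (1998)
    c3 = 6 * (3 * 25 + 1) + 9 * (4 * 25 + 1) + 1 * (6 * 25 + 1)   # 1516
    # S4: one coefficient and one bias per map
    s4 = 16 * 2                                                   # 32
    # C5: 120 units, each seeing all 16 5x5 maps of S4, plus a bias
    c5 = 120 * (16 * 25 + 1)                                      # 48120
    # F6: 84 fully connected units over C5's 120 outputs, plus biases
    f6 = 84 * (120 + 1)                                           # 10164
    print(c1, s2, c3, s4, c5, f6)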