
Introduction to Machine Learning

Introduction to Machine Learning, CMU 10-701
Deep Learning
Barnabás Póczos & Aarti Singh

Credits: many of the pictures, results, and other materials are taken from Ruslan Salakhutdinov, Yoshua Bengio, Geoffrey Hinton, and Yann LeCun.

Contents
- Definition and motivation
- History of deep architectures
- Deep architectures
- Convolutional networks
- Deep belief networks
- Applications

Deep Architectures
Definition: deep architectures are composed of multiple levels of non-linear operations, such as neural nets with many hidden layers (input layer, several hidden layers, output layer).

Goal of Deep Architectures
Deep learning methods aim at learning feature hierarchies, in which features at higher levels of the hierarchy are formed from lower-level features: from a low-level representation such as edges, through local shapes, to object parts. (Figure from Yoshua Bengio.)

Neurobiological Motivation
- Most current learning algorithms are shallow architectures with 1-3 levels (SVM, kNN, mixture of Gaussians, KDE, Parzen kernel regression, PCA, perceptron, ...).
- The mammalian brain is organized as a deep architecture (Serre, Kreiman, Kouh, Cadieu, Knoblich, & Poggio, 2007); e.g. the visual system has 5 to 10 levels.

Deep Learning History
- Inspired by the architectural depth of the brain, researchers had wanted for decades to train deep multi-layer neural networks.
- No successful attempts were reported before 2006: researchers reported positive experimental results with typically two or three levels (i.e. one or two hidden layers), but training deeper networks consistently yielded poorer results.
- Exception: convolutional neural networks (LeCun, 1998).
- SVM: Vapnik and his co-workers developed the support vector machine (1993); it is a shallow architecture.
- Digression: in the 1990s, many researchers abandoned neural networks with multiple adaptive hidden layers because SVMs worked better, and there were no successful attempts to train deep networks.
- Breakthrough in 2006.

Breakthrough
- Deep belief networks (DBN): Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
- Autoencoders: Bengio, Y., Lamblin, P., Popovici, P., and Larochelle, H. (2007). Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems 19.

Theoretical Advantages of Deep Architectures
- Some functions cannot be efficiently represented (in terms of the number of tunable elements) by architectures that are too shallow.
- Deep architectures may be able to represent some functions that are otherwise not efficiently representable.
- More formally: functions that can be compactly represented by a depth-k architecture may require an exponential number of computational elements to be represented by a depth-(k-1) architecture.
- The consequences are
  - computational: we do not need exponentially many elements in the layers;
  - statistical: poor generalization may be expected when using an insufficiently deep architecture to represent some functions.
- Example: the polynomial circuit (figure).

Deep Convolutional Networks
- Deep supervised neural networks are generally too difficult to train.
- One notable exception: convolutional neural networks (CNNs).
- Convolutional nets were inspired by the structure of the visual system.
- They typically have five, six, or seven layers, a number of layers at which fully connected neural networks are almost impossible to train properly when initialized randomly.

Compared to standard feedforward neural networks with similarly sized layers,
- CNNs have far fewer connections and parameters,
- so they are easier to train,
- while their theoretically best performance is likely to be only slightly worse.

LeNet 5
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner: Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11):2278-2324, November 1998.

LeNet 5, LeCun 1998
- Input: 32x32 pixel image; the largest character is 20x20 (all important information should be in the center of the receptive field of the highest-level feature detectors).
- Cx: convolutional layer; Sx: subsampling layer; Fx: fully connected layer.
- Black-and-white pixel values are normalized, e.g. white = -0.1, black = 1.175 (mean of pixels = 0, standard deviation = 1).

LeNet 5, Layer C1
- C1: convolutional layer with 6 feature maps (k = 1, ..., 6) of size 28x28. Each unit of C1 has a 5x5 receptive field in the input layer.
- Topological structure, sparse connections, shared weights.
- Parameters to learn: (5·5 + 1)·6 = 156. Connections: 28·28·(5·5 + 1)·6 = 122,304.
- If C1 were fully connected, it would have (32·32 + 1)·(28·28)·6 parameters.

LeNet 5, Layer S2
- S2: subsampling layer with 6 feature maps of size 14x14, with 2x2 non-overlapping receptive fields in C1.
- Trainable parameters: 6·2 = 12. Connections: 14·14·(2·2 + 1)·6 = 5,880.

LeNet 5, Layer C3
- C3: convolutional layer with 16 feature maps of size 10x10. Each unit in C3 is connected to several 5x5 receptive fields at identical locations in S2.
- Trainable parameters: 1,516. Connections: 151,600.

LeNet 5, Layer S4
- S4: subsampling layer with 16 feature maps of size 5x5. Each unit in S4 is connected to the corresponding 2x2 receptive field in C3.
- Trainable parameters: 16·2 = 32. Connections: 5·5·(2·2 + 1)·16 = 2,000.

LeNet 5, Layer C5
- C5: convolutional layer with 120 feature maps of size 1x1. Each unit in C5 is connected to all 16 of the 5x5 receptive fields in S4.
- Trainable parameters and connections: 120·(16·25 + 1) = 48,120 (fully connected).

LeNet 5, Layers F6 and Output
- F6: 84 fully connected units. Trainable parameters and connections: 84·(120 + 1) = 10,164.
- Output layer: 10 RBF units, one for each digit. 84 = 7x12: each class is represented as a stylized 7x12 image.
- Weight update: backpropagation.
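The definition of a deep architecture above, multiple levels of non-linear operations stacked between an input and an output layer, can be sketched as affine maps separated by non-linearities. This is a minimal NumPy sketch; the layer sizes and the tanh non-linearity are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def forward(x, weights, biases):
    """Forward pass through a stack of fully connected layers.

    Each hidden layer applies an affine map followed by a
    non-linearity (tanh here); depth = number of weight matrices.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(W @ h + b)            # non-linear hidden layer
    return weights[-1] @ h + biases[-1]   # linear output layer

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 16, 4]                # input, 3 hidden layers, output (illustrative)
weights = [rng.normal(0.0, 0.1, (m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

y = forward(rng.normal(size=8), weights, biases)
print(y.shape)  # (4,)
```

Making this stack deeper is just a matter of appending entries to `sizes`; the history section above explains why, before 2006, actually training such deeper stacks was the hard part.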
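The sparse connections and shared weights described for layer C1 can be illustrated with a plain "valid" 2D convolution: a single 5x5 kernel slides over a 32x32 input, so every unit of the resulting 28x28 feature map reuses the same 5·5 + 1 = 26 parameters and sees only a 5x5 receptive field. This is a minimal loop-based sketch, not LeNet's actual implementation:

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    """'Valid' 2D cross-correlation with one shared kernel.

    Each output unit depends only on a k x k patch of the input
    (sparse connectivity) and all units share the same kernel and
    bias (shared weights).
    """
    H, W = image.shape
    k = kernel.shape[0]
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel) + bias
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32))   # LeNet-style 32x32 input
kernel = rng.normal(size=(5, 5))    # one shared 5x5 filter (26 params with bias)
fmap = conv2d_valid(image, kernel)
print(fmap.shape)  # (28, 28)
```

Repeating this with 6 independent kernels gives the 6 feature maps of C1 and hence the (5·5 + 1)·6 = 156 parameters quoted above.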
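The parameter and connection counts quoted for the LeNet-5 layers above can be recomputed directly from the layer descriptions; a quick check of the arithmetic:

```python
# LeNet-5 parameter/connection counts, recomputed from the layer
# descriptions above (C3 uses a hand-designed connection table, so
# its 1,516 parameters are not a single product and are omitted).
c1_params = (5 * 5 + 1) * 6               # 156
c1_conns = 28 * 28 * (5 * 5 + 1) * 6      # 122,304
s2_params = 6 * 2                         # 12
s2_conns = 14 * 14 * (2 * 2 + 1) * 6      # 5,880
s4_params = 16 * 2                        # 32
s4_conns = 5 * 5 * (2 * 2 + 1) * 16       # 2,000
c5_params = 120 * (16 * 25 + 1)           # 48,120 (fully connected)
f6_params = 84 * (120 + 1)                # 10,164
print(c1_params, c1_conns, s2_params, s2_conns,
      s4_params, s4_conns, c5_params, f6_params)
```

Note how small the convolutional layers are: C1's 156 shared parameters drive 122,304 connections, which is exactly the "fewer parameters, easier to train" trade-off claimed for CNNs above.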