
Deep Learning Introduction (slides)

Author: DannyConnolly
Published: 14-07-2017
Deep Learning on GPUs (March 2016)

Deep learning everywhere
§ Internet & cloud: image classification, speech recognition, language translation, language processing, sentiment analysis, recommendation
§ Medicine & biology: cancer cell detection, diabetic grading, drug discovery
§ Security & defense: face detection, video surveillance, satellite imagery
§ Autonomous machines: pedestrian detection, lane tracking, traffic-sign recognition
§ Media & entertainment: video captioning, video search, real-time translation

Traditional machine perception
§ Hand-crafted feature extractors: raw data → feature extraction → classifier/detector → result
§ Vision: SVM, shallow neural net, …
§ Speech (speaker ID, transcription): HMM, shallow neural net, …
§ Text (topic classification, machine translation, sentiment analysis): clustering, HMM, LDA, LSA, …

Deep learning approach
§ Train: feed labeled examples (dog, cat, raccoon, honey badger) through the model and propagate the errors back to correct it
§ Deploy: the trained model maps new raw data directly to a result (e.g. "dog")

Artificial neural network
§ A collection of simple, trainable mathematical units that collectively learn complex functions
§ Structure: input layer → hidden layers → output layer
§ Given sufficient training data, an artificial neural network can approximate very complex functions mapping raw data to output decisions

Artificial neurons
§ The artificial neuron is modeled on the biological neuron: inputs x1, x2, x3 with weights w1, w2, w3 produce the output y = F(w1·x1 + w2·x2 + w3·x3), with activation F(x) = max(0, x) (from Stanford cs231n lecture notes)

Deep neural network (DNN)
§ Learned feature hierarchy: raw data → low-level features → mid-level features → high-level features
§ Application components: task objective, e.g.
identify a face; training data: 10-100M images; network architecture: ~10 layers, ~1B parameters; learning algorithm: ~30 exaflops, ~30 GPU-days

Deep learning benefits
§ Robust
  - No need to design the features ahead of time: features are automatically learned to be optimal for the task at hand
  - Robustness to natural variations in the data is automatically learned
§ Generalizable
  - The same neural-net approach can be used for many different applications and data types
§ Scalable
  - Performance improves with more data; the method is massively parallelizable

Baidu Deep Speech 2
§ End-to-end deep learning for English and Mandarin speech recognition
§ Transition from English to Mandarin made simpler by end-to-end DL: no feature engineering or Mandarin-specific components required
§ More accurate than humans: 3.7% error rate vs. 4% for humans on test sets

AlphaGo
§ First computer program to beat a human Go professional
§ Training the DNNs: 3 weeks, 340 million training steps on 50 GPUs
§ Play: asynchronous multi-threaded search; simulations on CPUs, policy and value DNNs in parallel on GPUs
§ Single machine: 40 search threads, 48 CPUs, and 8 GPUs
§ Distributed version: 40 search threads, 1,202 CPUs, and 176 GPUs
§ Outcome: beat both the European and world Go champions in best-of-5 matches

Deep learning for autonomous vehicles

Deep learning synthesis
§ Texture synthesis and transfer using CNNs (Timo Aila et al., NVIDIA Research)

The AI race is on
§ [Chart: ImageNet classification accuracy rate, 2009-2016; deep learning overtakes traditional computer vision]
§ Milestones: Baidu Deep Speech 2; IBM Watson achieves breakthrough in natural language processing; Facebook launches Big Sur; Google launches TensorFlow; Toyota invests $1B in AI labs; Microsoft & U. Science & Tech. of China beat humans on IQ tests

The big bang in machine learning
§ DNN + big data + GPU
§ "Google's AI engine also reflects how the world of computer hardware is changing.
(It) depends on machines equipped with GPUs… And it depends on these chips more than the larger tech universe realizes.”

GPUs and DL
§ Use more processors to go faster

Deep learning development cycle

Three kinds of networks
§ DNN: all fully connected layers
§ CNN: some convolutional layers
§ RNN: recurrent neural network, LSTM

DNN
§ Key operation is a dense matrix-vector multiply (M × V)
§ Backpropagation uses dense matrix-matrix multiplies, starting from the softmax scores

DNN batching
§ Batching is used for training and for latency-insensitive inference
§ The batched operation is a matrix-matrix multiply (M × M), which gives reuse of the weights; without batching, each element of the weight matrix would be fetched for a single use
§ Modern compute architectures want roughly 10-50 arithmetic operations per memory fetch

CNN
§ Requires convolution as well as M × V
§ Filters are conserved (shared) across the plane
§ Multiply-limited, even without batching
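The single-neuron formula on the "Artificial neurons" slide, y = F(w1·x1 + w2·x2 + w3·x3) with F(x) = max(0, x), can be sketched in plain Python; the weight and input values below are illustrative, not from the slides:

```python
def relu(x):
    # Rectified linear unit: F(x) = max(0, x)
    return max(0.0, x)

def neuron(weights, inputs):
    # y = F(w1*x1 + w2*x2 + w3*x3): weighted sum passed through the activation
    s = sum(w * x for w, x in zip(weights, inputs))
    return relu(s)

# Illustrative values: a positive weighted sum passes through,
# a negative one is clamped to zero by the ReLU
print(neuron([0.5, -1.0, 2.0], [1.0, 1.0, 1.0]))   # 0.5 - 1.0 + 2.0 = 1.5
print(neuron([0.5, -1.0, 2.0], [1.0, 2.0, 0.0]))   # 0.5 - 2.0 = -1.5 -> 0.0
```

Stacking layers of such units, each feeding the next, yields the hidden-layer structure described on the "Artificial neural network" slide.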
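The weight-reuse argument on the "DNN batching" slide can be made concrete with a toy comparison of the unbatched M × V and batched M × M operations; this is a sketch with illustrative matrix sizes, not a performance-tuned kernel:

```python
def matvec(W, x):
    # Unbatched M x V: every weight W[i][j] participates in exactly one
    # multiply-add, so each weight fetched from memory is used once.
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]

def matmat(W, X):
    # Batched M x M: inputs are stacked as columns of X, and each weight
    # W[i][j] is reused once per batch column, raising the ratio of
    # arithmetic operations to memory fetches.
    batch = len(X[0])
    return [[sum(W[i][j] * X[j][b] for j in range(len(X))) for b in range(batch)]
            for i in range(len(W))]

# Toy layer: 2 outputs, 3 inputs; batch of 4 input vectors as columns
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 1.0, 1.0]
X = [[1.0, 0.0, 1.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 1.0]]

print(matvec(W, x))   # one use per weight
print(matmat(W, X))   # each weight now serves 4 multiply-adds
```

With a batch of size B, each weight fetch feeds B multiply-adds, which is how batching approaches the 10-50 operations-per-fetch range the slide mentions.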
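The "filters conserved across the plane" point on the CNN slide, meaning the same small filter is applied at every spatial position, can be sketched with a 1-D convolution; the signal and kernel values are illustrative:

```python
def conv1d(signal, kernel):
    # Valid-mode 1-D convolution (cross-correlation, as CNNs use it):
    # the same kernel weights are reused at every output position, so
    # weight reuse is built into the operation even without batching.
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A 3-tap difference filter slid across a step signal:
# it responds where the signal steps up and is zero on flat regions
signal = [0.0, 0.0, 1.0, 1.0, 1.0]
kernel = [-1.0, 0.0, 1.0]
print(conv1d(signal, kernel))
```

Because the handful of kernel weights is fetched once and then applied at every position, the operation performs many multiplies per weight fetch, which is why the slide notes that CNNs are multiply-limited even without batching.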