What is Deep Learning (Best Tutorial 2019)

Over the past several years, deep learning has worked its way into the language of business whenever conversations about artificial intelligence, analytics, and big data come up. And there is a good reason for that.


Its approach to AI shows a lot of promise when it comes to building autonomous, self-teaching systems, and these systems are revolutionizing a lot of industries.


Google uses deep learning for voice recognition algorithms. Amazon and Netflix use it to decide what it is that you are interested in watching or buying next.


MIT researchers use it to try to predict the future. This well-established and still-growing industry never misses a chance to tell us how revolutionary these tools are. But what is it exactly? Is it just another fad that pushes old-fashioned AI on us under a sexy new name?


It is probably helpful to think of deep learning as the cutting edge of the cutting edge. Machine learning takes some of the main ideas of AI and focuses them on solving real-world problems, often using neural networks that are designed to loosely mimic a human brain’s decision-making process.


Deep learning focuses on a narrower subset of machine learning tools and techniques, and applies them to almost any problem that requires thinking, whether human or artificial.


If you are just starting out in the deep learning field, or if your experience with neural networks dates from a long time ago, you will probably find yourself a bit confused. A lot of people have been baffled by this, especially those who learned about neural networks in the 1990s and early 2000s.


How it Works

Basically, deep learning involves feeding a computer system a whole lot of data, which it then uses to make decisions about other data. This data is fed through a neural network, as in other neural-network-based machine learning.


These networks are logical constructions that ask a series of binary true/false questions of, or extract a numerical value from, every piece of data that passes through them, and then classify it according to the answers they get.


Because deep learning is mainly focused on building these networks, they grow into what are known as deep neural networks: logic networks with the complexity needed to classify datasets as large as Twitter’s firehose of tweets or Google’s image library.


With data sets this comprehensive and logical networks sophisticated enough to classify them, it becomes trivial for a computer to take an image and tell you, with a high probability of accuracy, what a human would see in it.


Pictures are a perfect example of how all of this works because they typically contain a lot of different elements, and it is not easy for us to grasp how a computer, with its one-track, calculation-focused mind, can learn to interpret them the way we would.


However, the great thing about deep learning is that it can be applied to all types of data (written words, speech, video, audio, and machine signals) to produce conclusions that appear to have come from humans, albeit extremely fast ones. Let’s take a look at a practical example.


Take, for example, a system that was created to automatically record and report the number of vehicles of a certain make and model that travel across a public road.


First, the system would be given access to a large database of car types, including their shape, size, and engine sound. This database could be compiled manually or, in more advanced use cases, automatically by a system programmed to scour the internet and take in the data it discovers.


Next, it would take in the data that it needs to process. This is the real-world data that contains the insights, which in this example would be captured by roadside cameras and microphones.


By comparing the data from its sensors with the data it has learned, it can classify each passing vehicle by make and model with a certain probability of accuracy.


So far, this is fairly straightforward. The learning part comes in because the system, as time passes and it gains more experience, can increase the probability that it classifies information correctly by training itself on the new data it receives. Basically, it is able to learn from its own mistakes, just like humans.


For example, it might incorrectly decide that a vehicle is a particular make and model based on its similar engine noise and size.


In doing so, it would overlook another differentiator that it judged to have a low probability of mattering for this particular decision. Once it has learned that this differentiator is actually important for telling the two vehicles apart, it improves its odds of picking the correct vehicle next time.


Deep learning hardware guide

There are a few other important things to note while setting up your own hardware for deep learning application development. In this section, we will outline some of the most important aspects of GPU computing.


CPU cores

Most deep learning applications and libraries use a single CPU core unless they are used within a parallelization framework like Message-Passing Interface (MPI), MapReduce, or Spark.


For example, CaffeOnSpark by the team at Yahoo! uses Spark with Caffe for parallelizing network training across multiple GPUs and CPUs. In most normal settings in a single box, one CPU core is enough for deep learning application development.


CPU cache size

The CPU cache is an important CPU component that is used for high-speed computation. It is often organized as a hierarchy of cache layers, from L1 to L4, with L1 and L2 being smaller and faster layers than the larger and slower L3 and L4.


In an ideal setting, all of the data needed by the application resides in cache, so no read from RAM is required, which makes the overall operation faster.


However, this is hardly the scenario for most of the deep learning applications. For example, for a typical ImageNet experiment with a batch size of 128, we need more than 85MB of CPU cache to store all information for one mini batch.


Since such datasets are not small enough to be cache-only, a RAM read cannot be avoided. Hence modern day CPU cache sizes have little to no impact on the performance of deep learning applications.
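To get a feel for the numbers, the raw size of one mini-batch can be estimated with simple arithmetic. The sketch below is only a back-of-the-envelope estimate: the 224 x 224 pixel resolution, three color channels, and 4-byte float values are assumptions that are not stated above, and `minibatch_bytes` is a hypothetical helper name.

```python
def minibatch_bytes(batch_size, height, width, channels, bytes_per_value=4):
    """Raw size in bytes of one mini-batch of images stored as dense arrays."""
    return batch_size * height * width * channels * bytes_per_value


# Assumed input shape: 224 x 224 RGB images stored as 32-bit floats.
size = minibatch_bytes(batch_size=128, height=224, width=224, channels=3)
print(f"{size / 1e6:.1f} MB per mini-batch")
```

Under these assumptions the pixel data alone comes to tens of megabytes, the same order of magnitude as the figure quoted above, and larger input resolutions push it higher.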


RAM size

As we saw previously in this section, most deep learning applications read directly from RAM instead of CPU caches. Hence, it is often advisable to keep the CPU RAM almost as large as, if not larger than, the GPU RAM.


The size of the GPU RAM you need depends on the size of your deep learning model. For example, ImageNet-based deep learning models have a large number of parameters taking up 4 GB to 5 GB of space, so a GPU with at least 6 GB of RAM would be an ideal fit for such applications.


Pairing it with at least 8 GB, or preferably more, of CPU RAM will allow application developers to focus on key aspects of their application instead of debugging RAM performance issues.


Hard drive

Typical deep learning applications require large sets of data, often in the hundreds of GB. Since this much data cannot fit in RAM, an ongoing data pipeline is constructed. A deep learning application loads its mini-batch data from GPU RAM, which in turn keeps reading data from CPU RAM, which loads data directly from the hard drive.


Since GPUs have a large number of cores, and each of these cores has a mini-batch of its own data, they constantly need to read large volumes of data from the disk to allow for high data parallelism.


For example, in AlexNet's Convolutional Neural Network (CNN) based model, roughly 300 MB of data needs to be read every second. This can often cripple the overall application performance. Hence, a solid-state drive (SSD) is often the right choice for most deep learning application developers.
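The pipeline described above, in which slower storage is read ahead of time so that the consumer rarely has to wait, can be sketched with a background thread and a bounded queue. This is a minimal illustration using only the Python standard library, not how any particular framework implements its data loader; `prefetch` and `buffer_size` are hypothetical names.

```python
import queue
import threading


def prefetch(batches, buffer_size=2):
    """Yield items from `batches` while a background thread loads ahead."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # unique marker that signals the end of the data

    def worker():
        # Stands in for the slow part of the pipeline (for example, disk reads).
        for batch in batches:
            buffer.put(batch)
        buffer.put(done)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            return
        yield item


# Stand-in "mini-batches": three small lists instead of real image data.
batches = (list(range(start, start + 4)) for start in range(0, 12, 4))
for batch in prefetch(batches):
    print(batch)
```

The bounded queue keeps at most `buffer_size` batches in memory, which mirrors the idea that each stage holds only a small window of the full dataset.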


Cooling systems

Modern-day GPUs are energy efficient and have built-in mechanisms to prevent them from overheating. For instance, when a GPU increases its speed and power consumption, its temperature rises as well.


Typically, at around 80°C, its built-in temperature control kicks in and reduces its speed, thereby automatically cooling the GPU. The real bottleneck in this process is the poor design of the pre-programmed schedules for fan speeds.


In a typical deep learning application, a temperature of 80°C is reached within the first few seconds, thereby lowering GPU performance from the start and resulting in poor GPU throughput. To complicate matters, most of the existing fan scheduling options are not available on Linux, where most current deep learning applications run.


A number of options exist today to alleviate this problem. First, a Basic Input/Output System (BIOS) upgrade with a modified fan schedule can provide the optimal balance between overheating and performance. Another option is to use an external cooling system, such as a water cooling system.


However, this option is mostly applicable to GPU farms where multiple GPU servers are running. External cooling systems are also a bit expensive so cost also becomes an important factor in selecting the right cooling system for your application.


Deep learning software frameworks

Every good deep learning application needs several components to function correctly. These include:

  • A model layer that allows developers to design their own models with more flexibility
  • A GPU layer that makes it seamless for application developers to choose between GPU and CPU for their application
  • A parallelization layer that allows developers to scale their application to run on multiple devices or instances


As you can imagine, implementing these modules is not easy. Often a developer ends up spending more time debugging implementation issues than legitimate model issues.


Thankfully, a number of software frameworks exist in the industry today that make deep learning application development practically a first-class citizen of their programming languages.


These frameworks vary in architecture, design, and features, but almost all of them provide immense value to developers by giving them an easy and fast implementation framework for their applications. In this section, we will take a look at some popular deep learning software frameworks and how they compare with each other.


TensorFlow a deep learning library

TensorFlow is an open source software library for numerical computation using data flow graphs. Designed and developed by Google, TensorFlow represents the complete data computation as a flow graph.


Each node in this graph can be represented as a mathematical operator. An edge connecting two nodes represents the multi-dimensional data that flows between the two nodes.
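The idea of nodes as operators and edges as data can be illustrated with a toy graph in plain Python. This is a conceptual sketch only and does not use or imitate the TensorFlow API; `Node`, `placeholder`, and `evaluate` are hypothetical names, and scalars stand in for multi-dimensional data.

```python
class Node:
    """A graph node: an operator plus the nodes whose outputs feed into it."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self, feed):
        # Placeholders have no operator; their value comes from the feed dict.
        if self.op is None:
            return feed[self]
        # Edges carry data: evaluate upstream nodes, then apply this operator.
        args = [node.evaluate(feed) for node in self.inputs]
        return self.op(*args)


def placeholder():
    return Node(None)


x = placeholder()
y = placeholder()
total = Node(lambda a, b: a + b, x, y)   # node: addition
scaled = Node(lambda a: a * 2, total)    # node: multiply by two

print(scaled.evaluate({x: 3, y: 4}))     # computes (3 + 4) * 2
```

Describing the computation first and supplying the data later is what lets a graph-based framework decide where each operator should run.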


One of the primary advantages of TensorFlow is that it supports CPU and GPU as well as mobile devices, thereby making it almost seamless for developers to write code against any device architecture. TensorFlow also has a very big community of developers leading to a huge momentum behind this framework.



Caffe

Caffe was designed and developed at the Berkeley Artificial Intelligence Research (BAIR) Lab. It was designed with expression, speed, and modularity in mind.


It has an expressive architecture, as it allows for a very configurable way to define models and optimization parameters without necessitating any additional code. This configuration also allows an easy switch from CPU to GPU mode and vice versa with a single flag change.


Caffe also boasts good performance benchmark numbers when it comes to speed. For instance, on a single NVIDIA K40 GPU, Caffe can process over 60 million images per day. Caffe also has a strong community, ranging from academic researchers to industrial research labs, using Caffe across a heterogeneous application stack.



MXNet

MXNet is a multi-language machine learning library. It offers two modes of computation:

Imperative mode: This mode exposes a NumPy-like API. For example, a tensor of zeros can be constructed on the CPU or on the GPU in much the same way as a NumPy array.



Symbolic mode: This mode exposes a computation graph like TensorFlow. Though the imperative API is quite useful, one of its drawbacks is its rigidity.


All computations need to be known beforehand along with pre-defined data structures. Symbolic API aims to remove this limitation by allowing MXNet to work with symbols or variables instead of fixed data types.



Torch

Torch is a Lua-based deep learning framework developed by Ronan Collobert, Clement Farabet, and Koray Kavukcuoglu. It was initially used by the CILVR Lab at New York University.


Torch is powered by C/C++ libraries under the hood and also uses Compute Unified Device Architecture (CUDA) for its GPU interactions. It aims to be the fastest deep learning framework while also providing a simple C-like interface for rapid application development.



Theano

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.


One of the key features of Theano is its very tight integration with NumPy, making it almost native to a large number of Python developers. It also provides a very intuitive interface for using the GPU or CPU. It has efficient symbolic differentiation, allowing it to provide derivatives for functions with one or many inputs.


It is also numerically stable and has a dynamic code generation capability leading to faster expression evaluations. Theano is a good framework choice if you have advanced machine learning expertise and are looking for a low-level API for fine-grained control of your deep learning application.


Microsoft Cognitive Toolkit

Microsoft Cognitive Toolkit, also known as CNTK, is the latest entry to the increasing set of deep learning frameworks. CNTK has two major functionalities that it supports:

Support for multiple features such as:

  • CPU/GPU for training and prediction
  • Both Windows and Linux operating systems
  • Efficient recurrent network training through batching techniques
  • Data parallelization using one-bit quantized stochastic gradient descent (SGD)


Efficient modularization that separates:

  • Compute network
  • Execution engine
  • Learning algorithms
  • Model configuration



Keras

Keras is a deep learning framework that is probably the most different from every other framework described previously. Most of the described frameworks are low-level modules that interact directly with the GPU using CUDA.


Keras, on the other hand, could be understood as a meta-framework that interacts with other frameworks such as Theano or TensorFlow to handle its GPU interactions or other system-level access management.


As such, it is highly flexible and very user-friendly, allowing developers to choose from a variety of underlying model implementations.


Keras community support is also gaining good momentum and, as of September 2017, the TensorFlow team plans to integrate Keras as a subset of the TensorFlow project.


Experimental comparisons of these frameworks demonstrate that all of them can utilize GPUs very efficiently and show performance gains over CPUs. However, there is still no clear winner among them, which suggests there are still improvements to be made across all of these frameworks.


Machine Learning

When you start talking about deep learning, the words data science, data analytics, and machine learning will come up a lot. In fact, we have already looked a little at machine learning.


A lot of people will get these terms confused, and most aren’t sure which one is which. In this blog, we will look more at the differences between all of these things so that you have a clear understanding of what they all are.


Machine learning is the practice of using algorithms to learn from data and forecast possible trends. Traditional software is combined with statistical and predictive analysis to find patterns and uncover hidden information in the observed data. Facebook is a great example of machine learning in practice.


Their machine learning algorithms collect information for each user. Based on a person’s previous behavior, their algorithm will predict the interests of the person and recommend notifications and articles in their news feed.


Since data science is a broad term that covers several disciplines, machine learning fits within data science. Machine learning uses various techniques, such as regression and supervised clustering. However, the data used in data science may or may not come from a machine or any type of mechanical process.


The biggest difference is that data science covers a broader spectrum and doesn’t just focus on statistics and algorithms but will also look at the entire data processing system.


Data science can be viewed as the incorporation of several different parent disciplines including data engineering, software engineering, data analytics, machine learning, business analytics, predictive analytics, and more.


It includes the collection, ingestion, transformation, and retrieval of large quantities of data, which is referred to as Big Data. Data science structures big data, finds the most useful patterns in it, and then advises business people on the changes that would work best for their needs. Machine learning and data analytics are two of the many tools that data science uses.


A data analyst is someone who is able to do basic descriptive statistics, communicate data, and visualize data. They need to have a decent understanding of statistics and a good understanding of databases.


They need to be able to create new views of the data and present it visually. You could even go as far as to say that data analytics is the most basic level of data science.


Data science is a very broad term that encompasses data analytics and several other related disciplines. Data scientists are expected to predict what could happen in the future using past patterns.


A data analyst has to extract the important insights from different sources of data. A data scientist will create questions and the data analyst will find the answers to them.


Machine learning, deep learning, data science, and data analytics are only a few of the fastest-growing areas of employment in the world right now. Having the right combination of skills and experience could help you build a great career in this trending arena.


AI, Deep Learning, and Machine Learning

Artificial intelligence looks at how to create machines that are capable of carrying out tasks that would normally require human intelligence. This loose definition tells you that AI encompasses several fields of research, from expert systems to genetic algorithms, and leaves plenty of room for argument over what counts as AI.


Machine learning has recently found a lot of success within AI research. It has allowed computers to surpass or come very close to matching human performance in areas ranging from face recognition to speech and language recognition.


Machine learning is the process of teaching a computer system how to perform a certain task, instead of programming it to perform that task step by step.


Once training has been finished, the system is able to come up with accurate predictions when it receives certain data.


This may all sound dry, but the predictions could answer whether a piece of fruit in a picture is an apple or a banana, whether a person is walking in front of a self-driving vehicle, whether the word “book” in a sentence refers to a paperback or a hotel reservation, whether an email message is spam, or they could recognize speech well enough to create captions for videos on YouTube.


Normally, machine learning is divided into supervised learning, where the computer is taught by example from labeled data, and unsupervised learning, where the computer groups similar information together and finds anomalies.


Deep learning is one area within machine learning whose capabilities differ from traditional shallow machine learning in several important ways. It allows computers to solve a whole host of complex problems that could not be solved any other way.


A good example of a shallow machine learning task is predicting how ice cream sales will change with the temperature outside. Such a prediction uses only a couple of data features and is relatively straightforward. It can be carried out with a shallow technique known as linear regression with gradient descent.
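As a rough sketch of what linear regression with gradient descent looks like, the code below fits a straight line to a handful of points. The data is invented purely for illustration (it follows y = 2x + 1 exactly, with temperature in arbitrary units), and `fit_line`, the learning rate, and the step count are assumptions rather than recommended settings.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors for the current line.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step downhill along both gradients.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Invented toy data: temperature (arbitrary units) and sales, with y = 2x + 1.
temperatures = [0.0, 1.0, 2.0, 3.0, 4.0]
sales = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(temperatures, sales)
print(f"slope={w:.3f}, intercept={b:.3f}")
```

With only two parameters to adjust, the model is about as shallow as it gets, which is why it suits a problem with a single input feature.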


The problem comes in the fact that there are a large number of problems within the world that don’t fit very well in such a simple model. One example of a complex real-world issue is being able to recognize handwritten numbers.


To solve this problem, the computer has to cope with large variations in the way the data can be presented. Each digit from 0 to 9 can be written in a myriad of different ways.


Even the size and shape of each handwritten digit can vary a great deal depending on who is writing it and under what circumstances.


Coping with all of the variables of these different features and the large mess of interactions between each of them is where deep neural networks and deep learning start to be useful. Neural networks, which we will cover more completely in a later blog, are mathematical models whose structure is very loosely based on the brain.


Every neuron in the network is a function that receives data through an input, transforms it into a more amenable form, and then sends it out through an output. These neurons can be viewed as being arranged in layers.


Each of these networks has an input layer. This is where the starting data is fed in. They also have an output layer, which is what generates the last prediction.


In a deep neural network, several hidden layers of neurons sit between the input and output layers, each one feeding data into the next.


This is why you have the word deep in deep learning, as well as in deep neural networks. This references the number of hidden layers, which is normally more than three, located in the heart of these neural networks.
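The flow of data from the input layer through the hidden layers to the output layer can be sketched as a simple forward pass. All weights and biases below are arbitrary illustrative numbers, the sigmoid activation is an assumption, and the network is kept to two hidden layers for brevity; `layer_forward` and `network_forward` are hypothetical names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of all inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Feed the data through every layer in turn, from input side to output side."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Arbitrary parameters: 2 inputs -> 3 hidden -> 2 hidden -> 1 output neuron.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.2, -0.5, 0.3], [0.7, 0.1, -0.4]], [0.05, -0.05]),
    ([[1.0, -1.0]], [0.0]),
]
print(network_forward([1.0, 0.5], layers))
```

Adding depth is just a matter of appending more (weights, biases) pairs to `layers`; the forward pass itself does not change.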


A neuron is said to be activated once the sum of the values being fed into it passes a certain threshold. What activation means differs depending on the layer the neuron is in.


In the first hidden layer, activation could mean that the image of the handwritten number contains a certain combination of pixels that looks like the horizontal line at the top of the number seven. In this way, the first hidden layer detects many of the important curves and lines that eventually combine to form the final number.


A real neural network would probably have several hidden layers and several neurons in every layer. All of the small curves and lines found by the first layer would be fed into the second hidden layer, which would detect how they combine into recognizable shapes that make up a certain digit, like the entire loop of the number six.

Through this act of feeding data between the different layers, each layer handles a higher level of features.


How are these layers able to tell a computer the nature of a written number? All of the neuron layers provide a way for the network to create a rough hierarchy of different features that create the written number that is in question.


For example, if the input is an array of values representing the individual pixels in the image of a written number, the next layer could combine these pixels into lines and shapes, the layer after that would combine those shapes into distinct features such as the loops in an eight or the triangle in a four, and so on.


By building up a picture of these features step by step, a modern neural network is able to determine, with very good accuracy, which number a handwritten digit represents.


In a similar manner, different types of deep neural networks can be trained to pick out faces in a picture or to turn audio into written words.


The process of building these increasingly complex hierarchies of features of written digits out of nothing but pixels is learned by the network. The network is able to learn because of how it can alter the importance of the connections between the neurons in each layer.


Each link has an attached value known as a weight, which modifies the value sent out by a neuron as it travels from one layer to the next. By changing the values of these weights, and of an associated value known as the bias, it is possible to emphasize or diminish the importance of the links between neurons in the network.


For example, when it comes to recognizing a number that was handwritten, these different weights can be changed to show the importance of a certain pixel group that creates a line, or a pair of lines that intersect that create a number seven.


Neural Networks

Neural networks, which are sometimes referred to as Artificial Neural Networks, are a machine learning approach that simulates aspects of how the human brain functions. You should understand that neural networks don’t provide a solution for every problem that comes up; instead, alongside several other techniques, they provide the best results for various machine learning tasks.


The most common uses of neural networks are classification and clustering. They can also be used for regression, but there are better methods for that.


A neuron is the building unit of a neural network and is loosely modeled on a human neuron. A typical neural network uses a sigmoid activation function. It is popular because its derivative can be written in terms of f(x) itself, which works great for minimizing error.
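The property mentioned above is that the sigmoid's derivative can be computed from its own output: f'(x) = f(x) * (1 - f(x)). The short sketch below checks this identity numerically against a finite-difference estimate; the sample points and step size `h` are arbitrary choices.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    # The derivative reuses the function's own output: f'(x) = f(x) * (1 - f(x)).
    fx = sigmoid(x)
    return fx * (1.0 - fx)


# Compare against a central finite-difference estimate of the slope.
h = 1e-6
for x in (-2.0, 0.0, 3.0):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(x, sigmoid_derivative(x), numeric)
```

Because the derivative needs nothing beyond the value already computed in the forward pass, it is cheap to evaluate during training.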


Even though it has found new fame, the idea of these neural networks isn’t actually new. In 1958, the psychologist, Frank Rosenblatt, tried to create “a machine which senses, recognizes, remembers, and responds like the human mind” and he named his creation Perceptron. He didn’t come up with this out of thin air.

Actually, his work was inspired by the works of Walter Pitts and Warren McCulloch from the 1940s.


Before looking at what a Perceptron is, consider a biological neuron. Dendrites are extensions that come off the nerve cell. They receive signals and pass them on to the cell body, which processes the stimulus and decides whether or not to trigger a signal. When the cell decides to trigger a signal, the extension of the cell body known as the axon triggers a chemical transmission at its end to another cell.


There is no need to feel like you have to memorize any of this. We aren’t actually studying neuroscience, so you only need a vague impression of how this works.


A Perceptron looks similar to an actual neuron because it was inspired by the way actual neurons work. Keep in mind that it was only inspired by a neuron and in no way acts exactly like a real one. A Perceptron processes data as follows:


1. The small circles on the left side of the Perceptron are the “neurons”, labeled x with subscripts 1, 2, …, m, and they carry the input data.

2. Each input is multiplied by a weight, labeled w with subscripts 1, 2, …, m, as it travels along a long arrow called the synapse to the big circle in the middle. So you will have w1 * x1, w2 * x2, w3 * x3, and so on.


3. After all of the inputs have been multiplied by the weight, you will sum them all up and add a bias that had been pre-determined.


4. The result is then pushed on to the right, where you apply the step function. All this says is that if the number you get from step three is greater than or equal to zero, you receive a one as your output; otherwise, if the result is lower than zero, the output is zero.


5. You will get an output of either zero or one.
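The five steps above can be sketched in a few lines of code. The input values, weights, and bias in the example call are arbitrary numbers chosen for illustration, and `step` and `perceptron` are hypothetical function names.

```python
def step(z):
    """Step activation: 1 if z is greater than or equal to zero, otherwise 0."""
    return 1 if z >= 0 else 0


def perceptron(inputs, weights, bias):
    # Steps 2 and 3: multiply each input by its weight, sum, and add the bias.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Steps 4 and 5: the step function turns the sum into an output of 0 or 1.
    return step(weighted_sum)


# Arbitrary illustrative values, not taken from the text.
print(perceptron([1, 0, 1], weights=[3, 1, -1], bias=-1))
```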


If you move the bias to the right-hand side of the activation function, as in “sum(wx) ≥ -b”, the -b is known as a threshold value. With this, if the sum is greater than or equal to your threshold, then your activation is one. Otherwise, it comes out to zero. Pick whichever representation helps you understand the process, because the two are interchangeable.
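That the bias form and the threshold form give the same answer can be checked by trying every binary input. The weights and bias below are arbitrary integers picked for the check, and both function names are hypothetical.

```python
from itertools import product


def fires_with_bias(inputs, weights, bias):
    """Bias form: sum(wx) + b >= 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias >= 0 else 0


def fires_with_threshold(inputs, weights, threshold):
    """Threshold form: sum(wx) >= threshold, where threshold = -b."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0


weights = [2, -1, 3]   # arbitrary illustrative weights
bias = -2              # the equivalent threshold is -bias = 2
for inputs in product([0, 1], repeat=3):
    assert fires_with_bias(inputs, weights, bias) == fires_with_threshold(inputs, weights, -bias)
print("both forms agree on all 8 binary inputs")
```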


Now, you have a pretty good understanding of how a Perceptron works. All it’s made up of is some mechanical multiplications, which then make summations, and then ultimately give you activation, and that will give you an output.


Just to make sure that you fully understand this, let’s look at a really simple, if not very realistic, example. Let’s assume that you have found extreme motivation after reading this book and you have to decide whether or not you are going to study deep learning. You have three major factors that will help you make your decision:


  • 1. Will you be able to make more money once you master deep learning: 0 – No, 1 – Yes.
  • 2. Is the needed programming and mathematics simple: 0 – No, 1 – Yes.
  • 3. Can you use deep learning immediately without having to get an expensive GPU: 0 – No, 1 – Yes.


Our input variables will be x1, x2, and x3 for all of the factors, and we’ll give them each a binary value since they are all simple yes or no questions.

Let’s assume that you really love deep learning and you are now ready to work through your lifelong fear of programming and math. You also have some money put away to invest in the expensive Nvidia GPU that will train the deep learning model.


You can assume that both of these have the same importance because both of them can be compromised. But, you really want to be able to make extra money once you have spent all of the energy and time into learning about deep learning. Since you have a higher expectation of ROI, if you can’t make more moolah, you aren’t going to waste your time learning deep learning.


Now that you have a decent understanding of the decision preferences, let’s assume that you have a 100 percent probability of making extra cash once you have learned deep learning because there’s plenty of demand and little supply. That means x1 = 1. Let’s assume that programming and math are extremely hard. That means x2 = 0.


Finally, let’s assume that you are going to need a powerful GPU such as a Titan X. That means x3 = 0. Okay, now that you have the inputs, you can initialize your weights. We’re going to try w1 = 8, w2 = 2, w3 = 2.


The higher the value of a weight, the bigger the influence of its input. Since the money you will make matters most to your decision about learning deep learning, w1 is greater than w2, and w1 is greater than w3.


Let’s say that the value of the threshold is five, which is equivalent to a bias of negative five. We add up the weighted inputs and then add the bias term: 8 * 1 + w2 * 0 + w3 * 0 - 5 = 3, which is greater than or equal to zero, so the output is one. With a threshold of five, you will decide to learn deep learning only if you are going to make more money.


Even if the math turns out to be easy and you aren’t going to have to buy a GPU, you won’t study deep learning unless you are able to make extra money later on.


Now, you have a decent understanding of bias and threshold. With a threshold as high as five, that means the main factor has to be satisfied in order for you to receive an output of one. Otherwise, you will receive a zero.


The fun part comes next: varying the threshold, bias, and weights will provide you with different possible decision-making models. With our example, if you lower your threshold from five to three, then you will get different scenarios where the output would be one.
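
The decision rule described above can be sketched in a few lines of Python. This is only an illustration: it uses w1 = 8 with the two minor weights set to 2, so that easy math and no GPU cost together (2 + 2 = 4) stay below the threshold of five, and the function name `perceptron` is an illustrative choice.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum reaches the threshold, else 0."""
    bias = -threshold  # a threshold of 5 is the same as a bias of -5
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0

# w1 = 8 comes from the text; the two minor weights of 2 are assumed values
weights = [8, 2, 2]

money_only = perceptron([1, 0, 0], weights, threshold=5)        # 8 - 5 = 3  -> 1
no_money = perceptron([0, 1, 1], weights, threshold=5)          # 4 - 5 = -1 -> 0
no_money_low_bar = perceptron([0, 1, 1], weights, threshold=3)  # 4 - 3 = 1  -> 1
print(money_only, no_money, no_money_low_bar)  # 1 0 1
```

Lowering the threshold from five to three is enough to flip the last case, which is the kind of alternative decision model the paragraph above refers to.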


Despite how well loved Perceptrons were, their popularity faded quietly because of their limitations: a single Perceptron cannot learn the logic of an XOR gate. Later on, people realized that multi-layer Perceptrons can, but this requires back propagation so that the network can learn from its own errors. Every deep learning neural network is data-driven.


If the output of a model differs from the desired output, you need a way to back propagate the error information through the network so that each weight is adjusted by an appropriate amount.


That way, the actual outputs of the model gradually get closer to the desired outputs with each round of training.


As it turned out, for more complicated tasks whose outputs can’t be expressed as a linear combination of the inputs (that is, the classes aren’t linearly separable), the step function will not work because it doesn’t support back propagation. Back propagation requires an activation function with meaningful derivatives.


Here’s just a bit of calculus: a step function is piecewise constant, so its derivative comes out to 0 for every input except the point 0 itself.


At the point of 0, your derivative is going to be undefined because the function becomes discontinuous at this point. Even though this may be an easy and simple activation function, it’s not able to handle the more complicated tasks.
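
A quick numerical check makes this concrete. The sketch below estimates the slope of the step function with a finite difference; the helper name `numeric_derivative` and the sample points are illustrative choices.

```python
def step(z):
    """Perceptron step activation: 0 for negative input, 1 otherwise."""
    return 1.0 if z >= 0 else 0.0

def numeric_derivative(f, z, h=1e-6):
    """Central finite-difference estimate of f'(z)."""
    return (f(z + h) - f(z - h)) / (2 * h)

slopes = [numeric_derivative(step, z) for z in (-2.0, -0.5, 0.5, 2.0)]
print(slopes)  # every slope is 0.0, so there is no gradient to follow

jump = step(1e-9) - step(-1e-9)
print(jump)  # 1.0: the whole change happens in one jump at z = 0
```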


Sigmoid function: f(x) = 1 / (1 + e^(-x))


Perceptrons aren’t stable when it comes to being a neural network relationship candidate. Look at it like this: this person has extreme mood swings. One day (if z < 0), they are all “down” and “quiet” and don’t give any response.


The next day (if z ≥ 0), they are all of a sudden “lively” and “talkative” and talk nonstop. A huge change, isn’t it? Their mood has no transition, and you’re never sure whether it is about to go up or down. This is a step function.


A small change in any of the weights or inputs of the network may cause a neuron to flip from zero to one, which could affect the behavior of the hidden layer and, in turn, the final outcome.


It’s important to have a learning algorithm that improves the network by changing the weights gradually, without any sudden jumps. Since a step function doesn’t let you change the weight values gradually, you shouldn’t use it.


We are now going to say goodbye to the Perceptron with a step function. A new partner to use in your neural network is the sigmoid neuron, which uses the sigmoid function written above as its activation function.


The only thing that is going to change is the activation function, and all the other stuff that you have learned up to this point about a neural network is going to work the same for the new neuron type.


If the function looks strange or a little abstract, you don’t need to focus on details such as Euler’s number ‘e’ or how someone came up with such a crazy function. If you aren’t all that math savvy, the only things you really need to know are the shape of the curve and its derivative.


1. A sigmoid function produces results similar to a step function in that the output lies between zero and one. The curve crosses 0.5 at z=0, which lets you set a decision rule: if the neuron’s output is greater than or equal to 0.5, the output is treated as one, and if it is less than 0.5, the output is treated as zero.


2. A sigmoid function’s curve has no jump. The curve is smooth, with a simple derivative of σ(z) * (1-σ(z)), and it is differentiable everywhere.


3. If z is very negative, the output is going to be around zero. If z is very positive, the output will be around one. Around z=0, where z is neither too large nor too small, the output changes the most as z changes.
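
The three properties above can be checked with a short Python sketch. The function names are illustrative, and the sample values of z are arbitrary.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    """Derivative of the sigmoid, written as sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def classify(z):
    """Decision rule from point 1: output one when sigma(z) >= 0.5."""
    return 1 if sigmoid(z) >= 0.5 else 0

for z in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(z, round(sigmoid(z), 4), round(sigmoid_derivative(z), 4), classify(z))
```

The printed derivative is largest at z = 0 and shrinks toward zero at both ends, which is why learning is fastest in the middle of the curve.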


A sigmoid function introduces non-linearity into the neural network. Non-linear simply means that the output can’t be expressed as a linear combination of the inputs.


These non-linear functions give you a new representation of your original data, which allows for non-linear decision boundaries like the one XOR needs. With XOR, if two of these neurons are placed in the hidden layer, the original 2D inputs are mapped into a new representation in a different space, where the two classes can be separated.
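
As a rough illustration of that idea, here is a tiny network with two sigmoid neurons in the hidden layer that reproduces XOR. The weights are picked by hand for this sketch (they are an assumption, not values from a trained model): one hidden neuron behaves like OR, the other like NAND, and the output neuron combines them like AND.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(x1, x2):
    """Two hidden sigmoid neurons followed by one sigmoid output neuron."""
    h1 = sigmoid(20 * x1 + 20 * x2 - 10)    # acts like OR
    h2 = sigmoid(-20 * x1 - 20 * x2 + 30)   # acts like NAND
    return sigmoid(20 * h1 + 20 * h2 - 30)  # acts like AND of h1 and h2

results = {(a, b): round(xor_net(a, b)) for a in (0, 1) for b in (0, 1)}
print(results)  # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```

In the (h1, h2) space produced by the hidden layer, the four input points become separable by a single straight line, which is not possible in the original (x1, x2) space.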


When it comes to linearity and non-linearity, things can become quite confusing. That’s why, if you are serious about learning deep learning, it’s important that you do plenty of studying on these subjects.


Hopefully, though, you have a bit of a sense as to why non-linear activation functions are important, but if you don’t quite understand, it’s okay. Allow yourself some time to take in all of the information.


For a neural network to learn, you have to adjust the weights to get rid of most of the errors, as you have learned. This can be done by performing backpropagation of the error.


For a simple neuron that uses the sigmoid function as its activation function, the error can be written down explicitly. In the general case, the weights are denoted W and the inputs X.


With such an equation, the weight adjustment can be generalized, and you can see that it only requires information from the neighboring neuron layers. This is why it is a robust mechanism for learning, known as the back propagation algorithm.
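
One common way to write the adjustment for a single sigmoid neuron, assuming a squared-error cost, is: new W = W - rate * (y - t) * y * (1 - y) * X, where y is the neuron’s output and t the desired output. The sketch below applies that rule to a toy OR-gate data set; the data, the learning rate, and the number of passes are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# OR gate as toy data (an assumption, chosen because it is linearly separable)
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

W = [0.0, 0.0]  # weights, one per input
b = 0.0         # bias
rate = 1.0      # learning rate (hypothetical value)

for _epoch in range(5000):
    for X, target in samples:
        y = sigmoid(sum(w * x for w, x in zip(W, X)) + b)
        # error signal for squared error: (y - t) times the sigmoid derivative
        delta = (y - target) * y * (1.0 - y)
        W = [w - rate * delta * x for w, x in zip(W, X)]
        b -= rate * delta

predictions = [1 if sigmoid(sum(w * x for w, x in zip(W, X)) + b) >= 0.5 else 0
               for X, _target in samples]
print(predictions)
```

Each update only uses the neuron’s own input, output, and error, which is the locality that makes the same rule extend layer by layer in back propagation.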


To practice this, we can write a simple JavaScript (TypeScript-flavored) application with the synaptic and Jimp libraries that learns a filter from two training images and then applies it to a specific image. All you need is an image you want to change; fill in its filename where the code expects it.

import Jimp = require("jimp");
const synaptic = require("synaptic");
const _ = require("lodash");

const Neuron = synaptic.Neuron,
  Layer = synaptic.Layer,
  Network = synaptic.Network,
  Trainer = synaptic.Trainer,
  Architect = synaptic.Architect;

// Read an image and return one [red, green] pair per pixel.
function getImgData(filename: string) {
  return new Promise((resolve, reject) => {
    Jimp.read(filename).then((image) => {
      let inputSet: any = [];
      image.scan(0, 0, image.bitmap.width, image.bitmap.height, function (x, y, idx) {
        var red = image.bitmap.data[idx + 0];
        var green = image.bitmap.data[idx + 1];
        inputSet.push([red, green]);
      });
      resolve(inputSet);
    }).catch(function (err) {
      reject(err);
    });
  });
}

// Two inputs and two outputs (red, green); the hidden layer size of 5 is a guess.
const myPerceptron = new Architect.Perceptron(2, 5, 2);
const trainer = new Trainer(myPerceptron);
const trainingSet: any = [];

// Replace both placeholders: the first is the unfiltered training image,
// the second is the same image with the filter already applied.
getImgData('imagefilename.jpg').then((inputs: any) => {
  getImgData('imagefilename.jpg').then((outputs: any) => {
    for (let i = 0; i < inputs.length; i++) {
      trainingSet.push({
        input: _.map(inputs[i], (val: any) => val / 255),
        output: _.map(outputs[i], (val: any) => val / 255)
      });
    }

    trainer.train(trainingSet, {
      iterations: 200,
      error: .005,
      shuffle: true,
      log: 10,
      cost: Trainer.cost.CROSS_ENTROPY
    });

    // Apply the learned filter to your own image, pixel by pixel.
    Jimp.read('yours.jpg').then((image) => {
      image.scan(0, 0, image.bitmap.width, image.bitmap.height, (x, y, idx) => {
        var red = image.bitmap.data[idx + 0];
        var green = image.bitmap.data[idx + 1];
        var out = myPerceptron.activate([red / 255, green / 255]);
        image.bitmap.data[idx + 0] = _.round(out[0] * 255);
        image.bitmap.data[idx + 1] = _.round(out[1] * 255);
      });
      image.write('out.jpg'); // output filename is a placeholder
    }).catch(function (err) {
      console.error(err);
    });
  });
});


ROC Curve Analysis

Data science and statistics both rely on ROC curve analysis. It shows the performance of a model or test by plotting its sensitivity (true positive rate) against its fall-out (false positive rate).


This plays a crucial role when it comes to figuring out a model’s viability. Like a lot of technological leaps, it was created because of war: during WWII, it was used to analyze radar signals when detecting enemy aircraft.


After that, it moved into several other fields. It has been used to detect the similarities of bird songs, the accuracy of tests, the response of neurons, and more.


When a machine learning model is run, some of its predictions will be inaccurate. Some cases that should have been labeled true are labeled false (false negatives), and others that should have been labeled false are labeled true (false positives).


What are the odds that the prediction is going to be correct? Since statistics and predictions are just well-supported guesses, it becomes very important to know how often you are right. With a ROC curve, you are able to see how accurate the predictions are and, by looking at the score distributions of the two classes, figure out where to place the threshold.


The threshold is where you decide whether the binary classification is false or true, negative or positive. It also determines the values on your X and Y axes (fall-out and sensitivity). As the two distributions move closer together and overlap, your curve will end up losing the area beneath it.


This shows you that the model is less accurate no matter where your threshold is placed. When it comes to modeling most algorithms, the ROC curve is the first test performed. It will detect problems very early by letting you know if your model is accurate.
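
To make the idea concrete, the sketch below computes the points of a ROC curve and the area beneath it by sliding the threshold over a handful of scores. The labels and scores are made up for illustration, and the function names are not from any particular library.

```python
def roc_points(labels, scores):
    """Return (fall-out, sensitivity) pairs, one per candidate threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical labels (1 = positive) and model scores
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
auc = area_under_curve(points)
print(points)
print(round(auc, 3))  # 0.889
```

An area of 1.0 would mean the two classes are perfectly separated by some threshold, while 0.5 is what random guessing achieves.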


Bayes Theorem

Bayes’ theorem is one of the more popular results that most computer-minded people need to understand. You can find it being discussed in lots of books. The best thing about Bayes’ theorem is that it simplifies complex concepts. It distills a lot of statistical information into just a few variables.


It works with conditional probability, that is, the probability of one event given that another has happened. It allows you to compute the odds of your hypothesis when you give it certain points of data. You can use Bayes to look at the odds of somebody having cancer based on age, or whether an email is spam based on the wording of the message.


The theorem helps lower your uncertainty. It was used in WWII to figure out the locations of U-boats and to help break the codes produced by the German Enigma machine.
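
Here is a minimal sketch of the spam case. All three probabilities are hypothetical numbers chosen for the example, and the function name `posterior` is illustrative.

```python
def posterior(prior, likelihood, false_alarm_rate):
    """P(H | E) via Bayes' theorem for a binary hypothesis H and evidence E."""
    evidence = likelihood * prior + false_alarm_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 20% of mail is spam; the word "free" appears in
# 60% of spam messages and in 5% of legitimate ones.
p_spam_given_free = posterior(prior=0.20, likelihood=0.60, false_alarm_rate=0.05)
print(round(p_spam_given_free, 3))  # 0.75
```

Seeing the word raises the estimated chance of spam from 20% to 75%, which is the sense in which the theorem lowers uncertainty as data comes in.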


K-Nearest Neighbor Algorithm

This is one of the easiest algorithms to learn and use, so much so that Wikipedia classifies it as a “lazy learning” method. The algorithm is based less on statistics and more on reasonable deduction. Basically, it tries to identify the groups that are closest to each other. When k-NN is used on a two-dimensional model, it usually relies on Euclidean distance.


The exception is when you are working with a one-norm (Manhattan) distance, which resembles a grid of square streets where cars can travel in only a single direction at a time. The point I’m making is that the models and objects here rely on two dimensions, just like the classic xy graph.


k-NN tries to identify groups that are situated around a certain number of points. K is the specified number of points. There are certain ways to figure out how big your k needs to be because it is an inputted variable that the data science system or user has to pick.


This model is well suited to feature clustering, basic market segmentation, and finding groups among specific data points. The majority of programming languages will let you implement it in a couple of lines of code.
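
As a sketch of how short an implementation can be, the following Python version labels a new point by majority vote among its k nearest neighbors using Euclidean distance. The data points and labels are invented for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional data: (x, y) point and its group label
train = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.5), "a"),
         ((6.0, 6.0), "b"), ((6.5, 7.0), "b"), ((7.0, 6.5), "b")]

print(knn_predict(train, (2.0, 2.0)))  # a
print(knn_predict(train, (6.0, 7.0)))  # b
```

There is no training step at all: the work happens only when a query arrives, which is why the method is called lazy.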


Bagging or Bootstrap Aggregating

Bagging involves building several models of one algorithm, such as a decision tree. Each of them is trained on a different bootstrap sample. Since bootstrapping involves sampling with replacement, some of your data won’t be used in every tree.


The decision trees are built from different samples, which helps with the problem of overfitting to one particular sample. Decision trees combined in this way help lower the total error, since the variance keeps decreasing with every tree that is added, without increasing the bias.


A random forest is a bag of decision trees that uses subspace sampling. Only a random selection of the features is considered at each node split, which reduces the correlation between the trees in your forest.


These random forests also have their own built-in validation tool. Since only part of the data gets used for each tree, the performance error can be estimated using the roughly 37% of the sample that each tree left out.
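
The 37% figure follows from sampling with replacement: the chance that a given row is never picked in n draws approaches 1/e, about 0.368. A small simulation, with an arbitrary data size and seed, shows the effect.

```python
import random

def out_of_bag_fraction(n_rows, rng):
    """Draw one bootstrap sample and return the share of rows never picked."""
    picked = {rng.randrange(n_rows) for _ in range(n_rows)}
    return 1.0 - len(picked) / n_rows

rng = random.Random(0)  # fixed seed so the run is repeatable
fractions = [out_of_bag_fraction(10_000, rng) for _ in range(20)]
average = sum(fractions) / len(fractions)
print(round(average, 3))  # close to 1/e, about 0.368
```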


This was only a basic rundown of some statistical properties that are helpful in data science. While some data science teams will only run algorithms from R and Python libraries, it’s still important to understand these small areas of data science. They will make abstraction and manipulation easier.


Deep Learning Applications

As you have learned so far, deep learning is changing how everybody looks at technology. A lot of excitement swirls around artificial intelligence as well as its branches of deep learning and machine learning.


With the huge computational power that machines have, they are now able to translate speech and recognize objects in real time. Finally, artificial intelligence is getting smart.


It is believed that there are many deep learning applications that will affect your life in the very near future. In fact, they are probably already making a huge impact. In just the next five to ten years, deep learning development languages, tools, and libraries will end up being the standard components of all software development toolkits.


Let’s look at some of the top deep learning applications that will end up ruling our world in 2018 and beyond.


Self-Driving Cars

Companies that build driver assistance services for cars, as well as full-blown self-driving cars like Google’s, have to teach a computer system how to take over all, or at least the key parts, of driving by using digital sensor systems instead of a human’s senses.


In order to do this, companies will have to start by training algorithms to use a lot of data. This can be looked at as a child learning through replication and experiences. All of these services could end up providing some unexpected business models for several companies.



Healthcare

Skin or breast cancer diagnostics? Monitoring and mobile apps? Maybe personalized and predictive medicine on the basis of Biobank data? Artificial intelligence is reshaping healthcare, life sciences, and medicine as an industry.


AI type innovations are advancing the future of population health and precision medicine management in ways that nobody would have ever believed. Computer-aided diagnosis, decision support tools, quantitative imaging, and computer-aided detection will all play very large roles in the future.


Voice-Activated Assistants and Voice Search

This is probably one of the most popular uses for deep learning. All of the big tech giants have made large investments in this area. You can find voice-activated assistants on almost every smartphone.


Siri has been available for use since October 2011. The assistant for Android, Google Now, was launched just a year after Siri. Microsoft has introduced the newest assistant in the form of Cortana.


Automatically Placing Audio in Silent Movies

Here, the system synthesizes sounds to match a silent video. The system was trained with a thousand examples of videos with the sound of a drumstick hitting different types of surfaces and producing different types of sounds.


Deep learning models associate the frames of the video with a database of pre-recorded sounds in order to choose the sound that best matches what is going on in the scene.


The system is evaluated with a Turing-test-like setup in which humans have to figure out whether a video has real or fake sounds. This is an application of recurrent neural networks (RNNs) such as LSTMs.


Automatic Machine Translation

This process is where a given word, sentence, or phrase is said in one language and then automatically translated to another language. This technology has been around for a while, but deep learning has gotten the best results in two areas:

  • Image translations
  • Text translations


These text translations can be done without the need for pre-processing the sequence, which allows the algorithm to be able to learn the dependencies between the word and the new language mapping.


Automatic Text Generation

This task is one of the most interesting. This is where a body of text has been learned, and new text is created either character-by-character or word-by-word.


This model can learn how to capture text styles, forms of sentences, punctuations, and spelling in the body. Large recurrent neural networks are helpful when it comes to learning the relationship between different items in an input string sequence, and it will then generate text.


Automatic Handwriting Generation

In this task, given a corpus of handwriting examples, the system generates new handwriting for a certain phrase or word. The handwriting is provided as the sequence of coordinates used by a pen when the samples were created. From this corpus, the relationship between the pen movement and the letters is learned, and new examples can be generated ad hoc.


Internet Search

Chances are that when you hear the word search, your first thought is Google. But there are actually several other search engines out there, such as DuckDuckGo, AOL, Ask, Bing, and Yahoo.


Every search engine out there uses some form of data science algorithm to provide its users the best results for their search query in less than a second. Think about this: Google processes over 20 petabytes of data every single day. If there wasn’t any data science, Google would not be as good as it is today.


Image Recognition

Another big area of use for deep learning is with image recognition. This tool is used to identify and recognize objects and people in images and to better understand the context and content. This tool has already been used in many sectors such as tourism, retail, social media, gaming, and so on.


The task requires classifying the objects in a picture as one of a set of objects that the system already knows. A more complex version of this is object detection, which involves identifying one or more objects in the scene of a photo and placing a box around each of them.


Automatic Image Caption Generation

This task is where a certain image is provided and the system has to come up with a caption that describes what is in the photo. In 2014, a boom of deep learning algorithms achieved pretty big results when it came to this problem. It leveraged the work from top models in order to classify and detect objects in pictures.


After an object has been detected in a photo and labels have been generated for it, the next step is to turn those labels into a coherent descriptive sentence.


Typically, this system will involve using large convolutional neural networks in order to detect the object in a photo and will then use an RNN, such as an LSTM, to change the label into something coherent.


Automatic Colorization

This is the process of adding color to photos that were originally black and white. Deep learning is able to use the objects and the content of the photo to color these images, a lot like how a human operator would approach something like this.


This capability leverages the large, high-quality convolutional neural networks trained for ImageNet, co-opted to solve the colorization task. Typically, the approach uses a very large convolutional neural network with many layers to produce the colored image.


This was traditionally performed by hand by humans because of the difficulty of the task.



Advertising

Advertising is another big area that has been changed by the advent of deep learning. Advertisers and publishers use it to increase the relevancy of ads and to boost the ROI of their campaigns.


For example, deep learning helps publishers and ad networks to leverage the content so that they can create precisely targeted display advertising, real-time bidding for their ads, data-driven predictive advertising, and many more.


Recommender Systems

Think about the suggestions Amazon gives you. They help you find relevant products among billions of others, and they also improve your experience. There are a lot of companies out there that use this kind of system to promote suggestions that align with their users’ interests.


The giants of the internet like IMDB, LinkedIn, Netflix, Google Play, Twitter, Amazon, and several more use this type of system to make their users’ experience better. The recommendations you see are based on your previous searches.


Predicting Earthquakes

There was a Harvard scientist that figured out how to use deep learning to teach a computer system to perform viscoelastic computations. These are the computations that are used to predict earthquakes.


Until they figured this out, these types of computations were computationally intensive, but the deep learning application improved calculation time by 50,000%. When we are talking about earthquake calculation, timing plays a large and important role. This improvement may just be able to save a life.


Neural Networks for Brain Cancer Detection

A French research team found that finding invasive brain cancer cells during surgery was hard, mainly because of the lighting in the OR. They discovered that using neural networks along with Raman spectroscopy during surgery allowed them to detect the cancer cells more easily and reduced the amount of cancer left behind.


Actually, this is only one of many pieces of work over the last couple of months that have paired advanced classification and recognition with several kinds of cancers and screening tools.


Neural Networks in Finances

Futures markets have been extremely successful since they were created in both developing and developed countries over the last few decades. The reason for it succeeding is due to the leverage futures provide for people who are participants in the market.


One research team examined a trading strategy that benefits from this leverage, using the cost-of-carry relationship and CAPM.


The team then applied technical trading rules, derived from spot market prices, to futures market prices, using a hedge ratio based on CAPM. The historical price data of 20 stocks from each of the 10 markets is part of the analysis.


Automatic Game Playing

This task involves a model learning how to play a computer-based game using only the pixels that are on the screen. This is a pretty hard task in the realm of deep reinforcement learning, and it was a breakthrough for DeepMind, which is part of Google. This line of work was later expanded and culminated in Google DeepMind’s AlphaGo.


Activision-Blizzard, Nintendo, Sony, Zynga, and EA Sports have been the leaders in the gaming world and brought it to the next level through data science.

Games are now being created by using machine learning algorithms which are able to upgrade and improve playing as the player moves through the game. When you are playing a motion game, the computer analyzes the previous moves to change the way the game performs.


GPU Computing

Deep Neural Networks (DNNs) are structured in a very uniform manner, such that at each layer of a network, thousands of identical artificial neurons perform the same computation. Therefore, a DNN's architecture fits quite well with the kinds of computation that a GPU can efficiently perform.


GPU has additional advantages over CPU; these include having more computational units and having a higher bandwidth to retrieve from memory.

Furthermore, in many deep learning applications that require a lot of computational effort, GPU graphics-specific capabilities can be exploited to further speed up calculations.


GPGPU computing

Several factors have led deep learning to be developed and placed at the center of attention in the field of machine learning only in recent decades.


One reason, perhaps the main one, is the progress in hardware: the availability of new processors, such as graphics processing units (GPUs), has greatly reduced the time needed for training networks, by a factor of 10 to 20.


In fact, since the connections between the individual neurons each have a numerically estimated weight, and networks learn by calibrating these weights properly, it is easy to see how growing network complexity demands a huge increase in computing power, which is what the graphics processors used in the experiments provide.


GPGPU history

General-purpose computing on graphics processing units (GPGPU) refers to the trend of employing GPU technology for non-graphics applications. Until 2006, the graphics APIs OpenGL and DirectX were the only ways to program the GPU.


Any attempt to execute arbitrary calculations on the GPU was subject to the programming restrictions of those APIs.


The GPUs were designed to produce a color for each pixel on the screen using programmable arithmetic units called pixel shaders. The programmers realized that if the inputs were numerical data, with a different meaning from the pixel colors, then they could program the pixel shader to perform arbitrary computations.


The GPU was tricked by presenting general tasks as if they were rendering tasks; the trick was clever, but also very convoluted.


There were memory limitations because the programs could only receive a handful of input color and texture units as input data. It was almost impossible to predict how a GPU would handle the floating-point data (if it was able to process it) so many scientific calculations could not use the GPU.


Anyone who wanted to resolve a numerical problem would have to learn OpenGL or DirectX, the only ways to communicate with the GPU.


The CUDA architecture

In 2006, NVIDIA presented the first GPU to support DirectX 10, the GeForce 8800 GTX, which was also the first GPU to use the CUDA architecture. This architecture included several new components designed specifically for GPU computing and aimed to remove the limitations that prevented previous GPUs from being used for non-graphical calculations.


In fact, the execution units on the GPU could read and write arbitrary memory as well as access a cache maintained in software called shared memory. These architectural features were added to make a CUDA GPU that also excelled in general purpose calculations as well as in traditional graphics tasks.


The following figure summarizes the division of space between the various components of a graphics processing unit (GPU) and a central processing unit (CPU). As you can see, a GPU devotes more transistors to data processing; it is a highly parallel, multithreaded, and many core processor:


CPU versus GPU architecture

Almost all the space on the GPU chip is dedicated to the ALU, apart from cache and control, making it suitable for repetitive calculations on large amounts of data.


The GPU accesses a local memory and is connected to the system, that is, the CPU, via a bus, currently the Peripheral Component Interconnect Express (PCI Express).

The graphics chip consists of a series of multiprocessors, the Streaming Multiprocessors (SMs).

The number of these multiprocessors depends on the specific characteristics and the performance class of each GPU. Each multiprocessor is in turn formed by stream processors (or cores). Each of these processors can perform basic arithmetic operations on integer or floating-point numbers in single and double precision.


GPU programming model

At this point, it is necessary to introduce some basic concepts to understand the CUDA programming model. The first distinction is between host and device.


The code executed on the host side is the part of the code executed on the CPU; the host also includes the RAM and the hard disk.

However, the code executed on the device is automatically loaded on the graphics card and run on the latter. Another important concept is the kernel; it stands for a function performed on the device and launched from the host.


The code defined in the kernel will be performed in parallel by an array of threads. The following figure summarizes how the GPU programming model works:


  • The running program has source code to run on the CPU and code to run on the GPU
  • CPU and GPU have separate memories
  • The data is transferred from CPU to GPU to be computed
  • The data output from GPU computation is copied back to CPU memory

GPU programming model


TensorFlow GPU set up

The NVIDIA deep learning SDK offers powerful tools and libraries for designing and deploying GPU-accelerated deep learning applications, and for the development of deep learning frameworks such as Caffe, CNTK, TensorFlow, Theano, and Torch.


It includes libraries for deep learning primitives, inference, video analytics, linear algebra, sparse matrices, and multi-GPU communications. The current implementation supports the following SDKs:


Deep learning primitives: (https://developer.nvidia.com/cudnn) High-performance building blocks for deep neural network applications including convolutions, activation functions, and tensor transformations.


Deep learning inference engine: (https://developer.nvidia.com/tensorrt) High- performance deep learning inference runtime for production deployment.


Deep learning for video analytics: (https://developer.nvidia.com/deepstream-sdk) High-level C++ API and runtime for GPU-accelerated transcoding and deep learning inference.


Linear algebra: (https://developer.nvidia.com/cublas) GPU-accelerated BLAS functionality that delivers 6x to 17x faster performance than CPU-only BLAS libraries. XLA (https://www.tensorflow.org/performance/xla/) is a domain-specific compiler for linear algebra that optimizes TensorFlow computations. Although it is experimental (that is, under active development), the possible results are improvements in speed, memory usage, and portability on server and mobile platforms.


Sparse matrix operations: (https://developer.nvidia.com/cusparse) GPU-accelerated linear algebra subroutines for sparse matrices that deliver up to 8x faster performance than CPU BLAS (MKL), ideal for applications such as natural language processing.


Multi-GPU communication: (https://github.com/NVIDIA/nccl) Collective communication routines, such as all-gather, reduce, and broadcast, that accelerate multi-GPU deep learning training on up to eight GPUs.


However, the deep learning SDK requires the CUDA toolkit (https://developer.nvidia.com/cuda-toolkit), which offers a comprehensive development environment for building new GPU-accelerated deep learning algorithms and dramatically increasing the performance of existing applications.


To use TensorFlow with NVIDIA GPUs, the first step is to install the CUDA toolkit.

See more at: https://developer.nvidia.com/cuda-downloads.

Once the CUDA toolkit is installed, you must download the cuDNN v5.1 library from https://developer.nvidia.com/cudnn for Linux.


For a more detailed installation guide, including TensorFlow and Bazel for GPU computation using cuDNN, please refer to this URL: http://www.nvidia.com/object/gpu-accelerated-applications-tensorflow-installation.html.


The cuDNN is a library that helps accelerate deep learning frameworks, such as TensorFlow or Theano. Here's a brief explanation from the NVIDIA website.


The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for DNNs. The cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. The cuDNN is part of the NVIDIA deep learning SDK.


Before installing, you'll need to register for NVIDIA's Accelerated Computing Developer Program. Once registered, log in and download cuDNN 5.1 to your local computer.


At the time of writing, the current version of cuDNN was 5.1, released on Jan 20, 2017, for CUDA 8.0. For more details, see https://developer.nvidia.com/rdp/cudnn-download.


As shown in the preceding figure, you will have to select your platform/OS type. The following installation is for Linux. Now, once downloaded, uncompress the files and copy them into the CUDA toolkit directory (assumed here to be in


$ sudo tar -xvf cudnn-8.0-linux-x64-v5.1-rc.tgz -C /usr/local


Update TensorFlow

We're assuming you'll be using TensorFlow for building your DNN models.

Simply update TensorFlow, via pip with the upgrade flag.

Here, we suppose you're currently using TensorFlow 1.0.1:

pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.


Now you should have everything you need to run a model using your GPU.

For other operating systems, Python versions, or CPU-only versus GPU support, refer to this URL:

https://www.tensorflow.org/install/install_linux#the_url_of_the_tensorflow_python_package


TensorFlow GPU management

In TensorFlow, the supported devices are represented as strings. For example:

/cpu:0: The CPU of your machine

/gpu:0: The GPU of your machine, if you have one

/gpu:1: The second GPU of your machine, and so on

When an operation can run on either a CPU or a GPU, TensorFlow gives priority to the GPU device when assigning the operation.


Programming example

To use a GPU in your TensorFlow program, just type the following:

with tf.device("/gpu:0"):


Followed by the setup operations. This line of code will create a new context manager, telling TensorFlow to perform those actions on the GPU.
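To make the idea of a device context manager more concrete, here is a minimal pure-Python sketch. It is not TensorFlow's implementation: the names device, current_device, and make_op are hypothetical, and the sketch only assumes that operations created inside a with block pick up the innermost device name.

```python
import contextlib

# Hypothetical stack of active device names; "/cpu:0" is an assumed default.
_device_stack = ["/cpu:0"]


@contextlib.contextmanager
def device(name):
    """Make `name` the current device for the duration of the with-block."""
    _device_stack.append(name)
    try:
        yield name
    finally:
        # Restore the previous device even if the block raises.
        _device_stack.pop()


def current_device():
    """Return the innermost active device name."""
    return _device_stack[-1]


def make_op(label):
    """Record which device an operation would be placed on."""
    return (label, current_device())


with device("/gpu:0"):
    inside = make_op("matmul")
outside = make_op("add")

print(inside)   # ('matmul', '/gpu:0')
print(outside)  # ('add', '/cpu:0')
```

Operations defined after the block ends fall back to the enclosing device, which is why the placement in the following examples depends on where each line sits relative to the with statement.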


Let's consider the following example, in which we want to compute the sum of two matrix powers, A^n + B^n.

Define the basic imports:

import numpy as np

import tensorflow as tf

import datetime


We can configure the program to report which devices the operations and tensors are assigned to. To do this, we'll create a session with the log_device_placement parameter set to True:

log_device_placement = True

Then we fix the n parameter, that is, the number of multiplications to perform:

n = 10

Then we build two random large matrices. We use the NumPy rand function to perform this operation:

A = np.random.rand(10000, 10000).astype('float32')

B = np.random.rand(10000, 10000).astype('float32')

A and B are each of size 10000x10000.
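A quick back-of-the-envelope calculation shows why matrices of this size put pressure on GPU memory. The sketch assumes 4 bytes per float32 element; the helper name matrix_bytes is hypothetical.

```python
def matrix_bytes(rows, cols, bytes_per_element=4):
    """Memory needed for a dense matrix (float32 uses 4 bytes per element)."""
    return rows * cols * bytes_per_element


one_matrix = matrix_bytes(10000, 10000)
print(one_matrix)            # 400000000 bytes
print(one_matrix / 1e6)      # 400.0 (megabytes, taking 1 MB = 10**6 bytes)
print(2 * one_matrix / 1e6)  # 800.0 for A and B together
```

Each intermediate matrix product has the same shape, so it needs a further 400 MB on top of the inputs.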

The following lists will be used to store results (only c1 is used in this example):

c1 = []

c2 = []

Here, we define the kernel matrix multiplication function that will be performed by the GPU:

def matpow(M, n):
    if n < 1:
        return M
    return tf.matmul(M, matpow(M, n-1))
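The recursion can be checked on a small matrix without TensorFlow. The following sketch is only a stand-in under stated assumptions: it replaces tf.matmul with a hypothetical pure-Python matmul helper. It shows that, because the base case returns M itself, n counts the multiplications performed, so the result is M raised to the power n + 1.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows (stand-in for tf.matmul)."""
    size = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def matpow(M, n):
    """Same recursion as in the text: perform n multiplications by M."""
    if n < 1:
        return M
    return matmul(M, matpow(M, n - 1))


M = [[2, 0], [0, 2]]
print(matpow(M, 0))  # [[2, 0], [0, 2]]
print(matpow(M, 2))  # [[8, 0], [0, 8]]
```

With n = 2 the function multiplies twice, giving the third power of M.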


As previously explained, we must configure the GPU and the CPU with the operations to perform.

The GPU will compute the A^n and B^n operations and store the results in c1:

with tf.device('/gpu:0'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))

If the above code does not work, use /job:localhost/replica:0/task:0/cpu:0 as the device instead (the operations will then be executed on the CPU).


The addition of all elements in c1, that is, A^n + B^n, is performed by the CPU, so we define it as follows:

with tf.device('/cpu:0'):
    sum = tf.add_n(c1)

The datetime class lets us measure the computation time:

t1_1 = datetime.datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=log_device_placement)) as sess:
    sess.run(sum, {a:A, b:B})
t2_1 = datetime.datetime.now()

Computational time is then displayed using:

print("GPU computation time: " + str(t2_1-t1_1))

I am using a GeForce 840M graphics card; the results are as follows:

GPU computation time: 0:00:13.816644
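The timing pattern used above can be wrapped in a small reusable helper. This is a sketch under stated assumptions, not code from the example: the helper name timed is hypothetical, and Python's built-in sum stands in for the real workload.

```python
import datetime


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed), with elapsed as a timedelta."""
    start = datetime.datetime.now()
    result = fn(*args)
    elapsed = datetime.datetime.now() - start
    return result, elapsed


# The built-in sum over a range is only a placeholder workload.
result, elapsed = timed(sum, range(1000000))
print("Computation time: " + str(elapsed))
```

Note that the measurement in the text starts before the session is created, so it includes session start-up and feeding A and B to the device, not only the matrix products.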

Source code for GPU computation


Here is the full code for the previous example:

import numpy as np
import tensorflow as tf
import datetime

log_device_placement = True
n = 10

A = np.random.rand(10000, 10000).astype('float32')
B = np.random.rand(10000, 10000).astype('float32')

c1 = []
c2 = []

def matpow(M, n):
    if n < 1:  # Abstract cases where n < 1
        return M
    return tf.matmul(M, matpow(M, n-1))

with tf.device('/gpu:0'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))

with tf.device('/cpu:0'):
    sum = tf.add_n(c1)  # Addition of all elements in c1, i.e. A^n + B^n

t1_1 = datetime.datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=log_device_placement)) as sess:
    sess.run(sum, {a:A, b:B})
t2_1 = datetime.datetime.now()

If the preceding code does not work, or if your device has no GPU support, use /job:localhost/replica:0/task:0/cpu:0 as the device so that the operations run on the CPU.


GPU memory management

In some cases, it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as it is needed by the process. TensorFlow provides two configuration options on the session to control this.


The first is the allow_growth option, which attempts to allocate only as much GPU memory as runtime allocations require. It starts out allocating very little memory, and as sessions run and more GPU memory is needed, the GPU memory region used by the TensorFlow process is extended.


Note that we do not release memory since that can lead to even worse memory fragmentation. To turn this option on, set the option in ConfigProto by:

config = tf.ConfigProto()

config.gpu_options.allow_growth = True

session = tf.Session(config=config, ...)
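The grow-on-demand, never-release behaviour described above can be pictured with a toy model. This is an illustration only, not TensorFlow's allocator: the class GrowOnlyRegion and its methods are hypothetical.

```python
class GrowOnlyRegion:
    """Toy model of a memory region that grows on demand and never shrinks."""

    def __init__(self, initial_bytes=0):
        self.reserved = initial_bytes  # bytes currently held by the process
        self.in_use = 0                # bytes handed out to live tensors

    def allocate(self, nbytes):
        self.in_use += nbytes
        # Extend the reserved region only when current demand exceeds it.
        if self.in_use > self.reserved:
            self.reserved = self.in_use

    def free(self, nbytes):
        # Freed bytes become reusable, but the region is not given back.
        self.in_use -= nbytes


region = GrowOnlyRegion()
region.allocate(100)
region.allocate(300)
region.free(300)
print(region.in_use, region.reserved)  # 100 400
region.allocate(200)
print(region.in_use, region.reserved)  # 300 400
```

The second allocation of 200 bytes fits into space that was reserved earlier, so the reserved size stays at its high-water mark.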


The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated.


For example, you can tell TensorFlow to only allocate 40% of the total memory of each GPU by:

config = tf.ConfigProto()

config.gpu_options.per_process_gpu_memory_fraction = 0.4

session = tf.Session(config=config, ...)

This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process.
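As a worked example of what such a fraction means in bytes, the following sketch computes the cap for one GPU. The helper name memory_cap_bytes and the 2 GiB card size are assumptions chosen for illustration; they do not come from the example above.

```python
def memory_cap_bytes(total_bytes, fraction):
    """Upper bound on the memory a process may take from one GPU."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_bytes * fraction)


# Assumed example card with 2 GiB of memory (not a figure from the text).
total = 2 * 1024**3
cap = memory_cap_bytes(total, 0.4)
print(cap)  # 858993459
```

On such a card, a fraction of 0.4 leaves roughly 860 MB for the process, which is worth comparing with the size of the tensors you plan to place on the device.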


Assigning a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default. If you would like to run on a different GPU, you will need to specify the preference explicitly.

For example, we can try to change the GPU assignment in the previous code:

with tf.device('/gpu:1'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))

In this way, we are telling the second GPU to execute the kernel function.


If the device we have specified does not exist (as in my case), you will get the following error message on the console (or terminal):

InvalidArgumentError :

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'Placeho

[[Node: Placeholder_1 = Placeholder[dtype=DT_FLOAT, shape=[100,100], _device="/device:G


If you would like TensorFlow to automatically choose an existing and supported device to run the operations in case the specified one doesn't exist, you can set allow_soft_placement to True in the configuration option when creating the session.
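The difference between strict and soft placement can be sketched with a toy resolver. This is not TensorFlow's placement algorithm: the function resolve_device is hypothetical, and the fallback order (an existing GPU first, otherwise the CPU) is an assumption made for illustration.

```python
def resolve_device(requested, available, allow_soft_placement=False):
    """Return the device an operation would run on under this toy policy."""
    if requested in available:
        return requested
    if allow_soft_placement:
        # Assumed fallback order: prefer an existing GPU, otherwise the CPU.
        gpus = [d for d in available if d.startswith("/gpu:")]
        return gpus[0] if gpus else "/cpu:0"
    raise ValueError("Cannot assign a device: %s does not exist" % requested)


available = ["/cpu:0", "/gpu:0"]  # assumed machine with a single GPU
print(resolve_device("/gpu:0", available))                             # /gpu:0
print(resolve_device("/gpu:1", available, allow_soft_placement=True))  # /gpu:0
```

Without soft placement, the request for the missing device raises an error in this sketch, which mirrors the InvalidArgumentError shown above.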


Again, we fix /gpu:1 for the following node:

with tf.device('/gpu:1'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))

Then we build a session with the allow_soft_placement parameter set to True:

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=log_device_placement)) as sess:
    sess.run(sum, {a:A, b:B})


In this way, running the session no longer raises InvalidArgumentError. Instead, the correct result is displayed, in this case with a slight delay:

GPU computation time: 0:00:15.006644

Source code for GPU with soft placement


We report, just for better understanding, the complete source code:

import numpy as np
import tensorflow as tf
import datetime

log_device_placement = True
n = 10

A = np.random.rand(10000, 10000).astype('float32')
B = np.random.rand(10000, 10000).astype('float32')

c1 = []

def matpow(M, n):
    if n < 1:  # Abstract cases where n < 1
        return M
    return tf.matmul(M, matpow(M, n-1))

with tf.device('/gpu:1'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))

with tf.device('/cpu:0'):
    sum = tf.add_n(c1)

t1_1 = datetime.datetime.now()
# Soft placement lets TensorFlow fall back to an existing device if /gpu:1 is missing.
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=log_device_placement)) as sess:
    sess.run(sum, {a:A, b:B})
t2_1 = datetime.datetime.now()


Using multiple GPUs

If you would like to run TensorFlow on multiple GPUs, you can construct your model assigning a specific chunk of code to a GPU. For example, having two GPUs, we can split the previous code in this way, assigning the first matrix computation to the first GPU as follows:

with tf.device('/gpu:0'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))

We assign the second matrix computation to the second GPU as follows:

with tf.device('/gpu:1'):
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(b, n))

Finally, the CPU will manage the results; note that we used the shared c1 list to collect them:

with tf.device('/cpu:0'):
    sum = tf.add_n(c1)
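When there are more computations than the two shown here, the same splitting idea can be expressed as a placement plan. The following sketch is only an illustration in plain Python: the function assign_round_robin is hypothetical, and the task labels are strings rather than real TensorFlow operations.

```python
def assign_round_robin(tasks, devices):
    """Pair each task with a device, cycling through the device list."""
    return [(devices[i % len(devices)], task) for i, task in enumerate(tasks)]


plan = assign_round_robin(["matpow(a, n)", "matpow(b, n)"], ["/gpu:0", "/gpu:1"])
for device, task in plan:
    print(device, task)
# /gpu:0 matpow(a, n)
# /gpu:1 matpow(b, n)
```

In a real program, each pair in such a plan would correspond to one with tf.device block like the ones above, with the final sum still collected on the CPU.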


Source code for multiple GPUs management

The complete source code is fully listed here:

import numpy as np
import tensorflow as tf
import datetime

log_device_placement = True
n = 10

A = np.random.rand(10000, 10000).astype('float32')
B = np.random.rand(10000, 10000).astype('float32')

c1 = []

def matpow(M, n):
    if n < 1:  # Abstract cases where n < 1
        return M
    return tf.matmul(M, matpow(M, n-1))

with tf.device('/gpu:0'):
    a = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(a, n))

with tf.device('/gpu:1'):
    b = tf.placeholder(tf.float32, [10000, 10000])
    c1.append(matpow(b, n))

with tf.device('/cpu:0'):
    sum = tf.add_n(c1)

t1_1 = datetime.datetime.now()
# Soft placement is assumed here so the code also runs when /gpu:1 is missing.
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=log_device_placement)) as sess:
    sess.run(sum, {a:A, b:B})
t2_1 = datetime.datetime.now()