What is Deep Learning

Google uses deep learning in its voice recognition algorithms. Amazon and Netflix use it to decide what you might want to watch or buy next.

 

MIT researchers use it to predict the future. This well-established and still-growing industry is always eager to sell its tools as revolutionary. But what is deep learning exactly? Is it just another fad, a sexy new name used to push old-fashioned AI on us?

 

It helps to think of deep learning as the cutting edge of the cutting edge. Machine learning takes some of the core ideas of AI and focuses them on solving real-world problems with neural networks designed to mimic the human brain's decision-making.

 

Deep learning narrows the focus even further, concentrating on a subset of machine learning tools and techniques and applying them to almost any problem that requires thought, whether by a machine or a human.

 

If you are just starting out in deep learning, or if you last worked with neural networks a long time ago, you will probably find yourself a bit confused. Many people have been baffled by it, especially those who learned about neural networks in the 1990s and early 2000s.

 

How it Works

Basically, deep learning involves feeding a computer system a large amount of data, which it then uses to make decisions about other data. As in machine learning, that data is fed through a neural network.

 

These networks are logical constructions that ask a series of binary true/false questions of, or extract a numerical value from, every piece of data that passes through them, and classify it according to the answers they receive.

 

Because deep learning work centers on developing these networks, they become what are known as deep neural networks: logic networks complex enough to classify datasets as large as Twitter's firehose of tweets or Google's image library.

 

With datasets this comprehensive and logical networks sophisticated enough to handle their classification, it becomes trivial for a computer to take an image and tell you, with a high probability, what a human would recognize it as.

 

Pictures are a good example of how this works: they typically contain many different elements, and they make it easy to grasp how a computer, with its one-track calculating mind, can learn to interpret them the way we would.

 

The great thing about deep learning, however, is that it can be applied to all types of data (written words, speech, video, audio, and machine signals) to produce conclusions that appear to have come from a human, only far faster than any human could produce them. Let's look at a practical example.

 

Take, for example, a system that was created to automatically record and report the number of vehicles of a certain make and model that travel across a public road.

 

First, the system would need access to a large database of car types, including their engine sounds, shapes, and sizes. This could be compiled manually or, in more advanced use cases, automatically by a system programmed to scour the internet and ingest the data it discovers.

 

Then it would take in the data it needs to process: real-world data containing the insights, which in this example would be captured by roadside cameras and microphones.

 

By comparing the data from its sensors with the data it has learned, the system can classify, with a certain probability of accuracy, each passing vehicle's make and model.

 

So far this is fairly straightforward. The "deep" part comes in because, as time passes and the system gains experience, it increases the probability of a correct classification by training itself on the new data it receives. In other words, it learns from its own mistakes, just as humans do.

 

For example, it could incorrectly decide that a vehicle was a particular make and model based on its similar engine noise and size.

 

It would have overlooked other differentiators it judged to have a low probability of being important to that decision. Having now learned that such a differentiator actually matters for telling two vehicles apart, it improves its odds of picking the right vehicle next time.

 

Deep learning hardware 

CPU cores

Most deep learning applications and libraries use a single CPU core unless they are run within a parallelization framework such as the Message Passing Interface (MPI), MapReduce, or Spark.

 

For example, CaffeOnSpark, from the team at Yahoo!, uses Spark with Caffe to parallelize network training across multiple GPUs and CPUs. In a typical single-box setting, one CPU core is enough for deep learning application development.

 

CPU cache size

The CPU cache is an important CPU component used for high-speed computation. It is usually organized as a hierarchy of cache layers, from L1 to L4, with L1 and L2 being smaller and faster than the larger and slower L3 and L4.

 

In an ideal setting, all the data needed by the application would reside in the caches, so no reads from RAM would be required and the overall operation would be faster.

 

However, this is rarely the case for deep learning applications. For example, in a typical ImageNet experiment with a batch size of 128, more than 85 MB of CPU cache would be needed to store all the information for one mini-batch.
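
As a rough, back-of-the-envelope illustration (assuming 224 x 224 RGB images stored as 32-bit floats, which is only one possible setup):

batch, height, width, channels, bytes_per_float = 128, 224, 224, 3, 4
megabytes = batch * height * width * channels * bytes_per_float / 1e6
print(round(megabytes), "MB")   # ~77 MB for the raw image data alone, far more than any CPU cache holds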

 

Since such datasets are not small enough to be cache-only, a RAM read cannot be avoided. Hence modern day CPU cache sizes have little to no impact on the performance of deep learning applications.

 

RAM size

As we saw previously, most deep learning applications read directly from RAM rather than from the CPU caches. Hence, it is advisable to have at least as much CPU RAM as GPU RAM, if not more.

 

The amount of GPU RAM you need depends on the size of your deep learning model. For example, ImageNet-based deep learning models have a large number of parameters, taking 4 GB to 5 GB of space, so a GPU with at least 6 GB of RAM is an ideal fit for such applications.

 

Pairing it with a CPU that has at least 8 GB of RAM, and preferably more, lets application developers focus on the key aspects of their application instead of debugging RAM performance issues.

 

Hard drive

Typical deep learning applications require large datasets, in the hundreds of gigabytes. Since this data cannot fit in RAM, an ongoing data pipeline is constructed: a deep learning application reads mini-batch data from GPU RAM, which in turn keeps reading data from CPU RAM, which loads data from the hard drive.

 

Since GPUs have a large number of cores, and each of these cores processes a mini-batch of its own data, large volumes of data must constantly be read from disk to sustain this high degree of data parallelism.

 

For example, the AlexNet Convolutional Neural Network (CNN) model needs roughly 300 MB of data to be read every second. This can cripple overall application performance, so a solid-state drive (SSD) is often the right choice for most deep learning application developers.

 

Deep learning software frameworks

Every good deep learning application needs several components to be able to function correctly. These include:

  • A model layer that gives developers the flexibility to design their own models
  • A GPU layer that makes it seamless for application developers to choose between the GPU and the CPU for their application
  • A parallelization layer that lets developers scale their application to run on multiple devices or instances

 

As you can imagine, implementing these modules is not easy. A developer often ends up spending more time debugging implementation issues than legitimate model issues.

 

Thankfully, a number of software frameworks exist today that make deep learning application development practically a first-class citizen of their programming languages.

 

These frameworks vary in architecture, design, and features, but almost all of them provide immense value by giving developers an easy and fast way to implement their applications. In this section, we will take a look at some popular deep learning software frameworks and how they compare with each other.

 

TensorFlow: a deep learning library

TensorFlow is an open-source software library for numerical computation using data flow graphs. Designed and developed by Google, TensorFlow represents a complete computation as a data flow graph.

 

Each node in this graph represents a mathematical operation. An edge connecting two nodes represents the multi-dimensional data (a tensor) that flows between them.

 

One of the primary advantages of TensorFlow is that it supports CPUs and GPUs, as well as mobile devices, making it almost seamless for developers to write code for any device architecture. TensorFlow also has a very large developer community, giving the framework huge momentum.
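
As a minimal sketch of this graph idea, using the TensorFlow 1.x graph API that was current for the frameworks described here (the numbers are arbitrary):

import tensorflow as tf

# Each node is an operation; the edges between nodes carry tensors.
a = tf.placeholder(tf.float32, name="a")
b = tf.placeholder(tf.float32, name="b")
c = tf.multiply(a, b, name="c")

# Nothing is computed until the graph is run inside a session.
with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: 3.0, b: 4.0}))   # 12.0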

 

Caffe

Caffe was designed and developed at the Berkeley Artificial Intelligence Research (BAIR) Lab, with expression, speed, and modularity in mind.

 

Its architecture is expressive: models and optimization parameters are defined through configuration, without requiring any additional code. This configuration also allows switching between CPU and GPU mode, and back again, with a single flag change.
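
With Caffe's Python bindings (pycaffe) installed, for instance, the switch looks roughly like this:

import caffe

caffe.set_mode_cpu()    # run everything on the CPU
# ...or flip to the GPU instead:
caffe.set_device(0)     # select the first GPU
caffe.set_mode_gpu()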

 

Caffe also posts good speed benchmarks: on a single NVIDIA K40 GPU, it can process over 60 million images per day. Caffe has a strong community as well, ranging from academic researchers to industrial research labs that use it across heterogeneous application stacks.

 

MXNet

MXNet is a multi-language machine learning library. It offers two modes of computation:

  • Imperative mode: This mode exposes an interface much like the regular NumPy API; for example, you can construct a tensor of zeros on both the CPU and the GPU using MXNet.
  • Symbolic mode: This mode exposes a computation graph, much like TensorFlow. Though the imperative API is quite useful, one of its drawbacks is its rigidity.

 

All computations need to be known beforehand, along with pre-defined data structures. The symbolic API removes this limitation by letting MXNet work with symbols or variables instead of fixed data structures.
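
A small sketch of the two modes (assuming a GPU build of MXNet; drop the mx.gpu(0) line otherwise):

import mxnet as mx

# Imperative mode: arrays are created and computed immediately, NumPy-style.
a = mx.nd.zeros((2, 3))                   # lives on the CPU by default
b = mx.nd.zeros((2, 3), ctx=mx.gpu(0))    # lives on the first GPU

# Symbolic mode: declare a computation graph first, bind real data to it later.
x = mx.sym.Variable("x")
y = mx.sym.Variable("y")
z = x + y
ex = z.bind(ctx=mx.cpu(), args={"x": mx.nd.ones((2, 3)), "y": mx.nd.ones((2, 3))})
print(ex.forward()[0].asnumpy())          # a 2x3 array of twos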

 

Torch

Torch is a Lua-based deep learning framework developed by Ronan Collobert, Clement Farabet, and Koray Kavukcuoglu. It was initially used by the CILVR Lab at New York University.

 

Torch is powered by C/C++ libraries under the hood and uses the Compute Unified Device Architecture (CUDA) for its GPU interactions. It aims to be the fastest deep learning framework while also providing a simple, C-like interface for rapid application development.

 

Theano

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

 

One of Theano's key features is its very tight integration with NumPy, which makes it feel almost native to a large number of Python developers. It also provides an intuitive interface for using either the GPU or the CPU, and it performs efficient symbolic differentiation, providing derivatives for functions with one or many inputs.

 

It is also numerically stable and has a dynamic code generation capability leading to faster expression evaluations. Theano is a good framework choice if you have advanced machine learning expertise and are looking for a low-level API for fine-grained control of your deep learning application.
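
A minimal sketch of Theano's symbolic differentiation (the variable names are arbitrary):

import theano
import theano.tensor as T

x = T.dscalar("x")            # a symbolic double-precision scalar
y = x ** 2
dy_dx = T.grad(y, x)          # symbolic differentiation: d(x^2)/dx = 2x

f = theano.function([x], [y, dy_dx])
print(f(4.0))                 # [array(16.0), array(8.0)]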

 

CNTK

CNTK, the Microsoft Cognitive Toolkit, runs on both Windows and Linux operating systems and offers:

  • Efficient recurrent network training through batching techniques
  • Data parallelization using one-bit quantized stochastic gradient descent (SGD)
  • Efficient modularization that separates the compute network, the execution engine, the learning algorithms, and the model configuration

 

Keras

Keras is probably the most different from every other framework described so far. Most of those frameworks are low-level modules that interact directly with the GPU using CUDA.

 

Keras, on the other hand, could be understood as a meta-framework that interacts with other frameworks such as Theano or TensorFlow to handle its GPU interactions or other system-level access management.

 

As such, it is highly flexible and very user-friendly, allowing developers to choose from a variety of underlying model implementations.
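
As a rough sketch of how little code a Keras model needs (the layer sizes and the 10-class setup here are arbitrary):

from keras.models import Sequential
from keras.layers import Dense

# A small fully connected network for a 10-class problem with 100 input features.
model = Sequential()
model.add(Dense(64, activation="relu", input_dim=100))
model.add(Dense(10, activation="softmax"))

model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # Theano or TensorFlow does the heavy lifting underneath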

 

Keras community support is also gaining momentum and, as of September 2017, the TensorFlow team plans to integrate Keras into the TensorFlow project.

 

Published benchmark results demonstrate that all of these frameworks can use GPUs very efficiently and show performance gains over CPUs. However, there is still no clear winner among them, which suggests there are still improvements to be made across all of these frameworks.

 

AI, Deep Learning, and Machine Learning

Artificial intelligence is concerned with creating machines capable of performing tasks that would normally require human intelligence. This loose definition means that AI encompasses many fields of research, from expert systems to genetic algorithms, and it leaves plenty of scope for argument over what counts as AI.

 

Machine learning has recently found a lot of success within AI research. It has allowed computers to surpass, or come very close to matching, human performance in areas ranging from face recognition to language and speech recognition.

 

Machine learning is the process of teaching a computer system how to perform a task, rather than programming it to perform that task step by step.

 

Once training is finished, the system can make accurate predictions when given new data.

 

This may sound dry, but those predictions could be answering whether the fruit in a picture is an apple or a banana, whether a person is walking in front of a self-driving vehicle, whether the word "book" in a sentence means a hotel reservation or a paperback, whether an email message is spam, or recognizing speech well enough to generate captions for YouTube videos.

 

Machine learning is usually divided into supervised learning, in which the computer learns by example from labeled data, and unsupervised learning, in which the computer groups similar data together and spots anomalies.

 

Deep learning is one area within machine learning whose capabilities differ from traditional, shallow machine learning in several important ways. It allows computers to solve a whole host of complex problems that could not be tackled any other way.

 

A good example of a shallow machine learning task is predicting how ice cream sales will vary with the outside temperature. The prediction uses only a couple of data features and is relatively straightforward; it can be carried out with a shallow technique known as linear regression with gradient descent, sketched below.
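
As a rough illustration, here is that shallow technique in a few lines of NumPy (the temperatures and sales figures are invented):

import numpy as np

# Hypothetical data: outside temperature vs. ice cream sales.
temps = np.array([15.0, 20.0, 25.0, 30.0, 35.0])
sales = np.array([210.0, 300.0, 390.0, 480.0, 570.0])

w, b, lr = 0.0, 0.0, 0.002          # slope, intercept, learning rate
for _ in range(200000):
    error = (w * temps + b) - sales
    w -= lr * (error * temps).mean()   # gradient step for the slope
    b -= lr * error.mean()             # gradient step for the intercept

print(w, b)   # approaches sales = 18 * temp - 60 for this toy data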

 

The problem is that many real-world problems don't fit such a simple model. One example of a complex real-world problem is recognizing handwritten numbers.

 

To solve this problem, the computer has to cope with huge variation in the way the data can be presented. Each digit from 0 to 9 can be written in a myriad of different ways.

 

Even the size and shape of handwritten digits can vary widely, depending on who is writing them and in what circumstances.

 

Coping with the variability of all these features, and the large mess of interactions between them, is where deep learning and deep neural networks become useful. Neural networks, which we will cover more completely in a later blog, are mathematical models whose structure is loosely based on the brain.

 

Every neuron in the network is a function that receives data through an input, transforms it into a more amenable form, and then sends it out through an output. These neurons are arranged in layers.

 

Each network has an input layer, where the starting data is fed in, and an output layer, which generates the final prediction.

 

In a deep neural network, several hidden layers of neurons sit between the input and output layers, each feeding data into the next.

 

This is where the word "deep" in deep learning and deep neural networks comes from: it refers to the number of hidden layers, normally more than three, at the heart of these networks.

 

A neuron activates once the sum of the values fed into it passes a certain threshold, and what that activation means depends on the layer it is in.

 

In the first hidden layer, an activation could mean that the image of the handwritten number contains a certain combination of pixels resembling the horizontal line at the top of a seven. In this way, the first hidden layer detects many of the important curves and lines that will eventually combine to form the final number.

 

A real neural network would have several hidden layers and many neurons in each layer. The small curves and lines found by the first layer are fed into the second hidden layer, which detects how they combine into recognizable shapes that make up a particular digit, such as the full loop of a six.

By feeding data from layer to layer in this way, each layer handles a higher level of features.

 

How do these layers tell the computer what a written number is? Together, the layers of neurons let the network build a rough hierarchy of the features that make up the number in question.

 

For example, if the input is an array of values representing the individual pixels in the photo of a written number, the next layer could combine those pixels into lines and shapes, the layer after that could combine the shapes into distinctive features such as the loops of an eight or the triangle of a four, and so on.

 

By gradually building up a picture of these features, a modern neural network can determine, with very good accuracy, which number the handwriting represents.

 

In a similar manner, other kinds of deep neural networks can be trained to pick out faces in a picture or turn audio into written words.

 

The network learns to build these increasingly complex hierarchies of features for written digits out of nothing but pixels. The computer can learn because the network is able to alter the importance of the connections between the neurons in each layer.

 

Each link has an attached value known as a weight, which modifies the value a neuron sends out as it travels from layer to layer. By changing the weights, along with another value known as the bias, the network can emphasize or diminish the importance of the links between neurons.

 

For example, when recognizing a handwritten number, the weights can be adjusted to reflect the importance of a particular group of pixels that forms a line, or of a pair of intersecting lines that form a seven.
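
To make layers, weights, and biases concrete, here is a toy forward pass in NumPy (the 784-30-10 layer sizes assume 28 x 28 digit images, and the weights are random rather than trained):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(30, 784)), np.zeros(30)    # input layer -> hidden layer
W2, b2 = rng.normal(size=(10, 30)), np.zeros(10)     # hidden layer -> output layer

x = rng.random(784)                   # stand-in for one flattened 28 x 28 digit image
hidden = sigmoid(W1 @ x + b1)         # hidden-layer activations
output = sigmoid(W2 @ hidden + b2)    # one score per digit, 0 through 9
print(output.argmax())                # the untrained network's current guess

Training would adjust W1, b1, W2, and b2 so that the correct output neuron wins for each image.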

 

 Neural Networks

Neural networks, sometimes referred to as Artificial Neural Networks, are machine learning models inspired by the way the human brain works. Understand that neural networks are not a solution for every problem; rather, alongside several other techniques, they provide the best results for a range of machine learning tasks.

 

The most common uses of neural networks are classification and clustering; they can also be used for regression, although better methods exist for that.

 

A neuron is the building block of a neural network and works somewhat like a human neuron. A typical neural network uses the sigmoid function as its activation, largely because its derivative can be written in terms of f(x) itself, which works well when minimizing error.

 

Let's look at what a Perceptron is. In a nerve cell, dendrites are the extensions that receive signals and pass them on to the cell body, which processes the stimulus and decides whether or not to fire. When the cell fires, the extension known as the axon triggers a chemical transmission at its end to another cell.

 

There is no need to feel like you have to memorize any of this. We aren’t actually studying neuroscience, so you only need a vague impression of how this works.

 

A Perceptron looks similar to an actual neuron because it was inspired by the way real neurons work. Keep in mind that it was only inspired by a neuron; it in no way behaves exactly like a real one. A Perceptron processes data as follows:

 

1. The small circles on the left side of the Perceptron are the inputs, labeled x1, x2, ..., xm, which carry the input data.

2. Each input is multiplied by a weight, labeled w1, w2, ..., wm, and travels along a long arrow, called a synapse, to the big circle in the middle. So you have w1 * x1, w2 * x2, w3 * x3, and so on.

 

3. Once every input has been multiplied by its weight, you sum them all and add a pre-determined bias.

 

4. The result is then passed to the right, through the step function: if the number from step three is greater than or equal to zero, the output is one; if it is lower than zero, the output is zero.

 

5. You will get an output of either zero or one.

 

If you move the bias to the right-hand side of the activation condition, as in sum(wx) ≥ -b, then -b is known as the threshold value. If the sum is greater than or equal to the threshold, the activation is one; otherwise, it is zero. Use whichever representation helps you understand the process, because the two are interchangeable.

 

Now you have a pretty good understanding of how a Perceptron works: it is nothing more than a few mechanical multiplications, followed by a summation, then an activation, which finally gives you an output.

 

To make sure you fully understand this, let's look at a simple (if not terribly realistic) example. Assume that, after reading this book, you feel extremely motivated and have to decide whether or not to study deep learning. Three major factors will help you make your decision:

 

  1. Will you be able to make more money once you master deep learning: 0 – No, 1 – Yes.
  2. Is the needed programming and mathematics simple: 0 – No, 1 – Yes.
  3. You are able to use deep learning immediately and not have to get an expensive GPU: 0 – No, 1 – Yes.

 

Our input variables will be x1, x2, and x3 for all of the factors, and we’ll give them each a binary value since they are all simple yes or no questions.

Let’s assume that you really love deep learning and you are now ready to work through your lifelong fear of programming and math. You also have some money put away to invest in the expensive Nvidia GPU that will train the deep learning model.

 

You can assume that these two factors have the same importance, since you could compromise on either. But you really want to make extra money after putting all that time and energy into learning deep learning. Since you have a high expectation of ROI, if you can't make more money, you aren't going to waste your time on deep learning.

 

Now that the decision preferences are clear, assume there is a 100 percent probability of making extra money once you have learned deep learning, because demand far outstrips supply; that means x1 = 1. Assume the programming and math are extremely hard; that means x2 = 0.

 

Finally, assume you are going to need a powerful GPU such as a Titan X; that means x3 = 0. Now that you have the inputs, you can initialize the weights. We're going to try w1 = 8, w2 = 2, w3 = 2.

 

The higher the weight, the more influence the corresponding input has. Since the money you can make matters most to your decision to learn deep learning, w1 is greater than w2 and greater than w3.

 

Let's say the threshold value is five, which corresponds to a bias of negative five. We multiply, sum everything up, and add in the bias term. With a threshold of five, you will decide to learn deep learning only if you are going to make more money.

 

Even if the math turns out to be easy and you aren’t going to have to buy a GPU, you won’t study deep learning unless you are able to make extra money later on.

 

Now you have a decent understanding of bias and threshold. With a threshold as high as five, the main factor has to be satisfied for the output to be one; otherwise, it will be zero.

 

The fun part comes next: varying the threshold, bias, and weights gives you different possible decision-making models. In our example, if you lower the threshold from five to three, there are additional scenarios in which the output is one, as the short sketch below shows.
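
Here is the same decision written as a tiny Perceptron in code, using the weights and bias from the example above:

def perceptron(x, w, bias):
    # Step-function Perceptron: fire (1) if the weighted sum plus bias is >= 0.
    total = sum(xi * wi for xi, wi in zip(x, w)) + bias
    return 1 if total >= 0 else 0

w = [8, 2, 2]      # money matters most
bias = -5          # a threshold of five

print(perceptron([1, 0, 0], w, bias))   # more money only            -> 1 (study)
print(perceptron([0, 1, 1], w, bias))   # easy math, no GPU needed   -> 0 (skip)
print(perceptron([0, 1, 1], w, -3))     # lower the threshold to 3   -> 1 (study)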

 

Despite how well loved Perceptrons were, their popularity quietly faded due to their limitations. Later on, people realized that multi-layer Perceptrons could learn the logic of an XOR gate, but this requires backpropagation so that the network can learn from its own mistakes. Every deep learning neural network is data-driven.

 

If a model's output differs from the desired output, you need a way to backpropagate the error information through the network so that the weights know how much they need to adjust themselves.

 

That way, the model's actual outputs gradually move closer to the desired outputs with each round of training.

 

As it turns out, for more complicated tasks whose outputs cannot be expressed as a linear combination of the inputs (that is, the outputs are not linearly separable), the step function will not work, because it does not support backpropagation: backpropagation requires an activation function with a meaningful derivative.

 

Here's just a bit of calculus: the step function has a derivative of 0 at every input except the point 0 itself.

 

At 0 the derivative is undefined, because the function is discontinuous there. So even though the step function is an easy and simple activation function, it cannot handle more complicated tasks.

 

Sigmoid function: f(x) = 1 / (1 + e^(-x)), whose derivative can conveniently be written as f'(x) = f(x)(1 - f(x)).

 

The following example (in TypeScript, using the synaptic, jimp, ts-promise, and lodash Node.js packages) trains a small Perceptron on the red and green channels of an image and then applies it to another image:

import Jimp = require("jimp");
import Promise from "ts-promise";
const synaptic = require("synaptic");
const _ = require("lodash");

const Neuron = synaptic.Neuron,
    Layer = synaptic.Layer,
    Network = synaptic.Network,
    Trainer = synaptic.Trainer,
    Architect = synaptic.Architect;

// Read an image and collect the red and green values of every pixel.
function getImgData(filename) {
    return new Promise((resolve, reject) => {
        Jimp.read(filename).then((image) => {
            let inputSet: any = [];
            image.scan(0, 0, image.bitmap.width, image.bitmap.height, function (x, y, idx) {
                var red = image.bitmap.data[idx + 0];
                var green = image.bitmap.data[idx + 1];
                inputSet.push([red, green]);
            });
            resolve(inputSet);
        }).catch(function (err) {
            resolve([]);
        });
    });
}

// A small Perceptron: 2 inputs (red, green), a hidden layer of 3 neurons, 2 outputs.
const myPerceptron = new Architect.Perceptron(2, 3, 2);
const trainer = new Trainer(myPerceptron);
const trainingSet: any = [];

getImgData('imagefilename.jpg').then((inputs: any) => {
    getImgData('imagefilename.jpg').then((outputs: any) => {
        // Scale the 0-255 pixel values down to the 0-1 range the network expects.
        for (let i = 0; i < inputs.length; i++) {
            trainingSet.push({
                input: _.map(inputs[i], (val: any) => val / 255),
                output: _.map(outputs[i], (val: any) => val / 255)
            });
        }

        trainer.train(trainingSet, {
            rate: .1,
            iterations: 200,
            error: .005,
            shuffle: true,
            log: 10,
            cost: Trainer.cost.CROSS_ENTROPY
        });

        // Run the trained network over a new image and write the result to disk.
        Jimp.read('yours.jpg').then((image) => {
            image.scan(0, 0, image.bitmap.width, image.bitmap.height, (x, y, idx) => {
                var red = image.bitmap.data[idx + 0];
                var green = image.bitmap.data[idx + 1];
                var out = myPerceptron.activate([red / 255, green / 255]);
                image.bitmap.data[idx + 0] = _.round(out[0] * 255);
                image.bitmap.data[idx + 1] = _.round(out[1] * 255);
            });
            console.log('out.jpg');
            image.write('out.jpg');
        }).catch(function (err) {
            console.error(err);
        });
    });
});

 

K-Nearest Neighbor Algorithm

This is one of the easiest algorithms to learn and use, so much so that Wikipedia describes it as a form of "lazy learning." The concept is less about statistics and more about reasonable deduction: essentially, it tries to identify the points that sit closest to each other. When k-NN is used on a two-dimensional model, it typically relies on Euclidean distance.

 

An alternative is the one-norm (Manhattan) distance, which relates to a grid of square streets where cars can travel in only one direction at a time. The point is that the models and objects here rely on two dimensions, just like the classic XY graph.

 

k-NN tries to identify the group a point belongs to by looking at the points situated around it; k is the specified number of neighboring points to consider. There are various ways to decide how big k should be, since it is an input that the data science system or the user has to pick.

 

This model is well suited to feature clustering, basic market segmentation, and finding groups among specific data points, and most programming languages let you implement it in a couple of lines of code, as the sketch below shows.
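
For example, a rough sketch with scikit-learn (the customer data here is invented purely for illustration):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy two-dimensional customer data: [age, monthly spend], labeled by segment.
X = np.array([[25, 40], [27, 45], [23, 35], [52, 210], [55, 230], [60, 220]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3, Euclidean distance by default
knn.fit(X, y)
print(knn.predict([[30, 50]]))              # -> [0]: closest to the low-spend group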

 

Automatically Placing Audio in Silent Movies

Here, the system synthesizes sounds to accompany silent video. It was trained on a thousand examples of videos with the sound of a drumstick hitting different types of surfaces and producing different kinds of sounds.

 

The deep learning model associates the video frames with a database of pre-recorded sounds so that it can choose the sound that best matches what is happening in the scene.

 

The system is evaluated with a Turing-test-like setup, in which humans have to decide whether a video's sounds are real or fake. This application uses both LSTMs and RNNs.

 

Automatic Machine Translation

This is where a given word, phrase, or sentence in one language is automatically translated into another language. The technology has been around for a while, but deep learning is achieving the best results in two areas:

  • Image translations
  • Text translations

 

Text translation can be done without pre-processing the sequence, which allows the algorithm to learn the dependencies between words and their mapping into the new language.

 

Automatic Text Generation

This is one of the most interesting tasks: a body of text is learned, and new text is generated either character by character or word by word.

 

The model learns to capture the style, sentence structure, punctuation, and spelling of the source text. Large recurrent neural networks are good at learning the relationships between the items in an input sequence and then generating new text.

 

Automatic Handwriting Generation

In this task, a corpus of handwriting examples is provided and new handwriting is generated for a given word or phrase. The handwriting samples are given as the sequences of pen coordinates used to create them. From this corpus, the relationship between the pen movements and the letters is learned, and new examples can then be generated ad hoc.

 

Internet Search

Chances are, when you hear the word search, your first thought is Google. But there are several other search engines out there, such as DuckDuckGo, AOL, Ask, Bing, and Yahoo.

 

Every search engine uses some form of data science algorithm to give its users the best results for their query in under a second. Think about it: Google processes over 20 petabytes of data every single day. Without data science, Google would not be as good as it is today.

 

GPU programming model

At this point, it is necessary to introduce some basic concepts to understand the CUDA programming model. The first distinction is between host and device.

 

Code executed on the host side runs on the CPU and has access to the system RAM and the hard disk.

Code executed on the device, by contrast, is loaded onto the graphics card and runs there. Another important concept is the kernel: a function that is executed on the device but launched from the host.

 

The code defined in the kernel is executed in parallel by an array of threads. The GPU programming model can be summarized as follows:

 

  • The running program has source code that runs on the CPU and code that runs on the GPU
  • The CPU and GPU have separate memories
  • Data is transferred from the CPU to the GPU to be computed
  • The output of the GPU computation is copied back to CPU memory
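
As a rough sketch of how host, device, and kernel fit together, here is a tiny example written from Python with the numba package (an assumption on my part, since this text does not otherwise cover numba; it requires an NVIDIA GPU and a CUDA install):

import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):        # the kernel: runs on the device, launched from the host
    i = cuda.grid(1)              # this thread's global index
    if i < x.size:
        out[i] = x[i] + y[i]

x = np.arange(32, dtype=np.float32)
y = np.ones(32, dtype=np.float32)

d_x = cuda.to_device(x)           # host -> device copies
d_y = cuda.to_device(y)
d_out = cuda.device_array_like(x)

add_kernel[1, 32](d_x, d_y, d_out)   # launch 1 block of 32 parallel threads
print(d_out.copy_to_host()[:5])      # device -> host copy: [1. 2. 3. 4. 5.]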

 

TensorFlow GPU set up

The NVIDIA Deep Learning SDK offers powerful tools and libraries for deep learning frameworks such as Caffe, CNTK, TensorFlow, Theano, and Torch, and for designing and deploying GPU-accelerated deep learning applications.

 

It includes libraries for deep learning primitives, inference, video analytics, linear algebra, sparse matrices, and multi-GPU communications. The current implementation supports the following SDKs:

 

Deep learning primitives: (https://developer.nvidia.com/cudnn) High-performance building blocks for deep neural network applications including convolutions, activation functions, and tensor transformations.

 

Deep learning inference engine: (https://developer.nvidia.com/tensorrt) High-performance deep learning inference runtime for production deployment.

 

Deep learning for video analytics: (https://developer.nvidia.com/deepstream-sdk) High-level C++ API and runtime for GPU-accelerated transcoding and deep learning inference.

 

Linear algebra: (https://developer.nvidia.com/cublas) GPU-accelerated BLAS functionality that delivers 6x to 17x faster performance than CPU-only BLAS libraries. Related to this, XLA (https://www.tensorflow.org/performance/xla/) is a domain-specific compiler for linear algebra that optimizes TensorFlow computations. Although it is experimental (that is, under active development), it promises improvements in speed, memory usage, and portability on server and mobile platforms.

 

Sparse matrix operations: (https://developer.nvidia.com/cusparse) GPU-accelerated linear algebra subroutines for sparse matrices that deliver up to 8x faster performance than CPU BLAS (MKL), ideal for applications such as natural language processing.

 

Multi-GPU communication: (https://github.com/NVIDIA/nccl) Collective communication routines, such as all-gather, reduce, and broadcast, that accelerate multi-GPU deep learning training on up to eight GPUs.

 

The Deep Learning SDK requires the CUDA Toolkit (https://developer.nvidia.com/cuda-toolkit), which offers a comprehensive development environment for building new GPU-accelerated deep learning algorithms and for dramatically increasing the performance of existing applications.

 

At the time of writing, the current version of cuDNN was 5.1, released on Jan 20, 2017, for CUDA 8.0. For more details, see https://developer.nvidia.com/rdp/cudnn-download.

 

Building a DL Network Using MXNet

MXNet is not tied to a single language; it can be used from several programming languages through the corresponding APIs.

 

Finally, there exist several tutorials for MXNet, should you wish to learn more about its various functions. Because MXNet is an open-source project, you can even create your own tutorial, if you are so inclined.

 

What’s more, it is a cross-platform tool, running on all major operating systems. MXNet has been around long enough that it is a topic of much research.

 

Core components

Gluon interface

Gluon is a simple interface for all your DL work using MXNet. You install it on your machine just like any Python library:

pip install mxnet --pre --user

 

The main selling point of Gluon is that it is straightforward. It offers an abstraction of the whole network building process, which can be intimidating for people new to the craft.

 

Also, Gluon is very fast, not adding any significant overhead to the training of your DL system. Moreover, Gluon can handle dynamic graphs, offering some malleability in the structure of the ANNs created. Finally, Gluon has an overall flexible structure, making the development process for any ANN less rigid.
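
As a rough sketch of how little code a Gluon network needs (the layer sizes and random input are arbitrary):

import mxnet as mx
from mxnet import gluon, nd

# Two dense layers for a hypothetical 10-class problem.
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(64, activation="relu"))
    net.add(gluon.nn.Dense(10))

net.collect_params().initialize(mx.init.Xavier(), ctx=mx.cpu())
out = net(nd.random.uniform(shape=(4, 100)))   # a batch of 4 random examples
print(out.shape)                               # (4, 10)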

 

Naturally, for Gluon to work, you must have MXNet installed on your machine (although you don’t need to if you are using the Docker container provided with this blog). This is achieved using the familiar pip command:

pip install mxnet --pre --user

 

Because of its utility and excellent integration with MXNet, we’ll be using Gluon throughout this blog, as we explore this DL framework. However, to get a better understanding of MXNet, we’ll first briefly consider how you can use some of its other functions.

 

NDArrays

The NDArray is a particularly useful data structure that’s used throughout an MXNet project. NDArrays are essentially NumPy arrays, but with the added capability of asynchronous CPU processing.

 

They are also compatible with distributed cloud architectures, and can even utilize automatic differentiation, which is particularly useful when training a deep learning system, but NDArrays can be effectively used in other ML applications too. NDArrays are part of the MXNet package, which we will examine shortly. You can import the NDArrays module as follows:

from mxnet import nd

To create a new NDArray consisting of 4 rows and 5 columns, for example, you can type the following:

nd.empty((4, 5))

 

The output will differ every time you run it since the framework will allocate whatever value it finds in the parts of the memory that it allocates to that array. If you want the NDArray to have just zeros instead, type:

nd.zeros((4, 5))

To find the number of rows and columns of a variable holding an NDArray, you use the .shape attribute, just as in NumPy:

x = nd.empty((2, 7))

x.shape

Finally, to find the total number of elements in an NDArray, you use the .size attribute:

x.size

 

The operations in an NDArray are just like the ones in NumPy, so we won’t elaborate on them here. Contents are also accessed in the same way, through indexing and slicing.

 

Should you want to turn an NDArray into a more familiar data structure from the NumPy package, you can use the asnumpy() function:

y = x.asnumpy()

The reverse can be achieved using the array() function:

z = nd.array(y)

 

One of the distinguishing characteristics of NDArrays is that each array can be assigned a different computational context: either the CPU or a GPU attached to your machine (this is referred to as the array's "context").

 

This is made possible by the ctx parameter in all of the package's relevant functions. For example, to create an array of zeros assigned to the first GPU, simply type:

a = nd.zeros(shape=(5,5), ctx=mx.gpu(0))

 

Of course, the data assigned to a particular processing unit is not set in stone. It is easy to copy data to a different location, linked to a different processing unit, using the copyto() function:

y = x.copyto(mx.gpu(1)) # copy the data of NDArray x to the 2nd GPU

You can find the context of a variable through the .context attribute:

print(x.context)

 

It is often more convenient to define the context of both the data and the models, using a separate variable for each. For example, say that your DL project uses data that you want to be processed by the CPU, and a model that you prefer to be handled by the first GPU. In this case, you’d type something like:

DataCtx = mx.cpu()

ModelCtx = mx.gpu(0)

MXNet package in Python

 

The MXNet package (written "mxnet", in all lower-case letters, when typed in Python) is a robust and self-sufficient Python library that provides deep learning capabilities through the MXNet framework. Importing the package in Python is straightforward:

import mxnet as mx

 

If you want to perform some additional processes that make the MXNet experience even better, it is highly recommended that you first install the following packages on your computer:

graphviz (ver. 0.8.1 or later)

requests (ver. 2.18.4 or later)

numpy (ver. 1.13.3 or later)

You can learn more about the MXNet package through the corresponding GitHub repository.
