There are many deep learning resources freely available online, but it can be confusing knowing where to begin. Go from vague understanding of deep neural networks to knowledgeable practitioner in 7 steps!
Deep learning is a branch of machine learning, employing numerous similar, yet distinct, deep neural network architectures to solve various problems in natural language processing, computer vision, and bioinformatics, among other fields. Deep learning has experienced a tremendous recent research resurgence, and has been shown to deliver state of the art results in numerous applications.
In essence, deep learning is the implementation of neural networks with more than a single hidden layer of neurons. This is, however, a very simplistic view of deep learning, and not one that is unanimously agreed upon. These “deep” architectures also vary quite considerably, with different implementations being optimized for different tasks or goals. The vast research being produced at such a constant rate is revealing new and innovative deep learning models at an ever-increasing pace.
Currently a white hot research topic, deep learning seems to be impacting all areas of machine learning and, by extension, data science. A look over recent papers in the relevant arXiv categories makes it easy to see that a large amount of what is being published is deep learning-related. Given the impressive results being produced, many researchers, practitioners, and laypeople alike are wondering if deep learning is the edge of “true” artificial intelligence.
This collection of reading materials and tutorials aims to provide a path for a deep neural networks newcomer to gain some understanding of this vast and complex topic. Though I do not assume any real understanding of neural networks or deep learning, I will assume your familiarity with general machine learning theory and practice to some degree. To overcome any deficiency you may have in the general areas of machine learning theory or practice you can consult the recent KDnuggets post 7 Steps to Mastering Machine Learning With Python. Since we will also see examples implemented in Python, some familiarity with the language will be useful. Introductory and review resources are also available in the previously mentioned post.
This post will utilize freely-available materials from around the web in a cohesive order to first gain some understanding of deep neural networks at a theoretical level, and then move on to some practical implementations. As such, credit for the materials referenced lies solely with the creators, who will be noted alongside the resources. If you see that someone has not been properly credited for their work, please alert me to the oversight so that I may swiftly rectify it.
A stark and honest disclaimer: deep learning is a complex and quickly-evolving field of both breadth and depth (pun unintended?), and as such this post does not claim to be an all-inclusive manual to becoming a deep learning expert; such a transformation would take greater time, many additional resources, and lots of practice building and testing models. I do, however, believe that utilizing the resources herein could help get you started on just such a path.
Step 1: Introducing Deep Learning
If you are reading this and interested in the topic, then you are probably already familiar with what deep neural networks are, even if only at a basic level. Neural networks have a storied history, but we won’t be getting into that. We do, however, want a common high-level understanding to begin with.
First, have a look at the fantastic introductory videos from DeepLearning.tv. At the time of this writing there are 14 videos; watch them all if you like, but definitely watch the first 5, covering the basics of neural nets and some of the more common architectures.
Next, read over the NIPS 2015 Deep Learning Tutorial by Geoff Hinton, Yoshua Bengio, and Yann LeCun for an introduction at a slightly lower level.
To round out our first step, read the first chapter of Neural Networks and Deep Learning, the fantastic, evolving online book by Michael Nielsen, which goes a step further but still keeps things fairly light.
Step 2: Getting Technical
Deep neural nets rely on a mathematical foundation of algebra and calculus. While this post will not produce any theoretical mathematicians, gaining some understanding of the basics before moving on would be helpful.
First, watch Andrew Ng’s linear algebra review videos. While not absolutely necessary, those who want something deeper on this subject can consult the Linear Algebra Review and Reference from Ng’s Stanford course, written by Zico Kolter and Chuong Do.
Then look at this Introduction to the Derivative of a Function video by Professor Leonard. The video is succinct, the examples are clear, and it provides some understanding of what is actually going on during backpropagation from a mathematical standpoint. More on that soon.
Next have a quick read over the Wikipedia entry for the Sigmoid function, a bounded differentiable function often employed by individual neurons in a neural network.
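To make the sigmoid a little more concrete, here is a minimal sketch in plain Python. The function names `sigmoid` and `sigmoid_prime` are my own choices for illustration, not taken from any of the resources above. It shows the two properties that matter for what follows: the output is bounded between 0 and 1, and the derivative can be written in terms of the function itself.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the range 0 to 1."""
    # Branch on the sign so math.exp is never called with a large positive
    # argument, which would raise OverflowError.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_prime(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


if __name__ == "__main__":
    for x in (-5.0, 0.0, 5.0):
        print(x, round(sigmoid(x), 4), round(sigmoid_prime(x), 4))
```

The derivative peaks at x = 0 and shrinks toward zero in both directions, which is worth keeping in mind when we get to backpropagation.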
Step 3: Backpropagation and Gradient Descent
An important part of neural networks, including modern deep architectures, is the backward propagation of errors through a network in order to update the weights used by neurons closer to the input. This is, quite bluntly, from where neural networks derive their “power,” for lack of a better term. Backpropagation (or “backprop” for short) is paired with an optimization method which updates the weights, using the gradients that backpropagation distributes, in order to minimize the loss function. A common optimization method in deep neural networks is gradient descent.
Next, have a look at this step by step example of backpropagation in action written by Matt Mazur.
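As a rough illustration of how backpropagation and gradient descent fit together, here is a deliberately tiny sketch: a chain of one input, one hidden sigmoid neuron, and one output sigmoid neuron, trained on a single example with a squared-error loss. The network shape, the starting weights, the learning rate, and all function names are assumptions of mine chosen to keep the arithmetic readable; this is not the network from Matt Mazur's walkthrough.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w1, w2, x):
    """Forward pass: input x -> hidden activation h -> output y."""
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    return h, y


def loss(w1, w2, x, target):
    """Squared-error loss for a single training example."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2


def gradients(w1, w2, x, target):
    """Backpropagation: chain rule applied from the loss back to each weight."""
    h, y = forward(w1, w2, x)
    # Error signal at the output neuron's pre-activation.
    delta_out = (y - target) * y * (1.0 - y)
    grad_w2 = delta_out * h
    # Propagate the error signal backward through w2 to the hidden neuron.
    delta_hidden = delta_out * w2 * h * (1.0 - h)
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2


def train(w1, w2, x, target, learning_rate=0.5, steps=1000):
    """Gradient descent: repeatedly step each weight against its gradient."""
    for _ in range(steps):
        g1, g2 = gradients(w1, w2, x, target)
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
    return w1, w2


if __name__ == "__main__":
    w1, w2, x, target = 0.5, -0.3, 1.0, 0.9
    print("loss before:", loss(w1, w2, x, target))
    w1, w2 = train(w1, w2, x, target)
    print("loss after: ", loss(w1, w2, x, target))
```

Note the division of labor: `gradients` (backpropagation) only computes how the loss changes with each weight, while `train` (gradient descent) decides what to do with that information.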
Step 4: Getting Practical
The specific neural network architectures that will be introduced in the following steps will include practical implementations using some of the most popular Python deep learning libraries present in research today. Since different libraries are, in some cases, optimized for particular neural network architectures, and have established footholds in certain fields of research, we will be making use of 3 separate deep learning libraries. This is not redundant; keeping up with the latest libraries for particular areas of practice is a critical part of learning. The following exercises will also allow you to evaluate different libraries for yourself, and form an intuition as to which to use for which problems.
At this point you are welcome to choose any library or combination of libraries to install, and move forward implementing those tutorials which pertain to your choice. If you are looking to try one library and use it to implement one of each of the following steps’ tutorials, I would recommend TensorFlow, for a few reasons. I will mention the most relevant (at least, in my view): it performs auto-differentiation, meaning that you (or, rather, the tutorial) do not have to worry about implementing backpropagation from scratch, likely making code easier to follow (especially for a newcomer).
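To give a feel for what auto-differentiation means in practice, here is a toy sketch of the idea in plain Python. To be clear, this is not TensorFlow's API or implementation; the `Value` class and its methods are hypothetical names I made up for illustration, and it handles only scalar addition and multiplication. Each result remembers which values produced it, so gradients can be pushed backward through the recorded graph without anyone writing the derivative by hand.

```python
class Value:
    """A scalar that records how it was computed (hypothetical toy class)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        """Accumulate d(self)/d(node) into node.grad for every node used."""
        order, seen = [], set()

        def visit(node):
            # Depth-first post-order: parents are listed before the node.
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back toward the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


if __name__ == "__main__":
    x, y = Value(2.0), Value(3.0)
    f = x * y + x              # f = xy + x
    f.backward()
    print(f.data, x.grad, y.grad)  # prints 8.0 4.0 2.0
```

A real library does the same bookkeeping over tensors and a much larger set of operations, which is why a tutorial written on top of one never has to spell out the backward pass.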
I wrote about TensorFlow when it first came out in the post TensorFlow Disappoints – Google Deep Learning Falls Shallow, the title of which suggests that I had more disappointment with it than I actually did; I was primarily focused on its lack of GPU cluster-enabled network training (which is likely soon on its way). Anyhow, if you are interested in reading more about TensorFlow without consulting the whitepaper listed below, I would suggest reading my original article, and then following up with Zachary Lipton’s well-written piece, TensorFlow is Terrific – A Sober Take on Deep Learning Acceleration.
Google’s TensorFlow is an all-purpose machine learning library based on data flow graph representation.
- Install TensorFlow by visiting here
- Have a look at its whitepaper
- Try out its introductory tutorial
- Keep its documentation handy
Theano is actively developed by the LISA group at the University of Montreal.
Caffe is developed by the Berkeley Vision and Learning Center (BVLC) at UC Berkeley. While Theano and TensorFlow can be considered “general-purpose” deep learning libraries, Caffe, being developed by a computer vision group, is mostly thought of for just such tasks; however, it is also a fully general-purpose library for building various deep learning architectures for different domains.
- Go here to install Caffe
- Read its introductory tutorial presentation to familiarize yourself
- Have a look at its documentation as well
Keep in mind that these are not the only popular libraries in use today. In fact, there are many, many others to choose from, and these were selected based on the prevalence of tutorials, documentation, and acceptance among researchers in general.
Other deep learning library options include:
- Keras – a high-level, minimalist Python neural network library for Theano and TensorFlow
- Lasagne – lightweight Python library built atop Theano
- Torch – Lua machine learning algorithm library
- Deeplearning4j – open source, distributed deep learning library for Java and Scala
- Chainer – a flexible, intuitive Python neural network library
- Mocha – a deep learning framework for Julia
With libraries installed, we now move on to practical implementation.
Step 5: Convolutional Neural Nets and Computer Vision
Computer vision deals with the processing and understanding of images and their symbolic information. Most of the field’s recent breakthroughs have come from the use of deep neural networks. In particular, convolutional neural networks have played a very important role in computer vision of late.
First, read this deep learning with computer vision tutorial by Yoshua Bengio, in order to gain an understanding of the topic.
Here is a Theano tutorial which is roughly equivalent to the above Caffe exercise.
Afterward, read a seminal convolutional neural network paper by Krizhevsky, Sutskever, and Hinton for additional insight.
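For intuition about the operation that gives convolutional networks their name, here is a small sketch in plain Python. The function name `conv2d_valid`, the toy image, and the kernel are all my own illustrative choices, not drawn from the tutorials or the paper above. It slides a kernel over an image with stride 1 and no padding, computing a weighted sum at each position (strictly speaking a cross-correlation, which is what deep learning libraries typically compute under the name convolution).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# A 4x4 image with a vertical edge between the second and third columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A horizontal-difference kernel: responds where intensity rises left to right.
kernel = [[-1, 1]]

feature_map = conv2d_valid(image, kernel)

if __name__ == "__main__":
    for row in feature_map:
        print(row)
```

The resulting feature map is nonzero only where the edge sits. In a convolutional network the kernel values are not hand-picked like this; they are weights learned through backpropagation.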
Step 6: Recurrent Nets and Language Processing
Natural language processing (NLP) is another domain which has seen benefits from deep learning. Concerned with understanding natural (human) languages, NLP has had a number of its most recent successes come by way of recurrent neural networks (RNN).
Andrej Karpathy has a fantastic blog post titled “The Unreasonable Effectiveness of Recurrent Neural Networks” which outlines the effectiveness of RNNs in training character-level language models. The code it references is written in Lua using Torch, so you can skip over that; the post is still useful on a purely conceptual level.
This tutorial implements a recurrent neural network in TensorFlow for language modeling.
Finally, you can read Yoon Kim’s Convolutional Neural Networks for Sentence Classification for another application of CNNs in language processing. Denny Britz has a blog post titled “Implementing A CNN For Text Classification in TensorFlow,” which does just as it suggests using movie review data.
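To ground the recurrent networks discussed in this step, here is a bare-bones sketch of the forward pass of a simple (Elman-style) RNN in plain Python. The function names, the weight names `W_xh`, `W_hh`, and `b_h`, and the toy numbers are assumptions of mine, not code from Karpathy's post or the TensorFlow tutorial. The point is only to show that the hidden state at each time step depends on both the current input and the previous hidden state.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    from_input = matvec(W_xh, x)
    from_state = matvec(W_hh, h_prev)
    return [math.tanh(a + b + c)
            for a, b, c in zip(from_input, from_state, b_h)]


def rnn_forward(inputs, h0, W_xh, W_hh, b_h):
    """Run the recurrence over a whole sequence, returning every hidden state."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


if __name__ == "__main__":
    # Two one-hot "characters"; the first and third inputs are identical.
    sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    W_xh = [[0.5, -0.5], [0.3, 0.8]]
    W_hh = [[0.1, 0.2], [-0.3, 0.4]]
    b_h = [0.0, 0.0]
    for state in rnn_forward(sequence, [0.0, 0.0], W_xh, W_hh, b_h):
        print([round(v, 4) for v in state])
```

Even though the first and third inputs are the same, their hidden states differ, because the third step also carries information from what came before. That memory of context is what makes these networks a natural fit for language.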
Step 7: Further Topics
The previous steps have progressed from theoretical to practical topics in deep learning. By installing and implementing convolutional neural nets and recurrent neural nets in the previous 2 steps, it is hoped that one has gained a preliminary appreciation for their power and functionality. As prevalent as CNNs and RNNs are, there are numerous other deep architectures in existence, with additional ones emerging from research on a regular basis.
There are also numerous other considerations for deep learning beyond those presented in the earlier theoretical steps, and as such, the following is a quick survey of some of these additional architectures and concerns.
This clearly does not cover all deep learning architectures. Restricted Boltzmann Machines are an obvious exclusion which comes to mind, as are autoencoders, and a whole series of related generative models including Generative Adversarial Networks. However, a line had to be drawn somewhere, or this post would continue ad infinitum.
For those interested in learning more about various deep learning architectures, I suggest this lengthy survey paper by Yoshua Bengio.
For our final number, and for something a bit different, have a look at A Statistical View of Deep Learning by Shakir Mohamed of Google DeepMind. It is more theoretical (and, surprise, more statistical) than much of the other material we have encountered, but worth looking at for a different approach to familiar matter. Shakir wrote the series of articles over the course of 6 months, and it is presented as testing widely held beliefs, highlighting statistical connections, and exploring the unseen implications of deep learning. There is a combined PDF of all posts as well.
It is hoped that enough information has been presented to give the reader an introductory overview of deep neural networks, as well as provide some incentive to move forward and learn more on the topic.