Deep Learning (DL) is a subset of Machine Learning (ML) that learns a hierarchy of features to gain meaningful insight from a complex input space. A DL network can be visualized as a neural network with two or more layers of neurons performing calculations. This family of algorithms includes Deep Belief Networks (DBNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).
Deep Learning vs. “Shallow” Learning
In traditional ML systems, a human (usually a subject matter expert) selects features that are believed to be useful for classification and feeds them as inputs to an ML algorithm. The algorithm then learns how to use these features to maximize classification accuracy. In a DL system, features are learned through a mathematical process like backpropagation at every layer of the network. On the first layer, the DL system learns rudimentary features that can be computed directly from the raw input signals. On the second layer, the network learns more complex features as combinations of the features learned on the first layer, and so on. By the end of the network, high-level features have been built on top of all the previous feature layers, so that classification on these high-level features is an easy problem, where classification on the raw input signals may have been very difficult.

You may be curious: what can a deep neural network compute that a network with a single hidden layer cannot? Surprisingly, the answer is... nothing! By the universal approximation theorem, a network with just one hidden layer of neurons can approximate essentially any function we might want to model. So why would we ever add more layers?
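To make this concrete, here is a minimal sketch in NumPy (the layer sizes, learning rate, and iteration count are illustrative choices, not prescriptions). A network with one hidden layer learns XOR, a function that a single layer of neurons with no hidden units famously cannot represent, and backpropagation pushes the error back through each layer exactly as described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# The four XOR input/output pairs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 4))   # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(0, 1, (4, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass: each layer computes features from the layer below.
    h = sigmoid(X @ W1 + b1)      # hidden-layer features
    out = sigmoid(h @ W2 + b2)    # prediction built on those features

    # Backpropagation: push the error back through every layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))  # approaches [[0], [1], [1], [0]]
```

After a few thousand steps the predictions approach the XOR targets; the hidden layer's four units are exactly the kind of "rudimentary features" the first layer of a deeper network would learn.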
One problem with the single-hidden-layer approach is that it can take an exponential number of neurons in that layer to produce the same result as a deep network. If we had 3 layers of 500 neurons (a total of 1,500 neurons), it could take hundreds of billions of neurons to achieve the same function approximation in a single layer! Worse, finding that equivalent single layer is intractable, meaning we cannot solve for it directly; we could only approximate it. Since we don't know how to solve our problems compactly with a shallow network, we exploit deep learning instead.
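To get a feel for the scale involved, the sketch below simply counts weights in fully connected networks. The 784-input/10-output dimensions are hypothetical (chosen to evoke a 28x28 image classifier), and the widths in the loop are illustrative; the point is only how quickly the weight count grows with the width of a single hidden layer.

```python
def mlp_weight_count(layer_sizes):
    """Number of weights in a fully connected network with the
    given layer widths (biases ignored for simplicity)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Deep network: 784 inputs, three hidden layers of 500 neurons, 10 outputs.
deep = mlp_weight_count([784, 500, 500, 500, 10])
print(f"deep network (3 x 500 hidden neurons) -> {deep:,} weights")

# A single hidden layer matching the deep network's function could need an
# enormous width; here we just show how the weight count scales with width.
for width in (1_500, 1_000_000, 100_000_000_000):
    shallow = mlp_weight_count([784, width, 10])
    print(f"single hidden layer of {width:>15,} neurons -> {shallow:,} weights")
```

This is arithmetic, not a proof: it shows the cost of width once you need it, while the theoretical results behind the article's claim say how much width a shallow network may actually need.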

While people are often excited to use deep learning, they can quickly get discouraged by the difficulty of implementing their own deep networks. Some of the main reasons people struggle with Deep Learning include:
1. Deep Learning requires large datasets. Because deep networks must learn large feature hierarchies that generalize, they require a very large amount of data to train on.
2. Deep Learning requires lots of computational power. A large dataset means that many computations are needed to iterate over the training data enough times for learning to converge. Modern deep networks may have hundreds of layers, making training infeasible on anything except a high-end graphics processing unit (GPU).
3. Deep Learning is hard to train and optimize. Because we cannot directly inspect the features a network learns, it can be difficult to diagnose what's going wrong when a DL system isn't producing good results; a minimal monitoring sketch follows this list.
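As a starting point for that kind of debugging, here is a minimal training-loop sketch, assuming PyTorch is available; the model, the synthetic data, and the hyperparameters are all illustrative. Logging the loss every few epochs is the simplest way to see whether learning is converging at all, and moving the model to a GPU when one is present speaks to the compute concern in point 2.

```python
import torch
from torch import nn

# Point 2: use a GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
).to(device)

X = torch.randn(1024, 20, device=device)   # stand-in for a real dataset
y = (X[:, 0] > 0).long()                   # a learnable synthetic target

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    if epoch % 10 == 0:
        # Point 3: if this number is not falling, suspect the data,
        # the learning rate, or the architecture before anything else.
        print(f"epoch {epoch:3d}  loss {loss.item():.4f}")
```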