Understanding Deep Learning
Artificial Intelligence (AI) and Machine Learning (ML) are among the hottest topics right now, but most people do not understand them well. In this article I will describe deep learning as simply as possible, giving the reader an overall idea of its working principle. First, there are some basic terms we need to understand.
Artificial Intelligence: AI is the replication of human intelligence in computers.
Machine Learning: ML refers to the ability of a machine to learn from large data sets instead of hard-coded rules.
Deep learning is able to extract features automatically on its own, whereas in classical machine learning the programmer or researcher has to provide the features manually, i.e. we have to decide which features will give the best result. Classical machine learning also involves selecting a good classifier. Deep learning generally achieves higher accuracy, but it is more computationally intensive: training a model may take from a few minutes to a month or more, whereas machine learning results can often be obtained in much less time. Both CPUs and GPUs can be used for deep learning and machine learning, but GPUs are preferred for deep learning.
When the data is labelled, we call it supervised learning.
The data is labelled and we expect the model to give the correct output. If a prediction is wrong, the model recalculates the error and iterates again to give the right prediction. Take this example: a bank has data on which customers have left and which have remained loyal, and it wants to know which customers are likely to leave next month. The data contains features such as total bank balance, loan amount, name, address, and credit score. Because we have the data for both leaving and loyal customers, we can analyse it, predict the output, and calculate the error.
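As a minimal sketch of this supervised setup, the toy snippet below trains a very simple classifier (a nearest-centroid rule, chosen here only for brevity) on made-up customer numbers and measures the error against the known labels:

```python
import numpy as np

# Hypothetical customer features: [balance (in $1000s), credit score (scaled)].
# Labels: 1 = left the bank, 0 = stayed. All numbers are made up for illustration.
X_train = np.array([[1.0, 0.3], [1.5, 0.2], [8.0, 0.9], [9.0, 0.8]])
y_train = np.array([1, 1, 0, 0])

# A very simple supervised learner: remember the mean ("centroid") of each class.
centroid_leave = X_train[y_train == 1].mean(axis=0)
centroid_stay = X_train[y_train == 0].mean(axis=0)

def predict(x):
    # Predict the class whose centroid is closer to the new customer.
    d_leave = np.linalg.norm(x - centroid_leave)
    d_stay = np.linalg.norm(x - centroid_stay)
    return 1 if d_leave < d_stay else 0

# Because the training data is labelled, we can measure the error directly.
predictions = np.array([predict(x) for x in X_train])
error_rate = np.mean(predictions != y_train)
print("training error rate:", error_rate)
```

The key point is the last two lines: labels let us compare predictions against the truth and compute an error, which is exactly what unsupervised learning cannot do.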
When the data is not structured or labelled and we try to make predictions from it using ML or DL, we call it unsupervised learning. For example, clustering is an unsupervised algorithm that can be used to discover natural groups, or clusters, in the data. The approach is to divide the data points so that similar points fall in the same group.
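A hedged sketch of this clustering idea is the k-means loop below (k = 2, toy 1-D data, not a production implementation): assign each point to the nearest centre, then move each centre to the mean of its points.

```python
import numpy as np

# Toy unlabelled data with two natural groups, around 1.0 and around 8.0.
points = np.array([1.0, 1.2, 0.8, 8.0, 8.3, 7.9])

# Start with two guessed cluster centres.
centres = np.array([0.0, 10.0])
for _ in range(10):
    # Assign each point to its nearest centre ...
    labels = np.argmin(np.abs(points[:, None] - centres[None, :]), axis=1)
    # ... then move each centre to the mean of its assigned points.
    centres = np.array([points[labels == k].mean() for k in range(2)])

print(centres)  # the centres settle near the two natural groups
```

No labels were used anywhere: the algorithm finds the groups purely from the similarity of the points.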
Working principles of Deep Learning:
Let’s deep dive into deep learning…
Deep learning is a learning technique inspired by the human brain. Like the brain, it has neurons that mathematically compute the input and give the desired output. It can perform both supervised and unsupervised learning tasks. It computes the error, known as the cost function, in order to give the best possible result.
The cost function averages the errors given by the loss function, whereas the loss function is computed for a single training example.
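The loss-versus-cost distinction can be shown in two lines of NumPy, on made-up predictions (some texts also divide the average by 2 to simplify the derivative, as in the cost formula later in this article):

```python
import numpy as np

# Loss vs. cost: the loss is computed per training example;
# the cost averages the losses over the whole training set.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

per_example_loss = (y_pred - y_true) ** 2  # squared-error loss, one value per example
cost = per_example_loss.mean()             # cost function = average of the losses

print(per_example_loss)
print(cost)
```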
As mentioned above, our neural network consists of neurons, all interconnected with each other. There are three kinds of layers: first comes the input layer, the middle and most important part consists of the hidden layers, and finally there is the output layer.
The input layer takes the data and passes it to the hidden layer, after which the computation starts. The input layer itself is not involved in computing.
Then come the hidden layers, which have many nodes, all interconnected. The deeper we go into the network, the more complex the patterns it is able to extract from the input data. So deciding the number of hidden layers, and the number of nodes inside each hidden layer, to get the best output is a challenge. The hidden layers perform all the mathematical computations and adjust the weights and biases, and the last one connects to the output layer. Each node has its own activation function. There are different types of activation functions, such as sigmoid, ReLU, and tanh, and which one to use varies with the type of problem statement.
relu: a = max(0, z) (1)
sigmoid: a = 1/(1 + e^-z) (2)
tanh: a = (e^z - e^-z)/(e^z + e^-z) (3)
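Equations (1)–(3) can be written out in NumPy, and then used in a single forward pass through a tiny 2 → 3 → 1 network. The layer sizes and weight values below are made-up numbers for illustration; in practice the weights start random and are learned.

```python
import numpy as np

# Eqs (1)-(3) written out in NumPy.
def relu(z):
    return np.maximum(0.0, z)                 # eq (1): max(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))           # eq (2)

def tanh(z):
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))  # eq (3)

# One forward pass through a tiny 2 -> 3 -> 1 network with fixed toy weights.
x = np.array([0.5, -1.0])                     # input layer just passes data in

W1 = np.array([[0.1, 0.4],                    # hidden-layer weights (3 x 2)
               [-0.3, 0.2],
               [0.5, -0.1]])
b1 = np.array([0.0, 0.1, -0.2])               # hidden-layer biases
h = sigmoid(W1 @ x + b1)                      # hidden layer: weighted sum + activation

W2 = np.array([[0.7, -0.6, 0.2]])             # output-layer weights (1 x 3)
b2 = np.array([0.05])
y_hat = sigmoid(W2 @ h + b2)                  # output layer: final prediction

print(y_hat)  # a single value between 0 and 1
```

Each hidden node computes a weighted sum of its inputs, adds a bias, and passes the result through its activation function, exactly as described above.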
There are various neural network architectures used to perform different tasks: for example, a CNN (Convolutional Neural Network) works well on images, and an RNN (Recurrent Neural Network) is used for sequence data.
When we initialize the weights inside a network, they are set randomly, and we need to update them continuously until the cost function becomes low. As the cost function is minimized, model accuracy increases. So we apply a technique known as gradient descent, which helps find the minimum of a function by computing the derivative of the cost function at a given set of weights.
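The idea is easiest to see in one dimension. As a minimal sketch, the loop below minimizes the toy function f(w) = (w − 3)², a function I chose only because its minimum is obvious:

```python
# Minimal 1-D gradient descent: find the minimum of f(w) = (w - 3)^2
# by repeatedly stepping against the derivative.
w = 0.0            # starting weight (would be random in a real network)
alpha = 0.1        # step size (learning rate)

for _ in range(100):
    grad = 2 * (w - 3)   # derivative of (w - 3)^2 at the current w
    w -= alpha * grad    # step downhill, against the gradient

print(w)  # close to 3.0, the minimiser
```

In a real network, w is replaced by all the weights and biases, and f by the cost function; the derivative is computed by backpropagation.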
Considering a simple linear regression model (the formulas below are for linear regression with a squared-error cost), the hypothesis, cost function, and gradient descent update are:
Hypothesis function: y_hat = hθ(x) = θ0 + θ1·x (4)
where hθ(x) is the hypothesis and θ0, θ1 are the parameters to be learned.
Cost function: J(θ0, θ1) = 1/(2m) Σ_{i=1}^{m} (y_hat − y)² (5)
y_hat is the predicted value and y is the actual value. This cost function is also called the squared error function.
Gradient descent: θj := θj − α · ∂/∂θj J(θ0, θ1) (6)
α is the learning rate, which decides the size of the steps taken by gradient descent. If it is too small, convergence takes a long time; on the other hand, if it is too large, the updates overshoot and may fail to converge.
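Equations (4)–(6) can be run directly on made-up data generated from y ≈ 2 + 3x, to check that gradient descent recovers those parameters:

```python
import numpy as np

# Toy data following y = 2 + 3x exactly (made-up numbers for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 8.0, 11.0])
m = len(x)

theta0, theta1 = 0.0, 0.0   # initial parameters
alpha = 0.1                 # learning rate

for _ in range(2000):
    y_hat = theta0 + theta1 * x                  # hypothesis, eq (4)
    cost = ((y_hat - y) ** 2).sum() / (2 * m)    # cost J, eq (5)
    # Partial derivatives of J with respect to theta0 and theta1.
    grad0 = (y_hat - y).sum() / m
    grad1 = ((y_hat - y) * x).sum() / m
    theta0 -= alpha * grad0                      # update rule, eq (6)
    theta1 -= alpha * grad1

print(theta0, theta1)  # approaches 2.0 and 3.0
```

Try raising alpha toward 1.0 or lowering it toward 0.001 to see the overshooting and slow-convergence behaviour described above.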
There are many pre-trained models that are designed for certain tasks. You can use their pre-trained weights to increase accuracy; this learning method is called transfer learning. ResNet, GoogLeNet, and VGGNet are some examples of pre-trained models.
The output layer is the final layer, where we obtain our desired prediction.
We call it deep learning because the network has more than one hidden layer.
One of the hardest parts of deep learning is training, which requires both a data set and considerable computational power.
In summary, deep learning is inspired by the brain. There are three kinds of layers: the input layer, the hidden layers, and the output layer. The hidden layers are responsible for computing on the input data, and the output layer gives the desired output. When the cost function is at its minimum we get the best result, and gradient descent is used to minimize the cost function automatically. I hope you got an overall idea of deep learning; feel free to comment below.
source : deeplearning.ai