Learning Mechanism of Artificial Neural Networks: Backpropagation
Artificial neural networks with a feedforward structure consist of three different types of layers. The first layer of the network is called the input layer, and data enter the network through it. The last layer of the network is called the output layer; after the data have passed through the network, they reach the output layer and exit the network from there. Hidden layers are the layers between the input and output layers; they can be formed in any size and any number.
All the layers mentioned above contain various numbers of neurons. Artificial neural networks are formed by the connections that these neurons make with each other. Besides the input data, there are weight values assigned to each connection between neurons. These weight values can be interpreted as the effect of the inputs on the output. Initially, these weight values are assigned randomly. What we mean by the learning of an artificial neural network is the process of updating these weight values over time so that the network gives correct outputs for various input data. This updating of the weights is what the backpropagation algorithm does.
Let’s imagine that we have created an artificial neural network with the structure shown above. We will do a forward pass, calculate the total loss, and finally do a backward pass, as an example to demonstrate how the backpropagation algorithm works. For this, we will first select the input and weight values ourselves at random, and we will determine the target output values in the same way. Let’s choose our bias values as 1.
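As a concrete sketch, the setup can be written down in Python. The concrete input and weight values come from the network figure above, which is not reproduced here, so the numbers below are hypothetical stand-ins; only the bias values of 1 and the target output of 0.99 are the ones stated in the text.

```python
# Placeholder setup for the example network: 2 inputs, 2 hidden
# neurons (h1, h2), 1 output neuron. The input and weight values
# are hypothetical stand-ins, not the numbers from the figure.
i1, i2 = 0.05, 0.10                      # input values (placeholders)
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30  # input -> hidden weights (placeholders)
w5, w6 = 0.40, 0.45                      # hidden -> output weights (placeholders)
b1 = b2 = 1.0                            # bias values, chosen as 1
target = 0.99                            # the output value we want to reach
```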
The Forward Pass
First, let’s calculate the total input value of the first neuron in the hidden layer.
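Assuming the two inputs are i_1 and i_2 and that h1 receives them through w_1 and w_2 plus the bias b_1 (the exact wiring comes from the figure above), the total input is:

$$in_{h1} = w_1 \cdot i_1 + w_2 \cdot i_2 + b_1 \cdot 1$$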
Let’s say we are using the logistic function as the activation function in our neurons. So we are going to put the total input value of neuron h1 into the activation function and get an output value.
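The logistic (sigmoid) function and the resulting output of h1 are:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad out_{h1} = \sigma(in_{h1}) = \frac{1}{1 + e^{-in_{h1}}}$$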
We can calculate the same values for h2 too.
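Assuming h2 is connected to the same inputs through w_3 and w_4 and shares the bias b_1, the corresponding calculations are:

$$in_{h2} = w_3 \cdot i_1 + w_4 \cdot i_2 + b_1 \cdot 1, \qquad out_{h2} = \sigma(in_{h2})$$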
To complete the forward pass, we do the same calculations for the output neuron.
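With the hidden outputs feeding the output neuron through w_5 and w_6 plus the bias b_2, the forward pass ends with the output value reported below:

$$in_o = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2 \cdot 1, \qquad out_o = \sigma(in_o) = 0.37163015135768$$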
Calculating Error
The forward pass is completed through the mathematical calculations above, which involve both the weight and the input values. Starting from the randomly assigned inputs and weights, we reached an output value of 0.37163015135768; however, as we stated at the beginning, the output value we really want to reach is 0.99. The things we cannot change in this process are the input and target output values. We can visualise this with a supervised learning model. Let’s imagine a network that takes a dog photo as input and is supposed to produce a dog label as output. In such a problem, the dog photo is our input data and the dog label is our target output. If we do not get the correct output at first, we do not consider changing the image data or the target label. Instead, the model needs to be updated so that it gives the correct output for the inputs it receives. The only way to do this is to update the weight values.
We use a loss function to measure how far the output value calculated with the current weight values is from the target output value, in other words, how much error there is. The more accurately we update the weight values, the more accurate the output we get; and the more accurate the output, the lower the error value calculated by the loss function. Updating the weight values well therefore means lowering our model’s error.
So at the end of the forward pass phase, we calculate the error of our model using a loss function. In this example we use the squared error loss function.
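For a single output neuron and a single target value, the squared error can be written with the conventional ½ factor (which cancels when differentiating, and which the gradient values later in this example appear to assume):

$$E_0 = \frac{1}{2}(target - out_o)^2$$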
Now that we know which loss function we are using, we can calculate the error of our model.
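Plugging in the target value 0.99 and the output 0.37163015135768 from the forward pass:

$$E_0 = \frac{1}{2}(0.99 - 0.37163015135768)^2 \approx 0.19119$$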
The Backward Pass
Now we are going to update our weight values while going backward through the network. Let’s say we want to change the value of w5. We first need to find the effect of w5 on the total error E_0, and we do this by calculating the gradient of E_0 with respect to w5.
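By the chain rule, the effect of w5 on E_0 splits into three factors: how the error changes with the output, how the output changes with its total input, and how that total input changes with w5:

$$\frac{\partial E_0}{\partial w_5} = \frac{\partial E_0}{\partial out_o} \cdot \frac{\partial out_o}{\partial in_o} \cdot \frac{\partial in_o}{\partial w_5}$$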
It is easier to calculate this equation step by step.
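Using the forward-pass values and the ½ loss defined above, the three factors come out as follows (the last one is simply out_{h1} from the forward pass):

$$\frac{\partial E_0}{\partial out_o} = -(target - out_o) = -(0.99 - 0.37163015135768) = -0.61836984864232$$

$$\frac{\partial out_o}{\partial in_o} = out_o(1 - out_o) = 0.37163015135768 \times (1 - 0.37163015135768) \approx 0.23352$$

$$\frac{\partial in_o}{\partial w_5} = out_{h1}$$

Multiplying the three factors gives the gradient value below; working backward from that stated result, this is consistent with out_{h1} ≈ 0.4134 in the original figure.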
Now we can see that w5 has an effect of -0.059693437674795 on the total loss of the model. Since this gradient is negative, increasing w5 reduces the total loss for the given input data, so we should strengthen the value of w5. What if the gradient with respect to w5 had turned out to be a positive number? Then we would weaken the value of w5. How are we going to do that? We are going to add the negative of the gradient to the weight value. But before we do that, we should define a learning rate (which takes values between 0 and 1) and multiply the gradient by it. In this case we are going to set our learning rate to 0.4.
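With learning rate η = 0.4, the update adds the negative of the scaled gradient to the weight:

$$w_5^+ = w_5 - \eta \cdot \frac{\partial E_0}{\partial w_5} = w_5 - 0.4 \times (-0.059693437674795) = w_5 + 0.023877375069918$$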
If we do the same calculations for w6, we get the following result.
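Symbolically, only the last chain-rule factor changes, because w6 connects h2 rather than h1 to the output neuron:

$$\frac{\partial E_0}{\partial w_6} = \frac{\partial E_0}{\partial out_o} \cdot \frac{\partial out_o}{\partial in_o} \cdot out_{h2}, \qquad w_6^+ = w_6 - 0.4 \cdot \frac{\partial E_0}{\partial w_6}$$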
After calculating w6+, all the weight updates in the output layer are completed. Now we should continue the backward pass through the hidden layer and update w1, w2, w3 and w4. Keep in mind that to update these weights we are going to use the original values of w5 and w6, not the updated ones (w5+ and w6+). We apply all the weight updates at the end of the backward pass phase, before making another forward pass with the new weight values.
To be able to update these weights, we again need to find the gradient with respect to each of them. Let’s start with the gradient with respect to w1.
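As with w5, the chain rule splits this gradient into factors, this time through the hidden neuron h1:

$$\frac{\partial E_0}{\partial w_1} = \frac{\partial E_0}{\partial out_{h1}} \cdot \frac{\partial out_{h1}}{\partial in_{h1}} \cdot \frac{\partial in_{h1}}{\partial w_1}$$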
This equation can be written down as follows.
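Since E_0 depends on out_{h1} only through the output neuron, the first factor expands further:

$$\frac{\partial E_0}{\partial w_1} = \left(\frac{\partial E_0}{\partial out_o} \cdot \frac{\partial out_o}{\partial in_o} \cdot \frac{\partial in_o}{\partial out_{h1}}\right) \cdot \frac{\partial out_{h1}}{\partial in_{h1}} \cdot \frac{\partial in_{h1}}{\partial w_1}$$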
Again, it is easier to calculate this equation step by step.
We already calculated the gradient of E_0 with respect to out_o and the gradient of out_o with respect to in_o while finding the gradients for w5 and w6 in the output layer. We only need to work out the factors that are specific to w1.
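The remaining factors follow directly from the forward-pass definitions (assuming, as sketched earlier, that w1 connects input i1 to h1):

$$\frac{\partial in_o}{\partial out_{h1}} = w_5, \qquad \frac{\partial out_{h1}}{\partial in_{h1}} = out_{h1}(1 - out_{h1}), \qquad \frac{\partial in_{h1}}{\partial w_1} = i_1$$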
We found the gradient with respect to w1 and now we should update w1 and calculate w1+.
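The update uses the same rule as in the output layer:

$$w_1^+ = w_1 - 0.4 \cdot \frac{\partial E_0}{\partial w_1}$$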
If we do the same calculations for w2, w3 and w4, we get the following results.
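Written out with the same chain-rule factors (on the h2 path, w_6 and out_{h2} replace w_5 and out_{h1}), these gradients are:

$$\frac{\partial E_0}{\partial w_2} = \frac{\partial E_0}{\partial out_o} \cdot \frac{\partial out_o}{\partial in_o} \cdot w_5 \cdot out_{h1}(1 - out_{h1}) \cdot i_2$$

$$\frac{\partial E_0}{\partial w_3} = \frac{\partial E_0}{\partial out_o} \cdot \frac{\partial out_o}{\partial in_o} \cdot w_6 \cdot out_{h2}(1 - out_{h2}) \cdot i_1$$

$$\frac{\partial E_0}{\partial w_4} = \frac{\partial E_0}{\partial out_o} \cdot \frac{\partial out_o}{\partial in_o} \cdot w_6 \cdot out_{h2}(1 - out_{h2}) \cdot i_2$$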
We have now updated all the weight values using the backpropagation algorithm. Using the updated weights, I made another forward pass to test whether the algorithm worked. The output value obtained with the new weights is 0.64623980051852, whereas the old output value from the first forward pass was 0.37163015135768 and the target output value is 0.99. As you can see, the model has come a long way in a single epoch. In practice, models are trained for many epochs, and over many epochs the output gets closer and closer to the target value.
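The whole procedure can be condensed into a short Python sketch. The input and weight values below are placeholders rather than the numbers from the figure above, so the printed outputs will differ from the values quoted in the text; the structure, the squared error with the ½ factor, the learning rate of 0.4 and the bias values of 1 match the walkthrough.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Placeholder starting values; the real numbers in this walkthrough
# come from the network figure above, which is not reproduced here.
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6 = 0.40, 0.45
b1 = b2 = 1.0           # bias values chosen as 1 (kept fixed here)
target, lr = 0.99, 0.4  # target output and learning rate from the text

for epoch in range(3):
    # --- forward pass ---
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    out_o = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    error = 0.5 * (target - out_o) ** 2

    # --- backward pass ---
    # delta_o = dE_0/d(in_o) = dE_0/d(out_o) * d(out_o)/d(in_o)
    delta_o = -(target - out_o) * out_o * (1.0 - out_o)

    # output-layer gradients
    grad_w5 = delta_o * out_h1
    grad_w6 = delta_o * out_h2

    # hidden-layer gradients: note they use the ORIGINAL w5 and w6
    delta_h1 = delta_o * w5 * out_h1 * (1.0 - out_h1)
    delta_h2 = delta_o * w6 * out_h2 * (1.0 - out_h2)
    grad_w1, grad_w2 = delta_h1 * i1, delta_h1 * i2
    grad_w3, grad_w4 = delta_h2 * i1, delta_h2 * i2

    # apply every update only after the whole backward pass
    w1 -= lr * grad_w1; w2 -= lr * grad_w2
    w3 -= lr * grad_w3; w4 -= lr * grad_w4
    w5 -= lr * grad_w5; w6 -= lr * grad_w6

    print(f"epoch {epoch}: output = {out_o:.6f}, error = {error:.6f}")
```

Running it shows the output creeping toward the target and the error shrinking each epoch, which is the behaviour described above.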
Depending on implementation preferences, the biases can also be updated by performing similar calculations. In the same manner, during the backward pass phase the gradients with respect to the biases (b1 and b2) should be found, and then the update calculations should be performed.
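If the biases are treated as trainable, their gradients come from the same chain rule. Since the bias enters each total input with a coefficient of 1, and assuming (as in the sketches above) that b1 feeds both hidden neurons:

$$\frac{\partial E_0}{\partial b_2} = \frac{\partial E_0}{\partial out_o} \cdot \frac{\partial out_o}{\partial in_o}, \qquad \frac{\partial E_0}{\partial b_1} = \frac{\partial E_0}{\partial in_{h1}} + \frac{\partial E_0}{\partial in_{h2}}$$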
P.S. I was searching the Internet for something explanatory about the backpropagation algorithm and came across a great article on the website below, which I highly recommend you to read. It inspired me to do the same calculations on my own, on a slightly modified neural network structure, as a good learning practice.