Dropout Regularization in Neural Networks: A Mathematical Explanation

Salonilokeshdutta
3 min read · Sep 7, 2022


When a neural network has a large number of neurons, there is a much higher probability that multiple neurons in a layer extract the same, or highly correlated, hidden features from the previous layer. This can result in some features being given more significance than others, so the model becomes prone to over-relying on those features, which can lead to overfitting. Dropout is used during neural network training to overcome this issue.

Dropout is a regularization technique that reduces overfitting in a neural network. A dropout layer is added after a dense layer. It should never be applied to the final output layer, since that can result in a loss of required information. The working of the dropout layer is explained below.
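As a minimal sketch of this placement, assuming TensorFlow/Keras (the article does not name a framework, and the layer sizes here are illustrative):

```python
# Minimal sketch, assuming TensorFlow/Keras; layer sizes are illustrative.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.5),   # dropout after a dense layer
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # dropout after another dense layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # no dropout after the output layer
])
```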

As neural networks consist of a large number of neurons, dropout sets certain neurons in a layer to off (0) and keeps the remaining neurons on (1) at each training step, so that the network does not repeatedly learn the same features, which reduces overfitting. The on/off selection of neurons is done randomly.
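A quick sketch of this random on/off selection (NumPy is my assumption here; the printed mask is just one possible outcome):

```python
import numpy as np

rng = np.random.default_rng()
keep_prob = 0.5  # probability a neuron stays on; equals 1 - dropout rate

# One hidden layer of 4 neuron activations, matching the example below.
activations = np.array([1.0, 1.0, 1.0, 1.0])

# A fresh random on/off (1/0) mask is drawn at every training step.
mask = rng.random(activations.shape) < keep_prob
print(mask.astype(int))  # e.g. [0 1 0 1]; changes from step to step
```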

Fig.1 Network without dropout

Consider the simple network in Fig. 1, consisting of 4 neurons in the hidden (dense) layer, each with activation value 1, connected to 2 neurons in the next layer, named output1 and output2, with every connection weight equal to 0.5 and bias 0.

Without dropout: output1 = 0.5×1 + 0.5×1 + 0.5×1 + 0.5×1 = 2

output2 = 0.5×1 + 0.5×1 + 0.5×1 + 0.5×1 = 2

Let the dropout rate be 0.5, which means that if there were 10 neurons, 5 would be turned off at each training step. The activations of the neurons that stay on are multiplied by 2 so that the overall expected sum of the neuron values remains the same.

As the dropout rate is 0.5, 2 of the 4 neurons (shown in red in Fig. 2) will be inactive, and the value of each active neuron is scaled from 1 to 2 using the formula: scaled value = value / (1 − dropout rate) = 1 / (1 − 0.5) × 1 = 2.

Fig. 2 a), b) showing different training sets created using dropout

Training set a): output1 = output2 = 0.5×0 + 0.5×2 + 0.5×0 + 0.5×2 = 2

Training set b): output1 = output2 = 0.5×2 + 0.5×0 + 0.5×2 + 0.5×0 = 2

The output value of each neuron thus remains the same in expectation. Shutting down different neurons on different training passes helps reduce the problem of co-adaptation, i.e., groups of neurons jointly relying on, and giving too much significance to, the same features.
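The worked example above can be checked with a short NumPy sketch (NumPy is my choice, not the article's): four hidden neurons with activation 1, output weights of 0.5, and surviving activations scaled by 1/(1 − 0.5) = 2:

```python
import numpy as np

rate = 0.5                      # dropout rate
scale = 1.0 / (1.0 - rate)      # surviving neurons are scaled by 2

hidden = np.array([1.0, 1.0, 1.0, 1.0])   # 4 hidden-neuron activations
weights = np.full(4, 0.5)                  # all output weights are 0.5

# Without dropout: 0.5*1 + 0.5*1 + 0.5*1 + 0.5*1 = 2
print(weights @ hidden)                     # -> 2.0

# Training set a): neurons 1 and 3 dropped; 2 and 4 kept and scaled to 2
mask_a = np.array([0.0, 1.0, 0.0, 1.0])
print(weights @ (hidden * mask_a * scale))  # 0.5*0 + 0.5*2 + 0.5*0 + 0.5*2 -> 2.0

# Training set b): neurons 2 and 4 dropped; 1 and 3 kept and scaled to 2
mask_b = np.array([1.0, 0.0, 1.0, 0.0])
print(weights @ (hidden * mask_b * scale))  # 0.5*2 + 0.5*0 + 0.5*2 + 0.5*0 -> 2.0
```

Each forward pass sees a different thinned network, yet the output stays at 2 because of the scaling, which is exactly why the scaling factor 1/(1 − dropout rate) is applied.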

Conclusion

Dropout certainly helps reduce overfitting, but it is mainly useful for deep neural networks with a large number of layers, and training takes longer with dropout than with a standard network. The right dropout rate is found purely by trial and error.

Kindly comment with your queries and suggestions.

Stay tuned… for more articles.
