Handling dropout probability estimation in convolution neural networks using meta-heuristics
Related papers
Dropout Probability Estimation in Convolutional Neural Networks by the Enhanced Bat Algorithm
2020 International Joint Conference on Neural Networks (IJCNN)
In recent years, deep learning has achieved exceptional results in diverse applications such as visual recognition, speech recognition, and natural language processing. The convolutional neural network is a particular type of neural network commonly used for digital image classification. A common issue in deep neural network models is the high-variance problem, also called over-fitting. Overfitting occurs when the model fits the training data well but fails to generalize to new data. To prevent over-fitting, several regularization methods can be used; one such powerful method is dropout regularization. Finding the optimal value of the dropout rate is a very time-consuming process; hence, we propose a model that finds the optimal value by utilizing a metaheuristic algorithm instead of a manual search. In this paper, we propose a hybridized bat algorithm to find the optimal dropout probability rate in a convolutional neural network and compare the results to similar techniques. The experimental results show that the proposed hybrid method outperforms other metaheuristic techniques.
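To illustrate the general approach (a metaheuristic searching over a scalar dropout rate), the sketch below runs a plain, simplified bat algorithm over the interval [0, 0.9]. The objective `validation_error` is a placeholder for a full train-and-validate cycle, and none of the constants or the hybridization from the paper are reproduced here.

```python
# Sketch: tuning a dropout rate with a basic bat-algorithm search.
# `validation_error` is an assumed stand-in; in practice it would train the
# CNN with the given rate and return its validation error.
import numpy as np

def validation_error(dropout_rate):
    # Placeholder objective: pretend 0.45 is the best rate.
    return (dropout_rate - 0.45) ** 2

def bat_search(n_bats=10, n_iter=30, lower=0.0, upper=0.9, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lower, upper, n_bats)            # candidate dropout rates
    vel = np.zeros(n_bats)
    fit = np.array([validation_error(p) for p in pos])
    best = pos[fit.argmin()]
    loudness, pulse_rate = 0.9, 0.5
    for _ in range(n_iter):
        for i in range(n_bats):
            freq = rng.uniform(0.0, 1.0)               # random frequency in [0, 1]
            vel[i] += (pos[i] - best) * freq
            cand = np.clip(pos[i] + vel[i], lower, upper)
            if rng.random() > pulse_rate:              # local random walk around the best bat
                cand = np.clip(best + 0.05 * rng.standard_normal(), lower, upper)
            f_cand = validation_error(cand)
            if f_cand < fit[i] and rng.random() < loudness:   # accept improving moves
                pos[i], fit[i] = cand, f_cand
        best = pos[fit.argmin()]
    return best

print("estimated dropout rate:", round(bat_search(), 3))
```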
Dropout: A Simple Way to Prevent Neural Networks from Overfitting
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem.
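A minimal sketch of the mechanism, using the now-common "inverted dropout" formulation that rescales the surviving activations during training (the paper instead scales the weights by the retention probability at test time):

```python
# Inverted dropout on a layer's activations: zero each unit with
# probability p at training time and rescale the survivors, so no
# adjustment is needed at test time.
import numpy as np

def dropout(activations, p=0.5, training=True, rng=np.random.default_rng()):
    if not training or p == 0.0:
        return activations
    mask = rng.random(activations.shape) >= p   # keep each unit with prob 1 - p
    return activations * mask / (1.0 - p)       # rescale so the expected value is unchanged

h = np.ones((2, 4))                             # dummy batch of hidden activations
print(dropout(h, p=0.5))                        # roughly half the entries become 0, the rest 2.0
```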
Adaptive dropout for training deep neural networks
Recently, it was shown that deep neural networks can perform very well if the activities of hidden units are regularized during learning, e.g., by randomly dropping out 50% of their activities. We describe a method called 'standout' in which a binary belief network is overlaid on a neural network and is used to regularize its hidden units by selectively setting activities to zero. This 'adaptive dropout network' can be trained jointly with the neural network by approximately computing local expectations of binary dropout variables, computing derivatives using back-propagation, and using stochastic gradient descent. Interestingly, experiments show that the learnt dropout network parameters recapitulate the neural network parameters, suggesting that a good dropout network regularizes activities according to magnitude. When evaluated on the MNIST and NORB datasets, we found that our method achieves lower classification error rates than other feature learning methods, including standard dropout, denoising auto-encoders, and restricted Boltzmann machines. For example, our method achieves 0.80% and 5.8% errors on the MNIST and NORB test sets, which is better than state-of-the-art results obtained using feature learning methods, including those that use convolutional architectures.
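A simplified, self-contained illustration of the idea: a second set of weights (the overlay belief network) maps the same input to per-unit keep probabilities through a sigmoid, and units are dropped according to those probabilities rather than a fixed rate. The shapes and the random overlay weights below are assumptions for the sketch; in the paper the overlay is learned jointly with the network.

```python
# Adaptive ('standout'-style) dropout: keep probabilities come from an
# overlay network instead of being a single fixed rate.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)                  # input to one layer
W = rng.standard_normal((8, 5))             # layer weights
W_overlay = rng.standard_normal((8, 5))     # overlay (belief-network) weights, random here

keep_prob = 1.0 / (1.0 + np.exp(-(x @ W_overlay)))   # per-unit keep probabilities
mask = rng.random(5) < keep_prob                      # adaptive dropout mask
hidden = np.maximum(x @ W, 0.0) * mask                # ReLU activations, then drop
print(keep_prob.round(2))
print(hidden.round(2))
```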
Analysis of Dropout in ANN using MNIST Dataset
2021
The concept of neural networks is inspired by the neurons within the human brain, and researchers sought to build a machine that imitates the same process. A neural network (NN) is a circuit of connected neurons or, in a present-day sense, an artificial neural network composed of artificial neurons constructed for solving artificial intelligence problems. In deep neural networks, overfitting is a severe issue. It can be caused by unbalanced datasets and incorrect model parameter initialization, which make the model adhere too closely to the training data and reduce its generalization performance on unseen data. To overcome such problems, regularization techniques are used. These techniques modify the learning algorithm in a way that improves the model's generalization and performance. Dropout is one such regularization technique for addressing the overfitting problem. During training it randomly drops hidden units or neurons to prevent the unit...
On Dropout, Overfitting, and Interaction Effects in Deep Neural Networks
ArXiv, 2020
We examine Dropout through the perspective of interactions: learned effects that combine multiple input variables. Given N variables, there are O(N^2) possible pairwise interactions, O(N^3) possible 3-way interactions, etc. We show that Dropout implicitly sets a learning rate for interaction effects that decays exponentially with the size of the interaction, corresponding to a regularizer that balances against the hypothesis space, which grows exponentially with the number of variables in the interaction. This understanding of Dropout has implications for the optimal Dropout rate: higher Dropout rates should be used when we need stronger regularization against spurious high-order interactions. This perspective also issues caution against using Dropout to measure term saliency because Dropout regularizes against terms for high-order interactions. Finally, this view of Dropout as a regularizer of interaction effects provides insight into the varying effectiveness of Dropout for diffe...
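A short back-of-the-envelope version of the scaling argument described above (a sketch, not the paper's derivation): with dropout probability p on the inputs, a term that combines k variables only receives a gradient signal when all k of its inputs survive, so its effective learning rate shrinks as

```latex
\Pr[\text{all $k$ inputs of a $k$-way term survive}] = (1-p)^{k}
\qquad\Longrightarrow\qquad
\eta_{\mathrm{eff}}(k) \;\propto\; (1-p)^{k}\,\eta .
```

Meanwhile the number of candidate k-way interactions grows as $\binom{N}{k} = O(N^k)$, so the exponential decay in k counterbalances the exponentially growing hypothesis space, which is the trade-off the abstract points to.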
Dropout, a basic and effective regularization method for a deep learning model: a case study
Indonesian Journal of Electrical Engineering and Computer Science
Deep learning is based on a network of artificial neurons inspired by the human brain. This network is made up of tens or even hundreds of "layers" of neurons. Deep learning has many fields of application; agriculture is one such field, where it is applied to various problems (disease detection, pest detection, and weed identification). A major problem with deep learning is how to create a model that works well not only on the training set but also on the validation set. Many approaches used in neural networks are explicitly designed to reduce overfitting, possibly at the expense of training accuracy. In this paper, a basic technique (dropout) is used to minimize overfitting; we integrate it into a convolutional neural network model that classifies weed species to see how it impacts performance, and a complementary solution (exponential linear units) is proposed to optimize the obtained results. The r...
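For concreteness, a small sketch of a convolutional classifier that combines dropout with exponential linear units; the input resolution (64x64 RGB), layer sizes, dropout rate, and the five hypothetical weed classes are illustrative choices, not the model from the paper.

```python
# Illustrative CNN mixing ELU activations with a dropout layer before the
# classifier head (PyTorch). All sizes here are assumptions for the sketch.
import torch
from torch import nn

num_weed_classes = 5  # hypothetical class count

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ELU(),
    nn.MaxPool2d(2),                       # 16 x 32 x 32
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ELU(),
    nn.MaxPool2d(2),                       # 32 x 16 x 16
    nn.Dropout(p=0.5),                     # dropout before the classifier head
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, num_weed_classes),
)

logits = model(torch.randn(1, 3, 64, 64))  # one dummy 64x64 RGB image
print(logits.shape)                        # torch.Size([1, 5])
```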
An empirical analysis of dropout in piecewise linear networks
The recently introduced dropout training criterion for neural networks has been the subject of much attention due to its simplicity and remarkable effectiveness as a regularizer, as well as its interpretation as a training procedure for an exponentially large ensemble of networks that share parameters. In this work we empirically investigate several questions related to the efficacy of dropout, specifically as it concerns networks employing the popular rectified linear activation function. We investigate the quality of the test-time weight-scaling inference procedure by evaluating the geometric average exactly in small models, as well as compare the performance of the geometric mean to the arithmetic mean more commonly employed by ensemble techniques. We explore the effect of tied weights on the ensemble interpretation by training ensembles of masked networks without tied weights. Finally, we investigate an alternative criterion based on a biased estimator of the maximum likelihood ensemble gradient.
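The kind of exact comparison the abstract describes is feasible for tiny models. The sketch below enumerates every dropout mask on a 6-unit hidden layer (random weights and a keep probability of 0.5 are assumptions, so all masks are equally likely) and compares weight-scaling inference with the geometric and arithmetic means of the masked sub-networks' predictions.

```python
# Enumerate all 2^6 dropout masks of a tiny ReLU network and compare
# (a) weight-scaling inference, (b) the normalised geometric mean and
# (c) the arithmetic mean of the masked sub-networks' outputs.
import itertools
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1, b1 = rng.standard_normal((4, 6)), rng.standard_normal(6)
W2, b2 = rng.standard_normal(6), rng.standard_normal()
p_keep = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(mask, scale=1.0):
    h = np.maximum(x @ W1 + b1, 0.0) * mask * scale
    return sigmoid(h @ W2 + b2)

# (a) weight scaling: keep every unit, scale activations by p_keep
weight_scaled = forward(np.ones(6), scale=p_keep)

# (b)/(c) exact ensemble; with p_keep = 0.5 all masks are equally likely,
# so an unweighted mean over masks suffices.
probs = np.array([forward(np.array(m)) for m in itertools.product([0, 1], repeat=6)])
geo = np.exp(np.mean(np.log(probs)))                              # geometric mean (unnormalised)
geo_norm = geo / (geo + np.exp(np.mean(np.log(1.0 - probs))))     # renormalised over both classes
arith = probs.mean()

print(weight_scaled, geo_norm, arith)
```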
On the Inductive Bias of Dropout
2014
Dropout is a simple but effective technique for learning in neural networks and other settings. However, a sound theoretical understanding of dropout is needed to determine when dropout should be applied and how to use it most effectively. In this paper we continue the exploration of dropout as a regularizer pioneered by Wager et al. We focus on linear classification where a convex proxy to the misclassification loss (i.e., the logistic loss used in logistic regression) is minimized. We show:
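A minimal sketch of the setting the paper analyses: a linear classifier trained on the logistic loss with dropout applied to the input features. The synthetic data, dropout rate, and learning rate below are arbitrary choices for illustration, not anything from the paper.

```python
# Linear classification with the logistic loss, trained with inverted
# dropout on the input features.
import numpy as np

rng = np.random.default_rng(0)
n, d, p_drop, lr = 200, 10, 0.5, 0.1
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (X @ w_true + 0.1 * rng.standard_normal(n) > 0).astype(float)   # synthetic labels

w = np.zeros(d)
for epoch in range(100):
    mask = (rng.random((n, d)) >= p_drop) / (1.0 - p_drop)  # inverted dropout on features
    Xd = X * mask
    p = 1.0 / (1.0 + np.exp(-(Xd @ w)))                     # sigmoid predictions
    grad = Xd.T @ (p - y) / n                               # gradient of the mean logistic loss
    w -= lr * grad

print("training accuracy:", ((X @ w > 0) == (y > 0.5)).mean())
```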