DropNeuron: Simplifying the Structure of Deep Neural Networks
Related papers
Regularization of deep neural networks with spectral dropout
Neural Networks
The big breakthrough on the ImageNet challenge in 2012 was partially due to the 'dropout' technique used to avoid overfitting. Here, we introduce a new approach called 'Spectral Dropout' to improve the generalization ability of deep neural networks. We cast the proposed approach in the form of regular Convolutional Neural Network (CNN) weight layers using a decorrelation transform with fixed basis functions. Our spectral dropout method prevents overfitting by eliminating weak and 'noisy' Fourier domain coefficients of the neural network activations, leading to remarkably better results than current regularization methods. Furthermore, the proposed method is very efficient due to the fixed basis functions used for spectral transformation. In particular, compared to Dropout and Drop-Connect, our method significantly speeds up the network convergence rate during the training process (roughly ×2), with considerably higher neuron pruning rates (an increase of ∼30%). We demonstrate that spectral dropout can also be used in conjunction with other regularization approaches, resulting in additional performance gains.
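The core operation is easy to sketch in NumPy: transform the activations with a fixed basis, zero the weakest coefficients, and transform back. Here `keep_ratio` is a hypothetical knob and the 1-D FFT stands in for the paper's fixed decorrelation transform on CNN feature maps:

```python
import numpy as np

def spectral_dropout(activations, keep_ratio=0.9):
    """Zero the weakest Fourier coefficients of each activation vector.

    A toy 1-D sketch of the idea; the paper operates on CNN feature
    maps with fixed decorrelation bases.
    """
    coeffs = np.fft.rfft(activations, axis=-1)          # fixed-basis transform
    mags = np.abs(coeffs)
    # per-sample threshold: drop the weakest (1 - keep_ratio) coefficients
    thresh = np.quantile(mags, 1.0 - keep_ratio, axis=-1, keepdims=True)
    coeffs[mags < thresh] = 0.0
    return np.fft.irfft(coeffs, n=activations.shape[-1], axis=-1)

x = np.random.randn(4, 128)                 # batch of 4 activation vectors
x_sd = spectral_dropout(x, keep_ratio=0.9)  # "denoised" activations
```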
DSD: Regularizing Deep Neural Networks with Dense-Sparse-Dense Training Flow
ArXiv, 2016
Modern deep neural networks have a large number of parameters, making them very powerful machine learning systems. A critical issue for training such large networks on large-scale datasets is to prevent overfitting while at the same time providing enough model capacity. We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks. In the first D step, we train a dense network to learn which connections are important. In the S step, we regularize the network by pruning the unimportant connections and retrain the network given the sparsity constraint. In the final D step, we increase the model capacity by freeing the sparsity constraint, re-initializing the pruned parameters, and retraining the whole dense network. Experiments show that DSD training can improve the performance of a wide range of CNNs, RNNs and LSTMs on the tasks of image classification, caption generation and speech recognition. On the ImageNet dataset, DSD improved the absolute accuracy of...
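A minimal sketch of the three-phase schedule for a single weight matrix, assuming a hypothetical gradient callable `grad_fn`; the paper applies this flow to whole networks with proper training loops:

```python
import numpy as np

def dsd(W, grad_fn, lr=0.1, sparsity=0.5, steps=100):
    # D: dense training
    for _ in range(steps):
        W = W - lr * grad_fn(W)
    # S: prune the smallest-magnitude weights, retrain under the mask
    thresh = np.sort(np.abs(W), axis=None)[int(W.size * sparsity)]
    mask = np.abs(W) >= thresh
    W = W * mask
    for _ in range(steps):
        W = (W - lr * grad_fn(W)) * mask
    # D: free the constraint; pruned weights restart from zero and the
    # whole dense matrix is retrained
    for _ in range(steps):
        W = W - lr * grad_fn(W)
    return W

# toy usage: grad_fn for the loss L(W) = 0.5 * ||W||^2 is simply W
W_final = dsd(np.random.randn(8, 8), grad_fn=lambda W: W)
```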
Learn & Drop: Fast Learning of CNNs Based on Layer Dropping
Neural Computing and Applications, 2024
This paper proposes a new method to improve the training efficiency of deep convolutional neural networks. During training, the method evaluates scores that measure how much each layer's parameters change and decides whether each layer should continue learning. Based on these scores, the network is scaled down so that the number of parameters to be learned is reduced, yielding a speed-up in training. Unlike state-of-the-art methods that try to compress the network for the inference phase or to limit the number of operations performed in the back-propagation phase, the proposed method is novel in that it focuses on reducing the number of operations performed by the network in the forward propagation during training. The proposed training strategy has been validated on two widely used architecture families: VGG and ResNet. Experiments on MNIST, CIFAR-10 and Imagenette show that, with the proposed method, the training time of the models is more than halved without significantly impacting accuracy. The FLOPs reduction in the forward propagation during training ranges from 17.83% for VGG-11 to 83.74% for ResNet-152. As for the accuracy, the impact depends on the depth of the model; the decrease is between 0.26% and 2.38% for VGGs and between 0.4% and 3.2% for ResNets. These results demonstrate the effectiveness of the proposed technique in speeding up the learning of CNNs. The technique will be especially useful in applications where fine-tuning or online training of convolutional models is required, for instance because data arrive sequentially.
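One plausible form of the per-layer score is the relative movement of a layer's parameters between two checkpoints; the sketch below is an assumption about the score's shape, not the paper's exact definition:

```python
import numpy as np

def layer_change_scores(prev_params, curr_params):
    """Relative parameter movement per layer between two checkpoints."""
    return [np.linalg.norm(c - p) / (np.linalg.norm(p) + 1e-12)
            for p, c in zip(prev_params, curr_params)]

def layers_to_drop(scores, tol=1e-3):
    """Indices of layers whose parameters have essentially stopped changing."""
    return [i for i, s in enumerate(scores) if s < tol]
```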
DropNet: Reducing Neural Network Complexity via Iterative Pruning
2020
Modern deep neural networks require a significant amount of computing time and power to train and deploy, which limits their usage on edge devices. Inspired by the iterative weight pruning in the Lottery Ticket Hypothesis (Frankle & Carbin, 2018), we propose DropNet, an iterative pruning method which prunes nodes/filters to reduce network complexity. DropNet iteratively removes nodes/filters with the lowest average post-activation value across all training samples. Empirically, we show that DropNet is robust across diverse scenarios, including MLPs and CNNs using the MNIST, CIFAR-10 and Tiny ImageNet datasets. We show that up to 90% of the nodes/filters can be removed without any significant loss of accuracy. The final pruned network performs well even with reinitialization of the weights and biases. DropNet also has similar accuracy to an oracle which greedily removes nodes/filters one at a time to minimise training loss, highlighting its effectiveness.
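The pruning criterion fits in a few lines. A hedged sketch for a dense layer follows, where `activations` holds post-activation (e.g. post-ReLU) values over the training set; the retraining between pruning rounds that the paper performs is omitted:

```python
import numpy as np

def dropnet_step(activations, keep_frac=0.9):
    """Return indices of units to keep after one pruning round.

    Units with the lowest average post-activation value across all
    training samples are dropped, as in the paper's criterion.
    """
    mean_act = activations.mean(axis=0)        # (num_units,) average per unit
    n_keep = int(mean_act.size * keep_frac)
    return np.sort(np.argsort(mean_act)[-n_keep:])
```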
Optimizing the Deep Neural Networks by Layer-Wise Refined Pruning and the Acceleration on FPGA
Computational Intelligence and Neuroscience
To accelerate the practical applications of artificial intelligence, this paper proposes a highly efficient layer-wise refined pruning method for deep neural networks at the software level and accelerates the inference process at the hardware level on a field-programmable gate array (FPGA). The refined pruning operation is based on the channel-wise importance indexes of each layer and the layer-wise input sparsity of convolutional layers. The method utilizes the characteristics of the native networks without introducing any extra workload to the training phase. In addition, the operation is easily extended to various state-of-the-art deep neural networks. The effectiveness of the method is verified on ResNet and VGG architectures using the CIFAR10, CIFAR100, and ImageNet100 datasets. Experimental results show that for ResNet50 on CIFAR10 and ResNet101 on CIFAR100, more than 85% of parameters and Floating-Point Operations are pruned with only 0.35% and 0.40% accur...
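The paper's exact importance index is not reproduced here; the sketch below uses the L1 norm of each output channel's filter as one plausible channel-wise importance measure and prunes the weakest channels:

```python
import numpy as np

def prune_channels(conv_weight, prune_frac=0.3):
    """Drop the least important output channels of a conv layer.

    conv_weight: (out_channels, in_channels, kH, kW).  Importance is
    the L1 norm of each channel's filter (an assumption; the paper
    also uses layer-wise input sparsity).
    """
    scores = np.abs(conv_weight).reshape(conv_weight.shape[0], -1).sum(axis=1)
    n_keep = int(len(scores) * (1.0 - prune_frac))
    keep = np.sort(np.argsort(scores)[-n_keep:])
    return conv_weight[keep], keep             # surviving filters + indices
```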
Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee
2017
We introduce and analyze a new technique for model reduction for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affect the prediction accuracy and model variance. Our Net-Trim algorithm prunes (sparsifies) a trained network layer-wise, removing connections at each layer by solving a convex optimization program. This program seeks a sparse set of weights at each layer that keeps the layer inputs and outputs consistent with the originally trained model. The algorithms and associated analysis are applicable to neural networks operating with the rectified linear unit (ReLU) as the nonlinear activation. We present both parallel and cascade versions of the algorithm. While the latter can achieve slightly simpler models with the same generalization performance, the former can be computed in a distributed manner. In both cases, Net-Trim significantly reduces the number of connections i...
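The actual Net-Trim program handles the ReLU consistency constraints as a convex feasibility problem; the sketch below keeps only the "sparse weights, same layer response" idea, using a plain ℓ1-regularized least-squares surrogate solved with ISTA:

```python
import numpy as np

def sparsify_layer(X, Z, lam=0.1, iters=500):
    """ISTA on 0.5 * ||X W - Z||_F^2 + lam * ||W||_1.

    X: layer inputs (n, d); Z: pre-activations of the trained layer
    (n, m).  A simplified stand-in for Net-Trim's convex program.
    """
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant
    W = np.zeros((X.shape[1], Z.shape[1]))
    for _ in range(iters):
        W = W - step * (X.T @ (X @ W - Z))                        # gradient step
        W = np.sign(W) * np.maximum(np.abs(W) - step * lam, 0.0)  # soft-threshold
    return W
```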
Studying and Analysing the Effect of Weight Norm Penalties and Dropout as Regularizers for Small Convolutional Neural Networks
International Journal of Engineering Research and Technology (IJERT), 2021
https://www.ijert.org/studying-and-analysing-the-effect-of-weight-norm-penalties-and-dropout-as-regularizers-for-small-convolutional-neural-networks https://www.ijert.org/research/studying-and-analysing-the-effect-of-weight-norm-penalties-and-dropout-as-regularizers-for-small-convolutional-neural-networks-IJERTV10IS010025.pdf
A long-standing problem for both machine learning neophytes and researchers has been to create a learning model that performs well not just on seen data points (training data) but also on unseen data points (test data). It is usually the case that a deep model learns and co-adapts representations and features from the training data so well that it fails to perform effectively on the test data. This is known as the problem of overfitting. A great deal of research has been devoted to finding solutions to overfitting, and addressing it requires understanding the concept of generalization: the ability of a model to perform well on unseen inputs. Collectively, all the approaches that work on decreasing the generalization error of a learning algorithm are called regularization techniques. One way to reduce variance and generalization error in a model is to introduce a penalty term (weight decay) in the cost function that restricts the model's parameters from growing. Another widely adopted practice is dropout, wherein hidden and visible units are randomly dropped to obtain a much simpler model that is less prone to overfitting. In this paper, an extensive analysis is carried out comparing these two regularizers on the MNIST dataset for relatively small convolutional networks. The findings assert that, with appropriate hyperparameter settings, dropout does a better job of bringing down the training error and minimizing the gap between training and test error (the generalization gap).
I. INTRODUCTION The onset of learning through labeled data has revolutionized several pattern recognition contests, especially in applications of computer vision. Modern Convolutional Neural Networks (CNNs) [1] trained using the backpropagation algorithm [2] are predominantly suited for this task and have achieved state-of-the-art results on several image datasets. These artificial neural networks give unrivalled results on data that can be structured to have a grid-like topology, which is why convolutional networks are chosen for computer vision applications: images can be reduced to either a 2D or 3D grid of pixels. The success of these modern gradient-based networks is also associated with their use of sparse interactions: each convolution is carried out by a kernel smaller than the input, which lets the network store fewer parameters, reducing memory requirements and enhancing efficiency.
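Both regularizers compared in the paper are standard; a minimal sketch of each follows, with illustrative defaults rather than the paper's tuned hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_penalty(weights, lam=1e-4):
    """Weight-decay term added to the cost: lam * sum of squared weights."""
    return lam * sum((W ** 2).sum() for W in weights)

def dropout(h, p=0.5, train=True):
    """Inverted dropout: randomly zero units with probability p and
    rescale survivors so expected activations match at test time."""
    if not train:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)
```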
Enhancing the Regularization Effect of Weight Pruning in Artificial Neural Networks
arXiv (Cornell University), 2018
Artificial neural networks (ANNs) may not be worth their computational/memory costs when used in mobile phones or embedded devices. Parameter-pruning algorithms combat these costs, with some algorithms capable of removing over 90% of an ANN's weights without harming the ANN's performance. Removing weights from an ANN is a form of regularization, but existing pruning algorithms do not significantly improve generalization error. We show that pruning ANNs can improve generalization if pruning targets large weights instead of small weights. Applying our pruning algorithm to an ANN leads to a higher image classification accuracy on CIFAR-10 data than applying the popular regularizer dropout. The pruning couples this higher accuracy with an 85% reduction of the ANN's parameter count.
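The paper's twist reverses the usual magnitude-pruning rule. A minimal sketch, assuming a single weight matrix and a hypothetical removal fraction:

```python
import numpy as np

def prune_largest(W, frac=0.85):
    """Zero the largest-magnitude `frac` of weights (the opposite of
    conventional magnitude pruning, per the paper's finding)."""
    n_keep = int(W.size * (1.0 - frac))
    if n_keep == 0:
        return np.zeros_like(W)
    thresh = np.sort(np.abs(W), axis=None)[-n_keep]   # n_keep-th largest value
    return np.where(np.abs(W) < thresh, W, 0.0)
```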
Pruning Deep Neural Networks with ℓ0-constrained Optimization
2020
Deep neural networks (DNNs) give state-of-the-art accuracy in many tasks, but they can require large amounts of memory storage, energy consumption, and long inference times. Modern DNNs can have hundreds of millions of parameters, which makes it difficult for DNNs to be deployed in some applications with low-resource environments. Pruning redundant connections without sacrificing accuracy is one of the most popular approaches to overcome these limitations. We propose two ℓ0-constrained optimization models for pruning deep neural networks layer-by-layer. The first model is devoted to a general activation function, while the second one is specifically for ReLU activations. We introduce an efficient cutting plane algorithm to solve the latter to optimality. Our experiments show that the proposed approach achieves competitive compression rates over several state-of-the-art baseline methods.
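Hard thresholding, i.e. projecting a weight matrix onto the ℓ0 ball by keeping its top-k magnitudes, is the basic building block behind such constrained models; the sketch shows only that projection, not the paper's cutting-plane algorithm:

```python
import numpy as np

def project_l0(W, k):
    """Project W onto {W : nnz(W) <= k} by keeping the top-k magnitudes.

    Ties at the threshold may keep slightly more than k entries.
    """
    flat = np.abs(W).ravel()
    if k >= flat.size:
        return W
    thresh = np.sort(flat)[-k]                 # k-th largest magnitude
    return np.where(np.abs(W) >= thresh, W, 0.0)
```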
Novel Pruning Techniques in Convolutional-Neural Networks
International Journal of Engineering and Advanced Technology, 2020
Deep Learning allows us to build powerful models to solve problems like image classification, time series prediction, natural language processing, etc. This is achieved at the cost of huge storage and processing requirements, which are sometimes not available on machines with limited resources. In this paper, we compare different methods which tackle this problem with network pruning. A selected few pruning methodologies from the deep learning literature were implemented and their results compared. Modern neural architectures combine different layers such as convolutional layers, pooling layers, dense layers, etc. We compare pruning techniques for dense layers (such as unit/neuron pruning and weight pruning), and for convolutional layers as well (using the L1 norm, the Taylor expansion of the loss to determine the importance of convolutional filters, and Variable Importance in Projection using Partial Least Squares) for the image classification task. This study aims to ease the overhea...
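The two dense-layer baselines compared in the paper, unstructured weight pruning and structured unit pruning, can be sketched as follows (the pruning fractions are illustrative):

```python
import numpy as np

def weight_prune(W, frac=0.5):
    """Unstructured: zero the smallest-magnitude weights."""
    thresh = np.quantile(np.abs(W), frac)
    return np.where(np.abs(W) >= thresh, W, 0.0)

def unit_prune(W, frac=0.5):
    """Structured: drop whole columns (units) with the smallest L1 norm."""
    norms = np.abs(W).sum(axis=0)
    return W[:, norms >= np.quantile(norms, frac)]
```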