Augmented efficient BackProp for backpropagation learning in deep autoassociative neural networks
An augmented efficient backpropagation training strategy for deep autoassociative neural networks
2010
We introduce Augmented Efficient BackProp, a strategy for applying the backpropagation algorithm to deep autoencoders, i.e., autoassociators with many hidden layers, without relying on a weight initialization using restricted Boltzmann machines (RBMs). This training method, benchmarked on three different types of application datasets, is an extension of Efficient BackProp, first proposed by LeCun et al. [12].
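The augmentation proposed in the paper is not spelled out in this abstract, so the sketch below only illustrates plain backpropagation through a deep autoencoder using the kind of heuristics Efficient BackProp recommends (zero-mean inputs, the scaled tanh 1.7159·tanh(2x/3), fan-in-scaled weight initialization). Layer sizes, batch size and learning rate are hypothetical.

```python
# Sketch: plain backprop on a deep autoencoder with Efficient BackProp-style
# heuristics (scaled tanh, fan-in-scaled init). Sizes are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 256, 64, 256, 784]          # hypothetical encoder-decoder layout
W = [rng.uniform(-1, 1, (m, n)) / np.sqrt(m) for m, n in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(n) for n in sizes[1:]]

f  = lambda x: 1.7159 * np.tanh(2.0 * x / 3.0)
df = lambda x: 1.7159 * (2.0 / 3.0) * (1.0 - np.tanh(2.0 * x / 3.0) ** 2)

def step(x, lr=0.01):
    """One SGD step on the reconstruction error for a mini-batch x."""
    acts, pre = [x], []
    for Wl, bl in zip(W, b):                 # forward pass
        pre.append(acts[-1] @ Wl + bl)
        acts.append(f(pre[-1]))
    delta = (acts[-1] - x) * df(pre[-1])     # squared-error gradient at the output
    for l in reversed(range(len(W))):        # backward pass
        grad_W = acts[l].T @ delta / len(x)
        grad_b = delta.mean(axis=0)
        if l > 0:
            delta = (delta @ W[l].T) * df(pre[l - 1])
        W[l] -= lr * grad_W
        b[l] -= lr * grad_b
    return float(((acts[-1] - x) ** 2).mean())

x = rng.standard_normal((32, 784))           # stand-in for a normalized mini-batch
print(step(x))
```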
IEEE Transactions on Systems, Man, and Cybernetics: Systems
A general deep learning (DL) mechanism for a multiple-hidden-layer feed-forward neural network contains two parts: 1) unsupervised greedy layer-wise training and 2) supervised fine-tuning, which is usually an iterative process. Although this mechanism has been demonstrated in many fields to significantly improve the generalization of neural networks, there is no clear evidence of which of the two parts plays the essential role in that improvement, which has led to an ongoing argument within the DL community. Addressing this argument, this paper proposes a new DL approach to train multilayer feed-forward neural networks. The approach uses restricted Boltzmann machines (RBMs) for the layer-wise training and the generalized inverse of a matrix for the supervised fine-tuning. Unlike general deep training mechanisms such as back-propagation (BP), the proposed approach does not need to iteratively tune the weights and therefore offers quick training, better generalization, and high understandability. Experimentally, the proposed approach demonstrates excellent performance in comparison with BP-based DL and the traditional training method for multilayer random-weight neural networks. To a great extent, this paper demonstrates that the supervised part plays a more important role than the unsupervised part in DL, which provides some new viewpoints for exploring the essence of DL.
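The non-iterative supervised step the abstract contrasts with BP can be illustrated in a few lines. The sketch below is an assumption-laden stand-in: the RBM-pretrained hidden layer is replaced by a fixed random projection for brevity, and only the closed-form output-weight solution via the Moore-Penrose generalized inverse is shown.

```python
# Sketch of the closed-form supervised step: with hidden features H fixed
# (a stand-in random projection here instead of RBM pretraining), the output
# weights solve min ||H B - T||^2 via the Moore-Penrose generalized inverse.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))                    # toy inputs
T = np.eye(3)[rng.integers(0, 3, 500)]                # one-hot targets

W1 = rng.standard_normal((20, 50)) * 0.1              # stand-in for RBM weights
H = np.tanh(X @ W1)                                   # fixed hidden representation
B = np.linalg.pinv(H) @ T                             # generalized-inverse solution
pred = (H @ B).argmax(axis=1)
print("train accuracy:", (pred == T.argmax(axis=1)).mean())
```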
The nature of unsupervised learning in deep neural networks: A new understanding and novel approach
Optical Memory and Neural Networks, 2016
Over the last decade, deep neural networks have been a hot topic in machine learning. They are a breakthrough technology for processing images, video, speech, text and audio. A deep neural network permits us to overcome some limitations of a shallow neural network thanks to its deep architecture. In this paper we investigate the nature of unsupervised learning in the restricted Boltzmann machine. We prove that maximization of the log-likelihood of the input data distribution of a restricted Boltzmann machine is equivalent to minimizing the cross-entropy and, as a special case, to minimizing the mean squared error. Thus the nature of unsupervised learning is invariant to different training criteria. As a result we propose a new technique called "REBA" for the unsupervised training of deep neural networks. In contrast to Hinton's conventional approach to learning a restricted Boltzmann machine, which is based on a training rule of linear nature, the proposed technique is founded on a nonlinear training rule. We show that the classical equations for RBM learning are a special case of the proposed technique. As a result the proposed approach is more universal than the traditional energy-based model. We demonstrate the performance of the REBA technique on a well-known benchmark problem. The main contribution of this paper is a novel view and new understanding of unsupervised learning in deep neural networks.
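The REBA rule itself is not given in this abstract, so the sketch below only shows the classical contrastive-divergence (CD-1) update for a binary RBM, which the abstract describes as a special case of the proposed technique. Layer sizes and the learning rate are illustrative.

```python
# Sketch of the classical Hinton-style CD-1 update for a binary RBM; the
# nonlinear REBA rule that generalizes it is not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid, lr = 784, 128, 0.05
W = rng.standard_normal((n_vis, n_hid)) * 0.01
a, b = np.zeros(n_vis), np.zeros(n_hid)

def cd1_step(v0):
    global W, a, b
    ph0 = sigmoid(v0 @ W + b)                          # positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + a)                        # one Gibbs step back
    ph1 = sigmoid(pv1 @ W + b)
    n = len(v0)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n           # <v h>_data - <v h>_recon
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

cd1_step((rng.random((32, n_vis)) < 0.5).astype(float))
```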
Hybrid Contractive Auto-encoder with Restricted Boltzmann Machine For Multiclass Classification
Arabian Journal for Science and Engineering, 2021
The contractive auto-encoder (CAE) is a type of auto-encoder and a deep learning algorithm based on a multilayer training approach. It is considered one of the most powerful, efficient and robust classification techniques, more specifically for feature reduction. Its problem independence, easy implementation and ability to solve sophisticated problems make it distinct from other deep learning approaches. However, the CAE struggles with data dimensionality reduction, which makes it difficult to capture the useful information within the feature space. To resolve these issues, restricted Boltzmann machine (RBM) layers have been integrated with the CAE to enhance dimensionality reduction, together with a randomized factor for the hidden layer parameters. The proposed model has been evaluated on four benchmark variant datasets of MNIST. The results have been compared with four well-known multiclass classification approaches, including the standard CAE, RBM, AlexNet and an artificial neural network. A considerable improvement has been observed in the performance of the proposed model compared to the other classification techniques. The proposed CAE-RBM showed an improvement of 2-4% on MNIST(basic), 9-12% for MNIST(rot), 7-12% for MNIST(bg-rand) and 7-10% for MNIST(bg-img) in terms of final accuracy.
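For reference, the contractive penalty that defines a CAE has a simple closed form for sigmoid hidden units. The sketch below computes that loss for a toy batch; the hybrid's RBM layers and randomized factor are not reproduced, and the layer sizes and weight of the penalty are hypothetical.

```python
# Sketch of the contractive penalty: reconstruction error plus the squared
# Frobenius norm of the Jacobian of the hidden code w.r.t. the input.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

X = rng.random((64, 784))                       # toy mini-batch
W = rng.standard_normal((784, 128)) * 0.01      # encoder weights (tied decoder)
bh, bv = np.zeros(128), np.zeros(784)

def cae_loss(X, lam=0.1):
    H = sigmoid(X @ W + bh)                     # hidden code
    R = sigmoid(H @ W.T + bv)                   # reconstruction (tied weights)
    recon = ((R - X) ** 2).sum(axis=1).mean()
    # For sigmoid units the Jacobian's squared Frobenius norm factorizes as
    # sum_j (h_j (1 - h_j))^2 * sum_i W_ij^2
    jac = ((H * (1 - H)) ** 2 @ (W ** 2).sum(axis=0)).mean()
    return recon + lam * jac

print(cae_loss(X))
```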
Supervised and unsupervised training of deep autoencoder
2017
Fall 2017. Includes bibliographical references. Deep learning has proven to be a very useful approach to learning complex data. Recent research in the fields of speech recognition, visual object recognition and natural language processing shows that deep generative models, which contain many layers of latent features, can learn complex data very efficiently. An autoencoder neural network with multiple layers can be used as a deep network to learn complex patterns in data. As training a multilayer neural network is time-consuming, a pre-training step has been employed to initialize the weights of a deep network and speed up the training process. In the pre-training step, each layer is trained individually and the output of each layer is wired to the input of the successive layer. After the pre-training, all the layers are stacked together to form the deep network, and then post-training, also known as fine-tuning, is done on the whole network to further improve the solution. The aforemen...
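The pre-training step described above can be sketched in a few lines. This is a minimal, assumption-laden illustration (tied-weight sigmoid autoencoders, plain batch gradient descent, illustrative layer sizes); the fine-tuning pass over the stacked network is omitted.

```python
# Sketch of greedy layer-wise pretraining: each shallow autoencoder is trained
# on the previous layer's codes, then the encoders are stacked.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_shallow_ae(X, n_hidden, lr=0.1, epochs=20):
    """Train one tied-weight sigmoid autoencoder layer with batch gradient descent."""
    W = rng.standard_normal((X.shape[1], n_hidden)) * 0.01
    bh, bv = np.zeros(n_hidden), np.zeros(X.shape[1])
    for _ in range(epochs):
        H = sigmoid(X @ W + bh)
        R = sigmoid(H @ W.T + bv)                 # tied-weight reconstruction
        dR = (R - X) * R * (1 - R)
        dH = (dR @ W) * H * (1 - H)
        W -= lr * (X.T @ dH + dR.T @ H) / len(X)  # encoder + decoder contributions
        bv -= lr * dR.mean(axis=0)
        bh -= lr * dH.mean(axis=0)
    return W, bh

X = rng.random((256, 784))
layers, inp = [], X
for n_hidden in (256, 64):                        # illustrative layer sizes
    W, bh = train_shallow_ae(inp, n_hidden)
    layers.append((W, bh))
    inp = sigmoid(inp @ W + bh)                   # feed codes to the next layer
print("pretrained code shape:", inp.shape)
```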
Deep Boltzmann Machines
2009
We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent expectations are estimated using a variational approximation that tends to focus on a single mode, and data-independent expectations are approximated using persistent Markov chains. The use of two quite different techniques for estimating the two types of expectation that enter into the gradient of the log-likelihood makes it practical to learn Boltzmann machines with multiple hidden layers and millions of parameters. The learning can be made more efficient by using a layer-by-layer "pre-training" phase that allows variational inference to be initialized with a single bottom-up pass. We present results on the MNIST and NORB datasets showing that deep Boltzmann machines learn good generative models and perform well on handwritten digit and visual object recognition tasks.
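The persistent-chain idea used for the data-independent expectations can be sketched for a single RBM layer; the DBM's mean-field positive phase is not reproduced here, and the sizes and learning rate are hypothetical.

```python
# Sketch of persistent Markov chains (PCD): fantasy particles are advanced by
# Gibbs sampling across parameter updates instead of being restarted from data.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

nv, nh, n_chains, lr = 784, 128, 64, 0.01
W = rng.standard_normal((nv, nh)) * 0.01
a, b = np.zeros(nv), np.zeros(nh)
fantasy_v = (rng.random((n_chains, nv)) < 0.5).astype(float)   # persistent particles

def pcd_step(v_data):
    global W, a, b, fantasy_v
    ph_data = sigmoid(v_data @ W + b)                  # data-dependent statistics
    ph_f = sigmoid(fantasy_v @ W + b)                  # advance the persistent chains
    h_f = (rng.random(ph_f.shape) < ph_f).astype(float)
    pv_f = sigmoid(h_f @ W.T + a)
    fantasy_v = (rng.random(pv_f.shape) < pv_f).astype(float)
    ph_f = sigmoid(fantasy_v @ W + b)
    W += lr * (v_data.T @ ph_data / len(v_data) - fantasy_v.T @ ph_f / n_chains)
    a += lr * (v_data.mean(axis=0) - fantasy_v.mean(axis=0))
    b += lr * (ph_data.mean(axis=0) - ph_f.mean(axis=0))

pcd_step((rng.random((32, nv)) < 0.5).astype(float))
```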
Biological Cybernetics
In Bourlard and Kamp (Biol Cybern 59(4):291–294, 1988), it was theoretically proven that autoencoders (AE) with a single hidden layer (previously called “auto-associative multilayer perceptrons”) were, in the best case, implementing singular value decomposition (SVD) Golub and Reinsch (Linear algebra, Singular value decomposition and least squares solutions, pp 134–151. Springer, 1971), equivalent to principal component analysis (PCA) Hotelling (Educ Psychol 24(6/7):417–441, 1933); Jolliffe (Principal component analysis, Springer Series in Statistics, 2nd edn. Springer, New York). That is, AEs are able to derive the eigenvalues that represent the amount of variance covered by each component, even in the presence of nonlinear functions (sigmoid-like or otherwise) on their hidden units. Today, with the renewed interest in “deep neural networks” (DNN), multiple types of (deep) AE are being investigated as an alternative to manifold learning Cayton (Univ C...
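A small numerical check of the equivalence the abstract refers to: the best rank-k linear reconstruction of centered data, which is what a linear bottleneck autoencoder converges to, coincides with projection onto the top-k principal components obtained from the SVD. Data and k are arbitrary here.

```python
# Truncated-SVD reconstruction equals PCA projection-and-reconstruction.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 10))
Xc = X - X.mean(axis=0)                         # center the data

k = 3
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_recon = U[:, :k] * s[:k] @ Vt[:k]           # rank-k SVD reconstruction

Vk = Vt[:k].T                                   # top-k principal directions
pca_recon = Xc @ Vk @ Vk.T                      # project and reconstruct

print(np.allclose(svd_recon, pca_recon))        # True: the two coincide
```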
Enhanced gradient and adaptive learning rate for training restricted Boltzmann machines
Boltzmann machines are often used as building blocks in greedy learning of deep networks. However, training even a simplified model, known as the restricted Boltzmann machine (RBM), can be extremely laborious: traditional learning algorithms often converge only with the right choice of learning rate scheduling and initial weight scale. They are also sensitive to the specific data representation: an equivalent RBM can be obtained by flipping some bits and changing the weights and biases accordingly, but traditional learning rules are not invariant to such transformations. Without careful tuning of these training settings, traditional algorithms can easily get stuck at plateaus or even diverge. In this work, we present an enhanced gradient which is derived such that it is invariant to bit-flipping transformations. We also propose a way to automatically adjust the learning rate by maximizing a local likelihood estimate. Our experiments confirm that the proposed improvements yield more stable training of RBMs.
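The enhanced gradient itself is not given in the abstract; the sketch below only demonstrates the bit-flipping equivalence it refers to: negating a flipped visible unit's weights and bias and shifting the hidden biases yields an RBM that assigns identical probabilities under the flipped encoding, even though plain gradient rules treat the two parameterizations differently. The RBM here is tiny so the joint distribution can be enumerated exactly.

```python
# Bit-flipping equivalence for an RBM with energy E = -(a.v + b.h + v W h).
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
nv, nh, i = 3, 2, 0                                   # tiny RBM, flip visible unit i
W, a, b = rng.standard_normal((nv, nh)), rng.standard_normal(nv), rng.standard_normal(nh)

def joint(W, a, b, flip=False):
    """Exact joint distribution over all (v, h) configurations."""
    ps = []
    for bits in product([0, 1], repeat=nv + nh):
        v, h = np.array(bits[:nv], float), np.array(bits[nv:], float)
        if flip:
            v[i] = 1 - v[i]                           # read the flipped encoding
        ps.append(np.exp(a @ v + b @ h + v @ W @ h))  # unnormalized probability
    ps = np.array(ps)
    return ps / ps.sum()

# Transformed parameters of the equivalent RBM in the flipped encoding.
W2, a2, b2 = W.copy(), a.copy(), b.copy()
b2 += W[i]; W2[i] *= -1; a2[i] *= -1

print(np.allclose(joint(W, a, b), joint(W2, a2, b2, flip=True)))   # True
```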
A Novel Framework Using Deep Auto-Encoders Based Linear Model for Data Classification
Sensors
This paper proposes a novel data classification framework combining sparse auto-encoders (SAEs) and a post-processing system consisting of a linear system model relying on the Particle Swarm Optimization (PSO) algorithm. All the sensitive and high-level features are extracted by the first auto-encoder, which is wired to the second auto-encoder, followed by a Softmax function layer that classifies the features extracted by the second layer. The two auto-encoders and the Softmax classifier are stacked and trained in a supervised manner using the well-known backpropagation algorithm to enhance the performance of the neural network. Afterwards, the linear model transforms the calculated output of the deep stacked sparse auto-encoder to a value close to the anticipated output. This simple transformation increases the overall data classification performance of the stacked sparse auto-encoder architecture. The PSO algorithm allows the estimation of the parameters of t...
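The post-processing idea can be sketched independently of the network: a linear map applied to the network's output scores, with its parameters found by a minimal particle swarm optimizer. In the sketch below the stacked sparse auto-encoder is replaced by stand-in scores, and the PSO constants and data are illustrative only.

```python
# Minimal PSO fitting a linear post-processing map y = w*s + c to targets.
import numpy as np

rng = np.random.default_rng(0)
s = rng.random(200)                            # stand-in for deep-network output scores
t = 0.8 * s + 0.1 + rng.normal(0, 0.02, 200)   # targets the linear map should reach

def mse(p):                                    # fitness of one particle p = (w, c)
    return ((p[0] * s + p[1] - t) ** 2).mean()

n, dims = 30, 2
pos = rng.uniform(-1, 1, (n, dims))
vel = np.zeros((n, dims))
pbest, pbest_val = pos.copy(), np.array([mse(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(100):                           # standard inertia-weight PSO update
    r1, r2 = rng.random((n, dims)), rng.random((n, dims))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([mse(p) for p in pos])
    better = vals < pbest_val
    pbest[better], pbest_val[better] = pos[better], vals[better]
    gbest = pbest[pbest_val.argmin()].copy()

print("fitted (w, c):", gbest, "mse:", mse(gbest))
```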
A Two-Stage Pretraining Algorithm for Deep Boltzmann Machines
Lecture Notes in Computer Science, 2013
A deep Boltzmann machine (DBM) is a recently introduced Markov random field model that has multiple layers of hidden units. It has been shown empirically that it is difficult to train a DBM with approximate maximum-likelihood learning using the stochastic gradient, unlike its simpler special case, the restricted Boltzmann machine (RBM). In this paper, we propose a novel pretraining algorithm that consists of two stages: obtaining approximate posterior distributions over hidden units from a simpler model, and maximizing the variational lower bound given the fixed hidden posterior distributions. We show empirically that the proposed method overcomes the difficulty in training DBMs from randomly initialized parameters and results in a better, or comparable, generative model when compared to the conventional pretraining algorithm.