OPTIMIZATION OF THE ERROR ENTROPY MINIMIZATION ALGORITHM FOR NEURAL NETWORK CLASSIFICATION


Abstract

The Error Entropy Minimization (EEM) algorithm leverages Renyi's Quadratic Entropy to optimize neural network classification results. This study introduces the EEM-VLR algorithm, which employs a variable learning rate to enhance performance compared to traditional methods, particularly in classification tasks. Although challenges remain in effectively integrating variable smoothing parameters, early results suggest substantial improvements in accuracy.
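
As a rough illustration of the kind of training loop such an approach implies, the sketch below estimates Renyi's quadratic entropy of the errors with a Gaussian Parzen window and adapts the learning rate multiplicatively. Everything in it is an assumption made for illustration (the single-layer tanh model, the fixed smoothing parameter sigma, the finite-difference gradient, and the up/down factors of the rate schedule); it is not the authors' implementation of EEM-VLR.

```python
# Minimal sketch of error-entropy minimization with a variable learning rate.
import numpy as np

def information_potential(e, sigma):
    # V(e) = (1/N^2) * sum_ij G_{sigma*sqrt(2)}(e_i - e_j); H2(e) = -log V(e)
    d = e[:, None] - e[None, :]
    s2 = 2.0 * sigma ** 2                       # variance of the pairwise Gaussian kernel
    return np.exp(-d ** 2 / (2.0 * s2)).mean() / np.sqrt(2.0 * np.pi * s2)

def renyi_quadratic_entropy(e, sigma):
    return -np.log(information_potential(e, sigma))

def train_eem_vlr(X, t, sigma=0.5, lr=0.1, up=1.2, down=0.5, epochs=200):
    """X: inputs, t: targets in {-1, +1}. Returns trained weights of a single-layer tanh model."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])
    prev = np.inf
    for _ in range(epochs):
        h = renyi_quadratic_entropy(t - np.tanh(X @ w), sigma)
        # finite-difference gradient keeps the sketch short; analytic gradients are used in practice
        g = np.zeros_like(w)
        for k in range(w.size):
            wp = w.copy()
            wp[k] += 1e-5
            g[k] = (renyi_quadratic_entropy(t - np.tanh(X @ wp), sigma) - h) / 1e-5
        w -= lr * g
        lr = lr * up if h < prev else lr * down  # variable learning rate: grow on improvement
        prev = h
    return w
```

In practice the gradient of the information potential is back-propagated analytically through the network, and the smoothing parameter sigma may itself be varied during training, which is the "variable smoothing parameter" issue the abstract refers to.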

Entropy learning and relevance criteria for neural network pruning

International journal of neural systems, 2003

In this paper, entropy is a term used in the learning phase of a neural network. As learning progresses, more hidden nodes get into saturation. The early creation of such hidden nodes may impair generalisation. Hence an entropy approach is proposed to dampen the early creation of such nodes by using a new computation called entropy cycle. Entropy learning also helps to increase the importance of relevant nodes while dampening the less important nodes. At the end of learning, the less important nodes can then be pruned to reduce the memory requirements of the neural network.

An error-entropy minimization algorithm for supervised training of nonlinear adaptive systems

IEEE Transactions on Signal Processing, 2002

This paper investigates error-entropy minimization in adaptive systems training. We prove the equivalence between minimization of the error's Renyi entropy of order α and minimization of a Csiszar distance measure between the densities of desired and system outputs. A nonparametric estimator for Renyi's entropy is presented, and it is shown that the global minimum of this estimator is the same as the actual entropy. The performance of the error-entropy-minimization criterion is compared with mean-square-error minimization in the short-term prediction of a chaotic time series and in nonlinear system identification.
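
For reference, the quadratic (order-2) Renyi entropy and its Parzen-window estimator used throughout this line of work are (a standard formulation; $G_\sigma$ denotes a Gaussian kernel of width $\sigma$):

$$ H_2(e) = -\log \int f_e^2(x)\,dx, \qquad \hat H_2(e) = -\log\!\Big(\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G_{\sigma\sqrt{2}}(e_i - e_j)\Big). $$

The argument of the logarithm is the "information potential" $V(e)$, so minimizing the estimated entropy is equivalent to maximizing $V(e)$.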

Conditional entropy minimization in neural network classifiers

We explore the role of entropy manipulation during learning in supervised multiple layer perceptron classifiers. Entropy maximization [1][2] or mutual information maximization [3] is the criterion for unsupervised blind signal separation or feature extraction. In contrast, we show that for a 2-layer MLP classifier, conditional entropy minimization in the internal layer is a necessary condition for error minimization in the mapping from the input to the output. The relationship between entropy and the expected volume and mass of a convex hull constructed from n sample points is examined. We show that minimizing the expected hull volume may have more desirable gradient dynamics when compared to minimizing entropy. We show that entropy by itself has some geometrical invariance with respect to expected hull volumes. We develop closed form expressions for the expected convex hull mass and volumes in R^1 and relate these to error probability. Finally we show that learning in an MLP may be accomplished solely by minimization of the conditional expected hull volumes and the expected volume of the "intensity of collision".

An Information-theoretic Learning Algorithm for Neural Network Classification

1995

A new learning algorithm is developed for the design of statistical classifiers minimizing the rate of misclassification. The method, which is based on ideas from information theory and analogies to statistical physics, assigns data to classes in probability. The distributions are chosen to minimize the expected classification error while simultaneously enforcing the classifier's structure and a level of "randomness" measured by Shannon's entropy. Achievement of the classifier structure is quantified by an associated cost. The constrained optimization problem is equivalent to the minimization of a Helmholtz free energy, and the resulting optimization method is a basic extension of the deterministic annealing algorithm that explicitly enforces structural constraints on assignments while reducing the entropy and expected cost with temperature. In the limit of low temperature, the error rate is minimized directly and a hard classifier with the requisite structure is obtained. This learning algorithm can be used to design a variety of classifier structures. The approach is compared with standard methods for radial basis function design and is demonstrated to substantially outperform other design methods on several benchmark examples, while often retaining design complexity comparable to, or only moderately greater than, that of strict descent-based methods.
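
In the deterministic-annealing form referred to above, the optimization can be summarized in standard, simplified notation (the structural cost term specific to each classifier is omitted here) as minimizing a free energy

$$ F = \langle D \rangle - T\,H, \qquad P(c \mid x) = \frac{e^{-d(x,c)/T}}{\sum_{c'} e^{-d(x,c')/T}}, $$

where $\langle D \rangle$ is the expected cost of the probabilistic class assignments $P(c \mid x)$, $H$ is their Shannon entropy, and the temperature $T$ is lowered toward zero so that the assignments harden and the expected cost is ultimately minimized directly.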

On the Smoothed Minimum Error Entropy Criterion

Entropy, 2012

Recent studies suggest that the minimum error entropy (MEE) criterion can outperform the traditional mean square error criterion in supervised machine learning, especially in nonlinear and non-Gaussian situations. In practice, however, one has to estimate the error entropy from the samples since in general the analytical evaluation of error entropy is not possible. By the Parzen windowing approach, the estimated error entropy converges asymptotically to the entropy of the error plus an independent random variable whose probability density function (PDF) corresponds to the kernel function in the Parzen method. This quantity of entropy is called the smoothed error entropy, and the corresponding optimality criterion is named the smoothed MEE (SMEE) criterion. In this paper, we study theoretically the SMEE criterion in supervised machine learning where the learning machine is assumed to be nonparametric and universal. Some basic properties are presented. In particular, we show that when the smoothing factor is very small, the smoothed error entropy equals approximately the true error entropy plus a scaled version of the Fisher information of error. We also investigate how the smoothing factor affects the optimal solution. In some special situations, the optimal solution under the SMEE criterion does not change with increasing smoothing factor. In general cases, when the smoothing factor tends to infinity, minimizing the smoothed error entropy will be approximately equivalent to minimizing error variance, regardless of the conditional PDF and the kernel.
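
Written out, the smoothed criterion minimizes

$$ H_{\mathrm{SMEE}}(e) = H(e + \nu), \qquad \nu \sim K_h \ \text{independent of } e, $$

where $K_h$ is the Parzen kernel with smoothing factor $h$. For a small Gaussian kernel this reduces, via de Bruijn's identity, to approximately

$$ H(e + hZ) \;\approx\; H(e) + \tfrac{h^2}{2}\,J(e), $$

with $J(e)$ the Fisher information of the error, which is the "scaled Fisher information" correction mentioned in the abstract (the Gaussian-kernel form shown here is an illustrative special case).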

Convergence properties and data efficiency of the minimum error entropy criterion in adaline training

IEEE Transactions on Signal Processing, 2003

Recently, we have proposed the minimum error entropy (MEE) criterion as an information theoretic alternative to the widely used mean square error criterion in supervised adaptive system training. For this purpose, we have formulated a nonparametric estimator for Renyi's entropy that employs Parzen windowing. Mathematical investigation of the proposed entropy estimator revealed interesting insights about the process of information theoretical learning. This new estimator and the associated criteria have been applied to the supervised and unsupervised training of adaptive systems in a wide range of problems successfully. In this paper, we analyze the structure of the MEE performance surface around the optimal solution, and we derive the upper bound for the step size in adaptive linear neuron (ADALINE) training with the steepest descent algorithm using MEE. In addition, the effects of the entropy order and the kernel size in Parzen windowing on the shape of the performance surface and the eigenvalues of the Hessian at and around the optimal solution are investigated. Conclusions from the theoretical analyses are illustrated through numerical examples.
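
In symbols, the training rule analyzed here is steepest descent on the estimated error entropy,

$$ w_{k+1} = w_k - \eta\,\nabla_w \hat H(e_k), $$

and, as in the MSE case, a quadratic approximation of the performance surface around the optimum suggests a stability condition of the familiar form $\eta < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the Hessian of the entropy cost at the solution. The paper's contribution is the explicit bound and the dependence of that Hessian on the entropy order and the Parzen kernel size; the exact expressions are not reproduced here.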

Entropy Based Comparison of Neural Networks for Classification

Perspectives in Neural Computing, 1998


A Method for Estimating the Entropy of Time Series Using Artificial Neural Networks

Entropy

Measuring the predictability and complexity of time series using entropy is an essential tool for designing and controlling a nonlinear system. However, the existing methods have some drawbacks related to the strong dependence of entropy on the parameters of the methods. To overcome these difficulties, this study proposes a new method for estimating the entropy of a time series using the LogNNet neural network model. The LogNNet reservoir matrix is filled with time series elements according to our algorithm. The accuracy of the classification of images from the MNIST-10 database is considered as the entropy measure and denoted by NNetEn. The novelty of entropy calculation is that the time series is involved in mixing the input information in the reservoir. Greater complexity in the time series leads to a higher classification accuracy and higher NNetEn values. We introduce a new time series characteristic called time series learning inertia that determines the learning rate of the neural network...

Generalized entropy cost function in neural networks

Artificial neural networks are capable of constructing complex decision boundaries and over the recent years they have been widely used in many practical applications, ranging from business to medical diagnosis and technical problems. A large number of error functions have been proposed in the literature to achieve a better predictive power. However, only a few works employ Tsallis statistics, which has successfully been applied in other fields. This paper undertakes the effort to examine the q-generalized function based on Tsallis statistics as an alternative error measure in neural networks. The results indicate that the Tsallis entropy error function can be successfully applied in neural networks, yielding satisfactory results.
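
For reference, the Tsallis entropy on which the q-generalized error function builds is

$$ S_q(p) = \frac{1}{q-1}\Big(1 - \sum_i p_i^{\,q}\Big), $$

which recovers the Shannon entropy in the limit $q \to 1$; the specific cost function examined in the paper is a q-deformation of the usual entropy-based error measure and is not reproduced here.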

Error-Entropy Minimization for Dynamical Systems Modeling

Lecture Notes in Computer Science, 2008

Recent publications have presented many successful uses of elements from information theory in adaptive systems training. Error entropy has been proven to outperform mean squared error as a cost function in many artificially generated data sets, but still few applications to real world data have been published. In this paper, we design a neural network trained with the error-entropy minimization criterion and use it for dynamical systems modeling on artificial as well as real world data. Performance of this neural network is compared against the mean squared error driven approach in terms of computational complexity, parameter optimization, and error probability densities.



Neural network classification using error entropy minimization

Biological and Artificial Intelligence Environments, 2005

One way of using entropy criteria in learning systems is to minimize the entropy of the error between two variables: typically, one is the output of the learning system and the other is the target. This framework has been used for regression. In this paper we show how to use the minimization of the entropy of the error for classification. The minimization of the entropy of the error implies a constant value for the errors. This, in general, does not imply that the value of the errors is zero. In regression, this problem is solved by shifting the final result so that its average equals the average value of the desired target. We prove that, under mild conditions, this algorithm, when used in a classification problem, makes the error converge to zero and can thus be used for classification.
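
A minimal sketch of the regression-side shift described above is given below (the classification result of the paper is precisely that no such shift is needed there, since the error itself converges to zero):

```python
import numpy as np

def shift_outputs(y_pred: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Translate the outputs so that their mean matches the mean of the targets.
    Entropy minimization fixes the errors at a constant value, not necessarily zero,
    and this shift removes that constant offset after training."""
    return y_pred + (t.mean() - y_pred.mean())
```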

New entropy learning method for neural network

IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028), 1999

In this paper, an entropy penalty term is used to steer the direction of the hidden node's activation in the process of learning. A state with minimum entropy means that nodes are operating near the extreme values of the Sigmoid curve. As the training proceeds, redundant hidden nodes' activations are pushed towards their extreme value, while relevant nodes remain active in the linear region of the Sigmoid curve. The early creation of redundant nodes may impair generalisation. To prevent the network from being driven into saturation before it can really learn, an entropy cycle is proposed to dampen the early creation of such redundant nodes.

Neural Network Learning Using Entropy Cycle

Knowledge and Information Systems, 2000

In this paper, an additional entropy penalty term is used to steer the direction of the hidden node's activation in the process of learning. A state with minimum entropy means that most nodes are operating in the non-linear zones (i.e. saturation zones) near the extreme ends of the Sigmoid curve. As the training proceeds, redundant hidden nodes' activations are pushed towards their extreme value corresponding to a low entropy state with maximum information, while some relevant nodes remain active in the linear zone. As training progresses, more nodes get into saturation zones. The early creation of such nodes may impair generalisation performance. To prevent the network from being driven into saturation before it can really learn, an entropy cycle is proposed in this paper to dampen the creation of such inactive nodes in the early stage of training. At the end of training, these inactive nodes can then be eliminated without affecting the performance of the original network. The concept has been successfully applied for pruning in two classification problems. The experiments indicate that redundant nodes are pruned resulting in optimal network topologies.
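
A rough sketch of the kind of activation-entropy penalty these entropy-cycle papers describe is given below. The binary-entropy reading of sigmoid activations, the fixed weight beta, and the simple warm-up switch standing in for the entropy cycle are all assumptions made for illustration, not the authors' exact formulation.

```python
import numpy as np

def activation_entropy(h: np.ndarray) -> float:
    """Mean binary entropy of hidden activations h in (0, 1); low values mean saturation."""
    h = np.clip(h, 1e-7, 1.0 - 1e-7)
    return float(np.mean(-h * np.log(h) - (1.0 - h) * np.log(1.0 - h)))

def penalized_loss(base_loss: float, hidden: np.ndarray, epoch: int,
                   warmup: int = 20, beta: float = 0.01) -> float:
    # During the warm-up epochs (the stand-in for the entropy cycle) no entropy pressure is
    # applied, so nodes are not driven into saturation before the network has learned.
    # Afterwards, penalizing the activation entropy pushes redundant nodes toward the
    # extremes of the sigmoid, where they can later be pruned, while relevant nodes stay active.
    if epoch < warmup:
        return base_loss
    return base_loss + beta * activation_entropy(hidden)
```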

Entropy Learning in Neural Network

ASEAN Journal on Science and Technology for Development

In this paper, an entropy term is used in the learning phase of a neural network. As learning progresses, more hidden nodes get into saturation. The early creation of such hidden nodes may impair generalisation. Hence an entropy approach is proposed to dampen the early creation of such nodes. Entropy learning also helps to increase the importance of relevant nodes while dampening the less important nodes. At the end of learning, the less important nodes can then be eliminated to reduce the memory requirements of the neural network.

Entropy minimization algorithm for multilayer perceptrons

We have previously proposed the use of quadratic Renyi's error entropy with a Parzen density estimator with Gaussian kernels as an alternative optimality criterion for supervised neural network training, and showed that it produces better performance on the test data compared to the MSE. The error entropy criterion imposes the minimization of average information content in the error signal rather than simply minimizing the energy as MSE does. Recently, we developed a nonparametric entropy estimator for Renyi's definition that makes possible the use of any entropy order and any suitable kernel function in Parzen density estimation. The new estimator reduces to the previously used estimator for the special choice of Gaussian kernels and quadratic entropy. In this paper, we briefly present the new criterion and how to apply it to MLP training. We also address the issue of global optimization by the control of the kernel size in the Parzen window estimation.
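
The generalized estimator referred to above takes the following standard form, reproduced here for orientation ($\kappa_\sigma$ is any admissible Parzen kernel of size $\sigma$):

$$ \hat H_\alpha(e) = \frac{1}{1-\alpha}\,\log\!\left[\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{N}\sum_{j=1}^{N}\kappa_\sigma(e_i - e_j)\right)^{\alpha-1}\right], $$

which reduces to the quadratic, Gaussian-kernel estimator used earlier when $\alpha = 2$ and $\kappa_\sigma$ is Gaussian.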

An improved minimum error entropy criterion with self adjusting step-size

2005

In this paper, we propose Minimum Error Entropy with self-adjusting step-size (MEE-SAS) as an alternative to the Minimum Error Entropy (MEE) algorithm for training adaptive systems. MEE-SAS has a faster speed of convergence than MEE for the same misadjustment. We attribute this characteristic to the automatic learning rate inherent in MEE-SAS, where the changing step size helps the algorithm take large "jumps" when far away from the optimal solution and small "jumps" when near the solution. We test the performance of both algorithms on two classic problems: system identification and prediction. However, we show that MEE performs better than MEE-SAS in situations where tracking of the optimal solution is required, as in the case of non-stationary signals.
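
As commonly described in the MEE-SAS literature, the self-adjusting behaviour can be summarized as follows ($V$ is the information potential and $V(0)$ its value at zero error; this is a sketch of the mechanism, not a derivation):

$$ J_{\text{MEE-SAS}}(w) = \big[V(0) - V(e)\big]^2, \qquad \nabla_w J = -2\,\big[V(0) - V(e)\big]\,\nabla_w V(e), $$

so a fixed step size $\eta$ acts like an effective step $2\eta\,[V(0) - V(e)]$ that is large far from the optimum and shrinks automatically as $V(e)$ approaches its maximum.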

Learning Theory Approach to Minimum Error Entropy Criterion

2012

We consider the minimum error entropy (MEE) criterion and an empirical risk minimization learning algorithm when an approximation of Rényi's entropy (of order 2) by Parzen windowing is minimized. This learning algorithm involves a Parzen windowing scaling parameter. We present a learning theory approach for this MEE algorithm in a regression setting when the scaling parameter is large. Consistency and explicit convergence rates are provided in terms of the approximation ability and capacity of the involved hypothesis space. Novel analysis is carried out for the generalization error associated with Rényi's entropy and a Parzen windowing function, to overcome technical difficulties arising from the essential differences between the classical least squares problems and the MEE setting. An involved symmetrized least squares error is introduced and analyzed, which is related to some ranking algorithms.

Adaptive entropy-based learning with dynamic artificial neural network

Neurocomputing, 2019

Entropy models the added information associated to data uncertainty, proving that stochasticity is not purely random. This paper explores the potential improvement of machine learning methodologies through the incorporation of entropy analysis in the learning process. A multi-layer perceptron is applied to identify patterns in previous forecasting errors achieved by a machine learning methodology. The proposed learning approach is adaptive to the training data through a retraining process that includes only the most recent and relevant data, thus excluding misleading information from the training process. The learnt error patterns are then combined with the original forecasting results in order to improve forecasting accuracy, using the Rényi entropy to determine the amount in which the original forecasted value should be adapted considering the learnt error patterns. The proposed approach is combined with eleven different machine learning methodologies, and applied to the forecasting of electricity market prices using real data from the Iberian electricity market operator-OMIE. Results show that through the identification of patterns in the forecasting error, the proposed methodology is able to improve the learning algorithms' forecasting accuracy and reduce the variability of their forecasting errors.

Pseudo-Entropy Based Pruning Algorithm for Feed forward Neural Networks

2013

Design of artificial neural networks involves an important and practical task: how to choose the adequate size of the neural architecture for a given application. One popular method to overcome this problem is to start with an oversized structure and then prune it to obtain a simpler network with good generalization performance. This paper presents a pruning algorithm based on the pseudo-entropy of hidden neurons. The pruning is performed by iteratively training the network to a certain performance criterion and then removing the hidden neuron whose individual pseudo-entropy is greater than a preselected threshold, slightly higher than the average value of the network's pseudo-entropy, until no neuron can further be removed. This approach is validated on an academic example and tested on an induction motor modeling problem. Compared with two modified versions of the Optimal Brain Surgeon (OBS) algorithm, the developed method gives interesting results with easier computation.
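
The pruning loop itself can be sketched as below. The per-neuron pseudo-entropy measure, the retraining routine, and the neuron-removal step are passed in as hooks because their definitions are specific to the paper and are not reproduced here; the margin of 1.05 standing in for "slightly higher than the average" is likewise an assumption.

```python
from typing import Callable, List

def prune_by_pseudo_entropy(pseudo_entropy: Callable[[int], float],
                            retrain: Callable[[], None],
                            remove: Callable[[int], None],
                            n_hidden: int,
                            margin: float = 1.05) -> int:
    """Iteratively retrain, then remove the hidden neuron whose pseudo-entropy exceeds a
    threshold slightly above the network average; stop when no neuron qualifies.
    Returns the number of hidden neurons that remain."""
    while n_hidden > 1:
        retrain()                                            # train to the performance criterion
        h: List[float] = [pseudo_entropy(i) for i in range(n_hidden)]
        threshold = margin * sum(h) / len(h)                 # slightly above the average
        worst = max(range(n_hidden), key=lambda i: h[i])
        if h[worst] <= threshold:
            break                                            # nothing left to remove
        remove(worst)
        n_hidden -= 1
    return n_hidden
```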

A Fixed-Point Minimum Error Entropy Algorithm

2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, 2006

In this paper, we propose the Fixed-Point Minimum Error Entropy (Fixed-Point MEE) algorithm as an alternative to the Minimum Error Entropy (MEE) algorithm for training adaptive systems. Fixed-point algorithms differ from gradient methods like MEE and are proven to be faster, more stable, and step-size free. This characteristic is due to a second-order update, similar to Recursive Least-Squares (RLS), that tracks the Wiener solution with every update. We study the effect of the design parameters, namely the forgetting factor, the window length, and the kernel size, on the convergence properties of the newly introduced recursive Fixed-Point MEE. We also test the performance of both algorithms on two classic system identification problems. Finally, we conclude that Fixed-Point MEE performs better than MEE.