New entropy learning method for neural network

Neural Network Learning Using Entropy Cycle

Knowledge and Information Systems, 2000

In this paper, an additional entropy penalty term is used to steer the direction of the hidden nodes' activations in the process of learning. A state with minimum entropy means that most nodes are operating in the non-linear zones (i.e. saturation zones) near the extreme ends of the sigmoid curve. As the training proceeds, redundant hidden nodes' activations are pushed towards their extreme values, corresponding to a low entropy state with maximum information, while some relevant nodes remain active in the linear zone. As training progresses, more nodes get into saturation zones. The early creation of such nodes may impair generalisation performance. To prevent the network from being driven into saturation before it can really learn, an entropy cycle is proposed in this paper to dampen the creation of such inactive nodes in the early stage of training. At the end of training, these inactive nodes can then be eliminated without affecting the performance of the original network. The concept has been successfully applied for pruning in two classification problems. The experiments indicate that redundant nodes are pruned, resulting in optimal network topologies.
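
As a rough illustration of the kind of penalty the abstract describes, the sketch below adds a binary-entropy-style term on sigmoid hidden activations to an ordinary squared-error loss, so gradient descent pushes hidden units toward the saturation ends of the sigmoid. The toy 2-4-1 network, the fixed penalty weight `lam`, and the learning rate are assumptions made for illustration; the paper itself varies the penalty over an "entropy cycle" rather than keeping it constant.

```python
# Minimal sketch (not the paper's exact algorithm): squared-error loss plus
# an entropy penalty on sigmoid hidden activations, trained by plain
# gradient descent on a toy 2-4-1 network.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def activation_entropy(h, eps=1e-12):
    # Binary-entropy-style measure: maximal for h near 0.5 (linear zone),
    # near zero for h near 0 or 1 (saturation zones).
    return -np.mean(h * np.log(h + eps) + (1.0 - h) * np.log(1.0 - h + eps))

# Toy data and a 2-4-1 network with a sigmoid hidden layer (assumed setup).
X = rng.normal(size=(64, 2))
y = np.sin(X[:, :1]) * X[:, 1:2]
W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)

lr, lam, eps = 0.05, 0.01, 1e-12   # fixed penalty weight; the paper cycles it
for step in range(500):
    h = sigmoid(X @ W1 + b1)                     # hidden activations in (0, 1)
    out = h @ W2 + b2
    err = out - y
    loss = np.mean(err**2) + lam * activation_entropy(h)

    # Backpropagate both the error term and the entropy penalty.
    d_out = 2.0 * err / len(X)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    dh = d_out @ W2.T + lam * np.log((1.0 - h + eps) / (h + eps)) / h.size
    dz1 = dh * h * (1.0 - h)
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final loss:", float(loss), "mean |h - 0.5|:", float(np.abs(h - 0.5).mean()))
```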

Entropy Learning in Neural Network

ASEAN Journal on Science and Technology for Development

In this paper, an entropy term is used in the learning phase of a neural network. As learning progresses, more hidden nodes get into saturation. The early creation of such hidden nodes may impair generalisation. Hence an entropy approach is proposed to dampen the early creation of such nodes. The entropy learning also helps to increase the importance of relevant nodes while dampening the less important nodes. At the end of learning, the less important nodes can then be eliminated to reduce the memory requirements of the neural network.

Neural network classification using error entropy minimization

Biological and Artificial Intelligence Environments, 2005

One way of using the entropy criterion in learning systems is to minimize the entropy of the error between two variables: typically, one is the output of the learning system and the other is the target. This framework has been used for regression. In this paper we show how to use the minimization of the entropy of the error for classification. The minimization of the entropy of the error implies a constant value for the errors. This, in general, does not imply that the value of the errors is zero. In regression, this problem is solved by shifting the final result so that its average equals the average value of the desired target. We prove that, under mild conditions, this algorithm, when used in a classification problem, makes the error converge to zero and can thus be used in classification.
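
The two ingredients mentioned here, the entropy of the error and the mean shift used in regression, can be made concrete with a small sketch. The quadratic Rényi entropy estimate with a Gaussian Parzen window, the kernel width `sigma`, and the toy numbers are assumptions chosen for illustration, not the paper's exact setup.

```python
# Hedged sketch: quadratic (order-2) Renyi entropy of the error with a
# Gaussian Parzen window, plus the regression-style shift that makes a
# constant error zero-mean.
import numpy as np

def quadratic_error_entropy(errors, sigma=0.5):
    # H2(e) = -log( (1/N^2) * sum_ij G(e_i - e_j; sqrt(2)*sigma) )
    e = np.asarray(errors, dtype=float).ravel()
    diff = e[:, None] - e[None, :]
    s2 = 2.0 * sigma**2
    kernel = np.exp(-diff**2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return -np.log(kernel.mean())

# Toy example: errors that are nearly constant but not zero.
targets = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
outputs = np.array([0.28, 1.31, 1.29, 0.30, 1.32])
errors = outputs - targets

print("entropy of raw errors:", quadratic_error_entropy(errors))

# Regression-style fix from the abstract: shift the outputs so their mean
# matches the mean of the targets, making the (constant) error zero-mean.
shifted = outputs - errors.mean()
print("shifted errors:", shifted - targets)
```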

Entropy learning and relevance criteria for neural network pruning

International Journal of Neural Systems, 2003

In this paper, an entropy term is used in the learning phase of a neural network. As learning progresses, more hidden nodes get into saturation. The early creation of such hidden nodes may impair generalisation. Hence an entropy approach is proposed to dampen the early creation of such nodes by using a new computation called the entropy cycle. Entropy learning also helps to increase the importance of relevant nodes while dampening the less important nodes. At the end of learning, the less important nodes can then be pruned to reduce the memory requirements of the neural network.
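
A hedged sketch of the final pruning step described here: hidden nodes whose activation is essentially constant over the training set carry little information and can be removed, with their constant contribution folded into the next layer's bias. The threshold `tol` and the helper `prune_saturated_nodes` are illustrative assumptions, not the paper's relevance criterion.

```python
# Illustrative sketch: prune hidden nodes that ended up saturated (nearly
# constant activation) after entropy learning, compensating via the bias.
import numpy as np

def prune_saturated_nodes(H, W2, b2, tol=0.05):
    """H: (N, hidden) hidden activations on the training set.
    W2, b2: weights/bias of the layer fed by the hidden layer.
    Returns the indices kept and the adjusted (W2, b2)."""
    mean_act = H.mean(axis=0)
    saturated = H.std(axis=0) < tol             # activation ~ constant
    keep = ~saturated
    # Fold the constant contribution of removed nodes into the bias.
    b2_new = b2 + mean_act[saturated] @ W2[saturated]
    return np.where(keep)[0], W2[keep], b2_new

# Toy usage: node 1 is stuck near 1.0, nodes 0 and 2 stay active.
H = np.array([[0.2, 0.99, 0.7],
              [0.8, 0.98, 0.3],
              [0.4, 0.99, 0.6]])
W2 = np.array([[1.0], [2.0], [-1.5]])
b2 = np.array([0.1])
kept, W2_p, b2_p = prune_saturated_nodes(H, W2, b2)
print("kept nodes:", kept, "new bias:", b2_p)
```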

Conditional entropy minimization in neural network classifiers

We explore the role of entropy manipulation during learning in supervised multilayer perceptron classifiers. Entropy maximization [1][2] or mutual information maximization [3] is the criterion for unsupervised blind signal separation or feature extraction. In contrast, we show that for a 2-layer MLP classifier, conditional entropy minimization in the internal layer is a necessary condition for error minimization in the mapping from the input to the output. The relationship between entropy and the expected volume and mass of a convex hull constructed from n sample points is examined. We show that minimizing the expected hull volume may have more desirable gradient dynamics when compared to minimizing entropy. We show that entropy by itself has some geometrical invariance with respect to expected hull volumes. We develop closed-form expressions for the expected convex hull mass and volumes in R^1 and relate these to error probability. Finally, we show that learning in an MLP may be accomplished solely by minimization of the conditional expected hull volumes and the expected volume of the "intensity of collision".
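
To make the minimized quantity concrete, the sketch below estimates H(hidden | class) for a toy batch by discretizing the hidden activations and averaging the per-class Shannon entropies. The paper itself works with differential entropy and convex-hull volume arguments; the binning scheme and the function `conditional_entropy` here are illustrative assumptions only.

```python
# Hedged sketch: a crude discretized estimate of the conditional entropy of
# an internal-layer representation given the class label.
import numpy as np

def conditional_entropy(hidden, labels, bins=4):
    """Discretize each hidden unit into `bins` levels and estimate
    H(hidden | class) = sum_c p(c) * H(hidden | class = c)."""
    hidden = np.asarray(hidden, dtype=float)
    codes = np.floor(np.clip(hidden, 0, 1 - 1e-9) * bins).astype(int)
    h_cond = 0.0
    for c in np.unique(labels):
        rows = codes[labels == c]
        # Empirical joint distribution over the discretized hidden vector.
        _, counts = np.unique(rows, axis=0, return_counts=True)
        p = counts / counts.sum()
        h_cond += (rows.shape[0] / len(labels)) * -(p * np.log2(p)).sum()
    return h_cond

# Toy batch: hidden codes that separate the two classes well give low H(h|c).
hidden = np.array([[0.1, 0.9], [0.15, 0.85], [0.9, 0.1], [0.88, 0.12]])
labels = np.array([0, 0, 1, 1])
print("H(hidden | class) ~", conditional_entropy(hidden, labels), "bits")
```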

Adaptive entropy-based learning with dynamic artificial neural network

Neurocomputing, 2019

Entropy models the added information associated with data uncertainty, proving that stochasticity is not purely random. This paper explores the potential improvement of machine learning methodologies through the incorporation of entropy analysis in the learning process. A multi-layer perceptron is applied to identify patterns in previous forecasting errors achieved by a machine learning methodology. The proposed learning approach adapts to the training data through a retraining process that includes only the most recent and relevant data, thus excluding misleading information from the training process. The learnt error patterns are then combined with the original forecasting results in order to improve forecasting accuracy, using the Rényi entropy to determine the amount by which the original forecasted value should be adapted considering the learnt error patterns. The proposed approach is combined with eleven different machine learning methodologies and applied to the forecasting of electricity market prices using real data from the Iberian electricity market operator, OMIE. Results show that, through the identification of patterns in the forecasting error, the proposed methodology is able to improve the learning algorithms' forecasting accuracy and reduce the variability of their forecasting errors.
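
The combination step, in which the Rényi entropy of recent errors decides how strongly the learnt error pattern adjusts the original forecast, might look roughly like the sketch below. The histogram-based entropy estimate, the entropy order alpha = 2, and the linear weighting rule are assumptions for illustration, not the formulation used in the paper.

```python
# Hedged sketch: adjust a base forecast by a learnt error estimate, scaled
# by how regular (low-entropy) the recent forecasting errors have been.
import numpy as np

def renyi_entropy(samples, alpha=2.0, bins=10):
    # Histogram-based Renyi entropy estimate of a sample of errors.
    p, _ = np.histogram(samples, bins=bins)
    p = p[p > 0] / p.sum()
    return np.log((p**alpha).sum()) / (1.0 - alpha)

def adapted_forecast(base_forecast, predicted_error, recent_errors):
    """When recent errors are highly uncertain (high entropy), trust the
    learnt correction less; when they are regular, trust it more."""
    h = renyi_entropy(recent_errors)
    h_max = np.log(len(recent_errors))          # crude upper bound
    weight = 1.0 - min(h / h_max, 1.0)
    return base_forecast - weight * predicted_error

recent_errors = np.array([1.8, 2.1, 1.9, 2.0, 2.2, 1.7, 2.0, 1.9])
print(adapted_forecast(base_forecast=52.0, predicted_error=2.0,
                       recent_errors=recent_errors))
```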

Entropy minimization algorithm for multilayer perceptrons

We have previously proposed the use of Rényi's quadratic error entropy with a Parzen density estimator with Gaussian kernels as an alternative optimality criterion for supervised neural network training, and showed that it produces better performance on the test data compared to the MSE. The error entropy criterion imposes the minimization of the average information content in the error signal rather than simply minimizing its energy as the MSE does. Recently, we developed a nonparametric entropy estimator for Rényi's definition that makes possible the use of any entropy order and any suitable kernel function in Parzen density estimation. The new estimator reduces to the previously used estimator for the special choice of Gaussian kernels and quadratic entropy. In this paper, we briefly present the new criterion and how to apply it to MLP training. We also address the issue of global optimization by controlling the kernel size in the Parzen window estimation.
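
The estimator described here has a compact form: the Parzen density estimate built from the error samples is plugged into Rényi's definition and evaluated at the samples themselves, which allows any entropy order and any kernel. A minimal sketch follows; the Gaussian kernel, the width `sigma`, and the toy error values are assumptions.

```python
# Hedged sketch: nonparametric Renyi error-entropy estimate of arbitrary
# order alpha, using a Parzen density estimate with a chosen kernel.
import numpy as np

def gaussian_kernel(u, sigma):
    return np.exp(-u**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def renyi_error_entropy(errors, alpha=2.0, sigma=0.3, kernel=gaussian_kernel):
    # H_alpha(e) ~ 1/(1-alpha) * log( (1/N) sum_j [ p_hat(e_j) ]^(alpha-1) ),
    # where p_hat is the Parzen density estimate built from the errors.
    e = np.asarray(errors, dtype=float).ravel()
    p_hat = kernel(e[:, None] - e[None, :], sigma).mean(axis=1)
    return np.log((p_hat**(alpha - 1.0)).mean()) / (1.0 - alpha)

errors = np.array([0.05, -0.02, 0.01, 0.00, 0.03, -0.04])
for a in (1.5, 2.0, 3.0):
    print(f"alpha={a}: H ~ {renyi_error_entropy(errors, alpha=a):.3f}")
```

With a Gaussian kernel and alpha = 2 this recovers the quadratic error-entropy form used in the earlier criterion, matching the reduction stated in the abstract.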

Self-growing neural network architecture using crisp and fuzzy entropy

Science of Artificial Neural Networks, 1992

The paper briefly describes the self-growing neural network algorithm, CID3, which makes decision trees equivalent to hidden layers of a neural network. The algorithm generates a feedforward architecture using crisp and fuzzy entropy measures. The results of a real-life recognition problem of distinguishing defects in a glass ribbon and of a benchmark problem of differentiating two spirals are shown and discussed.
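
As a rough illustration of the two entropy measures named in the abstract, the sketch below computes a crisp (Shannon) entropy of a class split and a De Luca-Termini style fuzzy entropy of graded node outputs. CID3's actual growth and conversion rules are more involved; the functions and toy values here are assumptions for illustration.

```python
# Hedged sketch: crisp entropy of a class distribution and fuzzy entropy of
# graded (sigmoidal) memberships produced by a candidate hidden node.
import numpy as np

def crisp_entropy(labels):
    # Shannon entropy of the class distribution on one side of a split.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def fuzzy_entropy(memberships, eps=1e-12):
    # De Luca-Termini fuzzy entropy of graded memberships in [0, 1].
    m = np.clip(np.asarray(memberships, dtype=float), eps, 1 - eps)
    return -np.mean(m * np.log2(m) + (1 - m) * np.log2(1 - m))

labels = np.array([0, 0, 1, 1, 1, 0])
memberships = np.array([0.1, 0.2, 0.9, 0.8, 0.95, 0.15])  # candidate node output
print("crisp entropy of classes :", crisp_entropy(labels))
print("fuzzy entropy of node out:", fuzzy_entropy(memberships))
```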