Storage capacity and learning algorithms for two-layer neural networks
Related papers
Generalization and capacity of extensively large two-layered perceptrons
Physical Review E, 2002
The generalization ability and storage capacity of a treelike two-layered neural network with a number of hidden units scaling as the input dimension is examined. The mapping from the input to the hidden layer is via Boolean functions; the mapping from the hidden layer to the output is done by a perceptron. The analysis is within the replica framework where an order parameter characterizing the overlap between two networks in the combined space of Boolean functions and hidden-to-output couplings is introduced. The maximal capacity of such networks is found to scale linearly with the logarithm of the number of Boolean functions per hidden unit. The generalization process exhibits a first-order phase transition from poor to perfect learning for the case of discrete hidden-to-output couplings. The critical number of examples per input dimension, α_c, at which the transition occurs, again scales linearly with the logarithm of the number of Boolean functions. In the case of continuous hidden-to-output couplings, the generalization error decreases according to the same power law as for the perceptron, with the prefactor being different.
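In symbols (the notation here is assumed for illustration, not taken verbatim from the paper: K_B is the number of Boolean functions available to each hidden unit, α = P/N the number of examples per input dimension, ε_g the generalization error), the reported scalings read

    \alpha_{\max} \sim \ln K_B, \qquad \alpha_c \sim \ln K_B \ \ (\text{discrete couplings}), \qquad \epsilon_g(\alpha) \simeq \frac{c}{\alpha} \ \ (\text{continuous couplings},\ \alpha \to \infty),

with the constant c differing from the simple-perceptron prefactor.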
On the capabilities of multilayer perceptrons
Journal of Complexity, 1988
What is the smallest multilayer perceptron able to compute arbitrary and random functions? Previous results show that a net with one hidden layer containing N-1 threshold units is capable of implementing an arbitrary dichotomy of N points. A construction is presented here for implementing an arbitrary dichotomy with one hidden layer containing ⌈N/d⌉ units, for any set of N points in general position in d dimensions. This is in fact the smallest such net, as dichotomies which cannot be implemented by any net with fewer units are described. Several constructions are presented of one-hidden-layer nets implementing arbitrary functions into the e-dimensional hypercube. One of these has only ⌈N/d⌉⌈e/⌊log2(N/d)⌋⌉ units in its hidden layer. Arguments based on a function counting theorem of Cover establish that any net implementing arbitrary functions must have at least Ne/log2(N) weights, so that no net with one hidden layer containing less than Ne/(d log2(N)) units will suffice. Simple counts also show that if the weights are only allowed to assume one of n_s possible values, no net with fewer than Ne/log2(n_s) weights will suffice. Thus the gain coming from using real-valued synapses appears to be only logarithmic. The circuit implementing functions into the e-dimensional hypercube realizes such logarithmic gains. Since the counting arguments bound only the number of weights from below, the possibility is suggested that, if suitable restrictions are imposed on the input vector set to avoid topological obstructions, two-hidden-layer nets with O(N) weights but only O(√N) threshold units might suffice for arbitrary dichotomies. Interesting and potentially sufficient restrictions include (a) if the vectors are binary, i.e., lie on the d-dimensional hypercube, or (b) if they are randomly and uniformly selected from a bounded region.
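These counts are easy to evaluate numerically; the sketch below follows the abstract's symbols N (patterns), d (input dimension), e (output bits), and n_s (allowed weight values), with the concrete numbers chosen only as examples:

    import math

    N, d, e = 10_000, 100, 1        # patterns, input dimension, output bits (example values)
    n_s = 16                        # number of allowed discrete weight values (example value)

    hidden_units_construction = math.ceil(N / d)            # units used by the one-hidden-layer construction
    weight_lower_bound = N * e / math.log2(N)               # counting bound on the number of weights
    unit_lower_bound = N * e / (d * math.log2(N))           # implied bound on one-hidden-layer units
    discrete_weight_bound = N * e / math.log2(n_s)          # bound when weights take one of n_s values

    print(hidden_units_construction, weight_lower_bound, unit_lower_bound, discrete_weight_bound)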
Constructive proof of efficient pattern storage in the multi-layer perceptron
1993
In this paper, we show that the pattern storage capability of the Gabor polynomial is much higher than the commonly used lower bound on multi-layer perceptron (MLP) pattern storage. We also show that multi-layer perceptron networks having second and third degree polynomial activations can be constructed which efficiently implement Gabor polynomials and therefore have the same high pattern storage capability. The polynomial networks can be mapped to conventional sigmoidal MLPs having the same efficiency. It is shown that training techniques like output weight optimization and conjugate gradient attain only the lower bound of pattern storage, and are therefore not the final solution to the MLP training problem.
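For intuition about why a polynomial stores many patterns, one can count its free coefficients; a network that implements the polynomial exactly has at least that many adjustable parameters. A minimal sketch (the degree and input dimension are example values, not taken from the paper):

    import math

    def num_coefficients(n_inputs: int, degree: int) -> int:
        """Number of monomials of total degree <= degree in n_inputs variables."""
        return math.comb(n_inputs + degree, degree)

    # A third-degree polynomial in 20 inputs already has 1771 coefficients,
    # far more than the 20*h + h weights of a small one-hidden-layer perceptron.
    print(num_coefficients(20, 3))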
On the capacity of neural networks
University of Trieste - arXiv, 2022
The aim of this thesis is to compare the capacity of different models of neural networks. We start by analysing the problem-solving capacity of a single perceptron using a simple combinatorial argument. After some observations on the storage capacity of a basic network, known as an associative memory, we introduce a powerful statistical-mechanical approach to calculate its capacity in the training-rule-dependent Hopfield model. With the aim of finding a more general definition that can be applied even to quantum neural nets, we then follow Gardner's work, which removes the dependency on the training rule, and comment on the results obtained by Lewenstein et al., who applied Gardner's methods to a recently proposed quantum perceptron model.
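The combinatorial argument for the single perceptron is Cover's function-counting theorem: of the 2^P dichotomies of P points in general position in N dimensions, a perceptron can realize C(P, N) = 2 Σ_{k=0}^{N-1} C(P-1, k), a fraction that collapses around P = 2N. A minimal sketch:

    from math import comb

    def cover_count(P: int, N: int) -> int:
        """Number of linearly separable dichotomies of P points in general position in R^N."""
        return 2 * sum(comb(P - 1, k) for k in range(N))

    N = 50
    for alpha in (1.0, 1.5, 2.0, 2.5, 3.0):
        P = int(alpha * N)
        frac = cover_count(P, N) / 2 ** P      # fraction of all 2^P dichotomies that are realizable
        print(f"alpha = P/N = {alpha}: realizable fraction ~ {frac:.3f}")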
Systems and Computers in Japan, 1996
This paper investigates the influence of noise added to the hidden units of multilayer perceptrons. It is shown that a skeletal structure of the network emerges when independent Gaussian noise is added to the inputs of hidden units during error backpropagation learning. By analyzing the average behavior of backpropagation learning under such noise, it is shown that the weights from hidden units to output units tend to be small and the outputs of hidden units tend to be 0 or 1. This means that the network is automatically structurized by adding the noise, and as a result the generalization ability of the network is expected to improve. This network structurization was confirmed by experiments on pattern classification and Boolean logic function learning.
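A minimal numpy sketch of the training scheme described here, with Gaussian noise injected into the hidden-unit inputs during backpropagation (the task, network size, learning rate, and noise level are illustrative assumptions, not the paper's settings):

    import numpy as np

    rng = np.random.default_rng(0)

    # toy task: XOR
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    n_in, n_hid, n_out = 2, 8, 1
    W1 = rng.normal(0, 0.5, (n_in, n_hid)); b1 = np.zeros(n_hid)
    W2 = rng.normal(0, 0.5, (n_hid, n_out)); b2 = np.zeros(n_out)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr, noise_std = 0.5, 0.3          # assumed values
    for epoch in range(5000):
        # forward pass: independent Gaussian noise on the inputs of the hidden units
        z1 = X @ W1 + b1 + rng.normal(0, noise_std, (len(X), n_hid))
        h = sigmoid(z1)
        y = sigmoid(h @ W2 + b2)
        # backpropagation of the squared error
        dy = (y - T) * y * (1 - y)
        dh = (dy @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ dy; b2 -= lr * dy.sum(0)
        W1 -= lr * X.T @ dh; b1 -= lr * dh.sum(0)

    # after training, inspect the noise-free hidden activations and the learned mapping
    h_clean = sigmoid(X @ W1 + b1)
    print(np.round(h_clean, 2))
    print(np.round(sigmoid(h_clean @ W2 + b2), 2))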
Upper bound on pattern storage in feedforward networks
Neurocomputing, 2008
Starting from the strict interpolation equations for multivariate polynomials, an upper bound is developed for the number of patterns that can be memorized by a nonlinear feedforward network. A straightforward proof by contradiction is presented for the upper bound. It is shown that the hidden activations do not have to be analytic. Networks, trained by conjugate gradient, are used to demonstrate the tightness of the bound for random patterns. Based upon the upper bound, small multilayer perceptron models are successfully demonstrated for large support vector machines.
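The flavour of such a bound can be reproduced with plain linear algebra: exact memorization is limited by the rank of the matrix of hidden-unit outputs. The sketch below freezes a random hidden layer and fits only the output weights (sizes are example values; the paper's bound additionally accounts for the trainable input-to-hidden weights):

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden = 10, 30
    W_hidden = rng.normal(size=(n_in, n_hidden))      # fixed random hidden layer

    def fit_error(n_patterns: int) -> float:
        """Best training error when only the hidden-to-output weights are fitted."""
        X = rng.normal(size=(n_patterns, n_in))
        t = rng.normal(size=n_patterns)               # arbitrary (random) targets
        H = np.tanh(X @ W_hidden)                     # hidden-unit outputs, shape (n_patterns, n_hidden)
        w_out, *_ = np.linalg.lstsq(H, t, rcond=None) # least-squares output weights
        return float(np.max(np.abs(H @ w_out - t)))   # worst residual over the patterns

    for P in (20, 30, 40):
        # exact fit (residual ~ 1e-12) while P <= n_hidden, a nonzero residual beyond it
        print(P, fit_error(P))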
Computational capabilities of multilayer committee machines
Journal of Physics A: Mathematical and Theoretical, 2010
We obtained an analytical expression for the computational complexity of many-layered committee machines with a finite number of hidden layers (L < ∞), using the generalization complexity measure introduced by Franco et al (2006) IEEE Trans. Neural Netw. 17 578. Although our result is valid in the large-size limit and for an overlap synaptic matrix that is ultrametric, it provides a useful tool for inferring the appropriate architecture a network must have to reproduce an arbitrary realizable Boolean function.
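For concreteness, a sketch of the basic committee-machine unit that such analyses build on: a single hidden layer of perceptrons on disjoint input blocks (tree connectivity) whose signs are combined by majority vote. The sizes below are assumptions; the paper itself treats the many-layered generalization:

    import numpy as np

    rng = np.random.default_rng(1)
    K, n_per_branch = 5, 20                    # hidden units and inputs per branch (example sizes)
    W = rng.normal(size=(K, n_per_branch))     # one weight vector per branch

    def committee_output(x: np.ndarray) -> int:
        """Tree committee machine: majority vote of K perceptrons on disjoint input blocks."""
        branches = x.reshape(K, n_per_branch)
        votes = np.sign(np.einsum('kn,kn->k', W, branches))
        return int(np.sign(votes.sum()))

    x = rng.choice([-1.0, 1.0], size=K * n_per_branch)
    print(committee_output(x))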
Learning by Minimizing Resources in Neural Networks
1989
We reformulate the problem of supervised learning in neural nets to include the search for a network with minimal resources. The information processing in feedforward networks is described in geometrical terms as the partitioning of the space of possible input configurations by hyperplanes corresponding to hidden units. Regular partitionings introduced here are a special class of partitionings. Corresponding architectures can represent any Boolean function using a single layer of hidden units whose number depends on the specific symmetries of the function. Accordingly, a new class of plane-cutting algorithms is proposed that construct in polynomial time a "custom-made" architecture implementing the desired set of input/output examples. We report the results of our experiments on the storage and rule-extraction abilities of three-layer perceptrons synthesized by a simple greedy algorithm. As expected, simple neuronal structures with good generalization properties ...
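A small illustration of how symmetry keeps the hidden layer small (this is the classic construction for a symmetric Boolean function, shown only as an example of the idea, not the paper's plane-cutting algorithm): n-bit parity is realized with a single hidden layer of n threshold units that simply count how many inputs are on.

    import itertools
    import numpy as np

    def parity_net(x: np.ndarray) -> int:
        """n-bit parity with a single hidden layer of n threshold units."""
        n = len(x)
        s = x.sum()
        # hidden unit k fires iff at least k inputs are on (thresholds 0.5, 1.5, ...)
        hidden = np.array([1 if s >= k else 0 for k in range(1, n + 1)])
        # alternating hidden-to-output weights +1, -1, +1, ... sum to the parity bit
        out_weights = np.array([(-1) ** (k + 1) for k in range(1, n + 1)])
        return int(hidden @ out_weights)

    # exhaustive check for n = 4
    for bits in itertools.product([0, 1], repeat=4):
        x = np.array(bits)
        assert parity_net(x) == x.sum() % 2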
Equivalence between learning in noisy perceptrons and tree committee machines
1996
We study learning from single presentation of examples (on-line learning) in single-layer perceptrons and tree committee machines (TCMs). Lower bounds for the perceptron generalization error as a function of the noise level ε in the teacher output are calculated. We find that local learning in a TCM with K hidden units is simply related to learning in a simple perceptron with a corresponding noise level ε(K). For a large number of examples and finite K the generalization error decays as α_CM^(-1), where α_CM is the number of examples per adjustable weight in the TCM. We also show that on-line learning is possible even in the K → ∞ limit, but with the generalization error decaying as α_CM^(-1/2). The simple Hebb rule can also be applied to the TCM, but now the error decays as α_CM^(-1/2) for finite K and α_CM^(-1/4) for K → ∞. Exponential decay of the generalization error in both the noisy perceptron learning and in the TCM is obtained by using the learning-by-queries strategy.
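A minimal simulation of the simplest ingredient mentioned here, on-line Hebb learning of a single perceptron from a noisy teacher, using the standard generalization-error formula ε_g = arccos(ρ)/π with ρ the teacher-student overlap (system size, noise level, and number of examples are assumed values):

    import numpy as np

    rng = np.random.default_rng(0)
    N, n_examples, eps_noise = 200, 20_000, 0.1       # example values

    teacher = rng.normal(size=N)
    teacher /= np.linalg.norm(teacher)
    student = np.zeros(N)

    def gen_error(w: np.ndarray) -> float:
        """eps_g = arccos(rho)/pi, rho = normalized student-teacher overlap."""
        norm = np.linalg.norm(w)
        if norm == 0.0:
            return 0.5
        rho = float(w @ teacher) / norm
        return float(np.arccos(np.clip(rho, -1.0, 1.0)) / np.pi)

    for t in range(1, n_examples + 1):
        x = rng.normal(size=N)
        label = np.sign(teacher @ x)
        if rng.random() < eps_noise:              # teacher output corrupted with probability eps_noise
            label = -label
        student += label * x / np.sqrt(N)         # Hebb rule, each example presented once
        if t % 5000 == 0:
            print(f"alpha = {t / N:5.1f}   eps_g = {gen_error(student):.3f}")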
A SURVEY: RESEARCH SUMMARY ON NEURAL NETWORKS
Neural networks are relatively crude electronic models based on the neural structure of the brain, which basically learns from experience. The brain is natural proof that some problems beyond the scope of current computers can be solved by small, energy-efficient packages. In this paper we present the fundamentals of neural network topologies, activation functions, and learning algorithms, distinguishing architectures by whether information flows in one direction or in both directions. We outline the main features of a number of popular neural networks and provide an overview of their topologies and their learning capabilities.