Memory capacity of neural networks learning within bounds
Related papers
On the memory properties of recurrent neural models
2017 International Joint Conference on Neural Networks (IJCNN)
In this paper, we investigate the memory properties of two popular gated units: long short term memory (LSTM) and gated recurrent units (GRU), which have been used in recurrent neural networks (RNN) to achieve state-of-the-art performance on several machine learning tasks. We propose five basic tasks for isolating and examining specific capabilities relating to the implementation of memory. Results show that (i) both types of gated unit perform less reliably than standard RNN units on tasks testing fixed delay recall, (ii) the reliability of stochastic gradient descent decreases as network complexity increases, and (iii) gated units are found to perform better than standard RNNs on tasks that require values to be stored in memory and updated conditionally upon input to the network. Task performance is found to be surprisingly independent of network depth (number of layers) and connection architecture. Finally, visualisations of the solutions found by these networks are presented and explored, exposing for the first time how logic operations are implemented by individual gated cells and small groups of these cells.
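As an illustration of the kind of memory probe described above, the following is a minimal sketch (my own construction, not the authors' benchmark code) of a fixed-delay recall task: the network sees a stream of random symbols and must output, at each step, the symbol it saw a fixed number of steps earlier.

```python
import numpy as np

def fixed_delay_recall(n_sequences, seq_len, delay, n_symbols, seed=0):
    """Generate a toy fixed-delay recall task: the target at time t is the
    input symbol from time t - delay (positions with no target are marked -1).
    Illustrative sketch only, not the benchmark used in the paper."""
    rng = np.random.default_rng(seed)
    inputs = rng.integers(0, n_symbols, size=(n_sequences, seq_len))
    targets = np.full_like(inputs, -1)          # -1 marks "no target yet"
    targets[:, delay:] = inputs[:, :-delay]     # recall what was seen `delay` steps ago
    return inputs, targets

X, Y = fixed_delay_recall(n_sequences=32, seq_len=20, delay=5, n_symbols=8)
print(X[0])
print(Y[0])  # first `delay` entries are -1, then a shifted copy of the input
```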
On the capacity of neural networks
University of Trieste - arXiv, 2022
The aim of this thesis is to compare the capacity of different models of neural networks. We start by analysing the problem-solving capacity of a single perceptron using a simple combinatorial argument. After some observations on the storage capacity of a basic network, known as an associative memory, we introduce a powerful statistical-mechanical approach to calculate its capacity in the training-rule-dependent Hopfield model. With the aim of finding a more general definition that can be applied even to quantum neural nets, we then follow Gardner's work, which lets us get rid of the dependency on the training rule, and comment on the results obtained by Lewenstein et al. by applying Gardner's methods to a recently proposed quantum perceptron model.
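As a pointer to the combinatorial argument mentioned above, the sketch below (an illustration, not code from the thesis) evaluates Cover's counting function C(p, N) = 2 Σ_{k=0}^{N-1} C(p-1, k), the number of dichotomies of p points in general position in N dimensions that a perceptron can realise; the fraction C(p, N) / 2^p drops sharply around p = 2N, the classical capacity of the perceptron.

```python
from math import comb

def cover_count(p, n):
    """Cover's counting function: number of linearly separable dichotomies
    of p points in general position in n dimensions."""
    return 2 * sum(comb(p - 1, k) for k in range(n))

n = 20
for p in (n, 2 * n - 1, 2 * n, 3 * n, 4 * n):
    frac = cover_count(p, n) / 2 ** p   # fraction of all 2^p dichotomies that are realisable
    print(f"p = {p:3d} (p/n = {p/n:.1f}): realisable fraction = {frac:.3f}")
```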
Memory Capacity for Sequences in a Recurrent Network with Biological Constraints
Neural Computation, 2006
The CA3 region of the hippocampus is a recurrent neural network that is essential for the storage and replay of sequences of patterns that represent behavioral events. Here we present a theoretical framework to calculate a sparsely connected network's capacity to store such sequences. As in CA3, only a limited subset of neurons in the network is active at any one time, pattern retrieval is subject to error, and the resources for plasticity are limited.
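For orientation, the following is a minimal sketch (not the paper's model, which additionally treats sparse connectivity, sparse activity and retrieval errors) of how a sequence of patterns can be stored in a recurrent network with asymmetric Hebbian weights W ∝ Σ_μ ξ^{μ+1} (ξ^μ)ᵀ, so that presenting one pattern drives the network towards the next.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 400, 10                                  # neurons, sequence length
xi = rng.choice([-1, 1], size=(M, N))           # random +-1 patterns xi^1 ... xi^M

# Asymmetric Hebbian rule: pattern mu is mapped onto pattern mu+1.
W = sum(np.outer(xi[mu + 1], xi[mu]) for mu in range(M - 1)) / N

s = xi[0].copy()                                # cue the first pattern
for step in range(M - 1):
    s = np.sign(W @ s)                          # one synchronous update per transition
    s[s == 0] = 1
    overlap = (s @ xi[step + 1]) / N            # overlap with the expected next pattern
    print(f"step {step + 1}: overlap with pattern {step + 2} = {overlap:.2f}")
```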
Analysis of Short Term Memories for Neural Networks
Short-term memory is indispensable for the processing of time-varying information with artificial neural networks. In this paper a model for linear memories is presented, and ways to include memories in connectionist topologies are discussed. A comparison is drawn among different memory types, indicating the salient characteristic of each memory model.
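To make the contrast between memory types concrete, here is a small illustrative sketch (assumptions mine, not code from the paper) comparing a tapped delay line with a leaky, gamma-style linear memory; with the leak parameter set to 1 the recursion reduces to a pure delay line, while smaller values trade temporal resolution for memory depth.

```python
import numpy as np

def delay_line(u, taps):
    """Tapped delay line: tap k at time t holds u[t - k]."""
    T = len(u)
    x = np.zeros((T, taps))
    for t in range(T):
        for k in range(taps):
            x[t, k] = u[t - k] if t - k >= 0 else 0.0
    return x

def gamma_memory(u, taps, mu):
    """Leaky (gamma-style) memory: x_k(t) = (1 - mu) x_k(t-1) + mu x_{k-1}(t-1),
    with x_0(t) = u(t). For mu = 1 this reduces to the tapped delay line."""
    T = len(u)
    x = np.zeros((T, taps + 1))
    for t in range(T):
        x[t, 0] = u[t]
        for k in range(1, taps + 1):
            prev = x[t - 1] if t > 0 else np.zeros(taps + 1)
            x[t, k] = (1 - mu) * prev[k] + mu * prev[k - 1]
    return x

u = np.zeros(20); u[0] = 1.0                   # impulse input
print(delay_line(u, 4)[:6])                    # the impulse marches one tap per step
print(gamma_memory(u, 4, mu=0.5)[:6].round(3)) # the impulse spreads and decays
```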
Neural networks with small weights implement finite memory machines
2002
Recent experimental studies indicate that recurrent networks initialized with 'small' weights are inherently biased towards finite memory machines. This paper establishes a theoretical counterpart: we prove that recurrent networks with small weights, or more generally with a contractive transition function, can be approximated arbitrarily well on input sequences of unbounded length by a finite memory machine. Conversely, every finite memory machine can be simulated by a recurrent network with a contractive transition function. Hence initialization with small weights induces an architectural bias into learning with recurrent neural networks. This bias has benefits from the point of view of statistical learning theory: it emphasizes regions of the weight space where good generalization can be expected. It is well known that standard recurrent neural networks are not distribution-independent learnable in the PAC sense. We prove that recurrent networks with a contractive transition function with a fixed contraction parameter fulfill the so-called distribution-independent UCED property and hence are distribution-independent PAC-learnable, unlike general recurrent networks.
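A minimal numerical illustration of the architectural bias described above (my own sketch, not from the paper): when the recurrent weight matrix is small enough that the tanh transition map is a contraction, the state computed from only the most recent inputs converges geometrically to the state computed from the full history, which is exactly the finite-memory behaviour.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hidden, n_in, T = 16, 4, 200
W = 0.05 * rng.standard_normal((n_hidden, n_hidden))   # small recurrent weights => contraction
V = rng.standard_normal((n_hidden, n_in))
inputs = rng.standard_normal((T, n_in))

def final_state(xs):
    """Run the tanh RNN over the input window xs, starting from a zero state."""
    h = np.zeros(n_hidden)
    for x in xs:
        h = np.tanh(W @ h + V @ x)
    return h

h_full = final_state(inputs)
for window in (1, 2, 4, 8, 16):
    h_win = final_state(inputs[-window:])               # forget everything older than `window`
    print(f"window {window:2d}: ||h_full - h_window|| = {np.linalg.norm(h_full - h_win):.2e}")
```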
Storage capacity and learning algorithms for two-layer neural networks
Physical Review A, 1992
A two-layer feedforward network of McCulloch-Pitts neurons with N inputs and K hidden units is analyzed for N → ∞ and K finite with respect to its ability to implement p = αN random input-output relations. Special emphasis is put on the case where all hidden units are coupled to the output with the same strength (committee machine) and the receptive fields of the hidden units either enclose all input units (fully connected) or are nonoverlapping (tree structure). The storage capacity is determined generalizing Gardner's treatment [J. Phys. A 21, 257 (1988); Europhys. Lett. 4, 481 (1987)] of the single-layer perceptron. For the treelike architecture, a replica-symmetric calculation yields α_c ∝ √K for a large number K of hidden units. This result violates an upper bound derived by Mitchison and Durbin [Biol. Cybern. 60, 345 (1989)]. One-step replica-symmetry breaking gives lower values of α_c. In the fully connected committee machine there are in general correlations among different hidden units. As the limit of capacity is approached, the hidden units are anticorrelated: one hidden unit attempts to learn those patterns which have not been learned by the others. These correlations decrease as 1/K, so that for K → ∞ the capacity per synapse is the same as for the tree architecture, whereas for small K we find a considerable enhancement of the storage per synapse. Numerical simulations were performed to explicitly construct solutions for the tree as well as the fully connected architecture. A learning algorithm is suggested. It is based on the least-action algorithm, which is modified to take advantage of the two-layer structure. The numerical simulations yield capacities p that are slightly more than twice the number of degrees of freedom, while the fully connected net can store relatively more patterns than the tree. Various generalizations are discussed. Variable weights from hidden to output give the same results for the storage capacity as does the committee machine, as long as K = O(1). We furthermore show that thresholds at the hidden units or the output unit cannot increase the capacity, as long as random unbiased patterns are considered. Finally we indicate how to generalize our results to other Boolean functions.
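To fix notation, a committee machine with K hidden units computes sign(Σ_k sign(w_k · x_k)); in the tree architecture each hidden unit sees its own disjoint block of N/K inputs. The sketch below (an illustration under these assumptions, not the paper's least-action learning code) simply evaluates a random tree committee machine on random ±1 patterns.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, p = 120, 3, 200                      # inputs, hidden units, patterns (N divisible by K)
block = N // K                             # in the tree architecture the receptive fields do not overlap

w = rng.standard_normal((K, block))        # one weight vector per hidden unit
patterns = rng.choice([-1, 1], size=(p, N))

def committee(x):
    """Tree committee machine: majority vote over K perceptrons on disjoint input blocks."""
    votes = [np.sign(w[k] @ x[k * block:(k + 1) * block]) for k in range(K)]
    return np.sign(sum(votes))             # K is odd, so the vote never ties

outputs = np.array([committee(x) for x in patterns])
print("fraction of +1 outputs over random patterns:", np.mean(outputs == 1))
```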
Memory capacity in neural networks with spatial correlations between attractors
Physica A: Statistical Mechanics and its Applications, 1999
We consider the neural network model proposed to describe neurophysiological experiments in which structurally uncorrelated patterns are converted into spatially correlated attractors. For such a network of N neurons, and for values of the coupling constant a between succeeding patterns taken in the interval [0, 1/2), we prove the existence of a threshold storage capacity α_c(a) such that there exist local minima (of the energy function) near each of the M(N) (< α_c(a)N) stimulus patterns.
Microscopic reasoning for the non-linear stochastic models of long-range memory
2011
We extend Kirman's model by introducing a variable event time scale. The proposed flexible time scale is equivalent to the variable trading activity observed in financial markets. The stochastic version of the extended Kirman agent-based model is compared to the non-linear stochastic models of long-range memory in financial markets. The agent-based model, which provides a matching macroscopic description, serves as a microscopic justification of the earlier proposed stochastic model exhibiting power-law statistics.
Storage capacity and retrieval time of small-world neural networks
Physical Review E, 2007
To understand the influence of structure on the function of neural networks, we study the storage capacity and the retrieval time of Hopfield-type neural networks for four network structures: regular, small-world, and random networks generated by the Watts-Strogatz (WS) model, and the same network as the neural network of the nematode Caenorhabditis elegans. Using computer simulations, we find that (1) as the randomness of the network is increased, its storage capacity is enhanced; (2) the retrieval time of WS networks does not depend on the network structure, but the retrieval time of the C. elegans neural network is longer than that of WS networks; (3) the storage capacity of the C. elegans network is smaller than that of networks generated by the WS model, though the neural network of C. elegans is considered to be a small-world network.
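For readers who want to reproduce the flavour of this experiment, here is a compact sketch (my own simplified setup, not the authors' simulation code): a Hopfield network whose Hebbian couplings are masked by a Watts-Strogatz adjacency matrix, with retrieval quality measured by the overlap between the relaxed state and the stored pattern as the rewiring probability is varied.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(3)
N, P, k = 300, 5, 30                          # neurons, stored patterns, mean degree
xi = rng.choice([-1, 1], size=(P, N))         # random +-1 patterns

def retrieval_overlap(rewire_p):
    """Build a Hopfield net on a Watts-Strogatz graph and retrieve pattern 0 from a noisy cue."""
    A = nx.to_numpy_array(nx.watts_strogatz_graph(N, k, rewire_p, seed=3))
    W = (xi.T @ xi) / N * A                   # Hebbian couplings restricted to existing edges
    np.fill_diagonal(W, 0.0)
    s = xi[0] * rng.choice([1, -1], size=N, p=[0.9, 0.1])   # cue with 10% of the bits flipped
    for _ in range(30):                       # repeated synchronous updates as a simple dynamics
        s = np.sign(W @ s)
        s[s == 0] = 1
    return (s @ xi[0]) / N

for p_rewire in (0.0, 0.1, 0.5, 1.0):         # from regular ring lattice to random graph
    print(f"rewiring prob. {p_rewire:.1f}: overlap = {retrieval_overlap(p_rewire):.2f}")
```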
Upper bound on pattern storage in feedforward networks
Neurocomputing, 2008
Starting from the strict interpolation equations for multivariate polynomials, an upper bound is developed for the number of patterns that can be memorized by a nonlinear feedforward network. A straightforward proof by contradiction is presented for the upper bound. It is shown that the hidden activations do not have to be analytic. Networks, trained by conjugate gradient, are used to demonstrate the tightness of the bound for random patterns. Based upon the upper bound, small multilayer perceptron models are successfully demonstrated for large support vector machines.
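The core idea, that the number of patterns a network can memorize is limited by its number of free parameters, can be illustrated with a deliberately simplified sketch (mine, not the paper's experiment; the paper's bound concerns the full nonlinear network trained by conjugate gradient): freeze a random hidden layer and fit only the output weights by least squares, so that interpolation of random targets succeeds while the pattern count stays at or below the number of trainable parameters and fails beyond it.

```python
import numpy as np

rng = np.random.default_rng(4)
d, h = 8, 32                                   # input dimension, hidden units
W_hid = rng.standard_normal((h, d))            # hidden layer frozen at random values (simplification)
b_hid = rng.standard_normal(h)
n_free = h + 1                                 # trainable parameters: output weights + output bias

def hidden(X):
    return np.tanh(X @ W_hid.T + b_hid)

for p in (n_free // 2, n_free, 2 * n_free, 4 * n_free):
    X = rng.standard_normal((p, d))            # p random input patterns
    y = rng.standard_normal(p)                 # p random targets to memorize
    H = np.column_stack([hidden(X), np.ones(p)])
    w, *_ = np.linalg.lstsq(H, y, rcond=None)  # least-squares fit of the output layer
    mse = np.mean((H @ w - y) ** 2)
    print(f"patterns = {p:4d} (free params = {n_free}): training MSE = {mse:.2e}")
```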