Storage capacity and learning algorithms for two-layer neural networks

A two-layer feedforward network of McCulloch-Pitts neurons with N inputs and K hidden units is analyzed for N → ∞ and K finite with respect to its ability to implement p = αN random input-output relations. Special emphasis is put on the case where all hidden units are coupled to the output with the same strength (committee machine) and the receptive fields of the hidden units either enclose all input units (fully connected) or are nonoverlapping (tree structure). The storage capacity is determined by generalizing Gardner's treatment [J. Phys. A 21, 257 (1988); Europhys. Lett. 4, 481 (1987)] of the single-layer perceptron. For the treelike architecture, a replica-symmetric calculation yields α_c ∝ √K for a large number K of hidden units. This result violates an upper bound derived by Mitchison and Durbin [Biol. Cybern. 60, 345 (1989)]. One-step replica-symmetry breaking gives lower values of α_c. In the fully connected committee machine there are in general correlations among different hidden units. As the limit of capacity is approached, the hidden units are anticorrelated: one hidden unit attempts to learn those patterns which have not been learned by the others. These correlations decrease as 1/K, so that for K → ∞ the capacity per synapse is the same as for the tree architecture, whereas for small K we find a considerable enhancement of the storage per synapse. Numerical simulations were performed to explicitly construct solutions for the tree as well as for the fully connected architecture. A learning algorithm is suggested; it is based on the least-action algorithm, modified to take advantage of the two-layer structure. The numerical simulations yield capacities p that are slightly more than twice the number of degrees of freedom, and the fully connected net can store relatively more patterns than the tree. Various generalizations are discussed. Variable weights from the hidden units to the output give the same storage capacity as the committee machine, as long as K = O(1). We furthermore show that thresholds at the hidden units or at the output unit cannot increase the capacity, as long as random unbiased patterns are considered. Finally, we indicate how to generalize our results to other Boolean functions.
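
The two architectures compared above are easy to state concretely. The following minimal sketch (Python with NumPy; all names are illustrative, not taken from the paper) computes the output of a committee machine on ±1 patterns, with sign-neuron hidden units and an unweighted majority vote at the output; K is assumed odd so the vote cannot tie:

    import numpy as np

    def tree_committee(J, xi):
        # J: (K, N//K) weights, one nonoverlapping receptive field per hidden unit.
        # xi: (N,) input pattern with +/-1 entries. K is assumed odd (no tied vote).
        K, n = J.shape
        fields = np.einsum('ki,ki->k', J, xi.reshape(K, n))  # local field of each hidden unit
        return np.sign(np.sign(fields).sum())                # equal-strength majority vote

    def fully_connected_committee(J, xi):
        # Same majority vote, but J is (K, N): every hidden unit sees all N inputs.
        return np.sign(np.sign(J @ xi).sum())

The counting behind the capacity statements follows from these shapes: the tree has N weights in total (N/K per hidden unit), while the fully connected machine has KN, which is why capacities are compared per synapse.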
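For reference, Gardner's treatment referred to above measures the fraction of coupling space that implements all p input-output pairs; written for the tree committee machine it takes the following standard form (LaTeX; the notation is a common convention in this literature, assumed here rather than quoted from the paper):

    V = \int \prod_{k=1}^{K} d\mu(J_k)\;
        \prod_{\mu=1}^{p} \theta\!\left( \sigma^{\mu}\,
          \operatorname{sgn}\!\left[ \sum_{k=1}^{K}
            \operatorname{sgn}\!\left( J_k \cdot \xi_k^{\mu} \right)
          \right] \right)

Here θ is the unit step function, dμ(J_k) is a normalized measure on the weights of hidden unit k (e.g., a spherical constraint), and (ξ^μ, σ^μ) are the random patterns and targets. The storage capacity α_c is the value of α = p/N beyond which the typical volume, ⟨ln V⟩ evaluated with the replica trick, vanishes; the replica-symmetric versus one-step replica-symmetry-breaking results quoted above differ in the ansatz used for that average.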
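The learning rule is only summarized above; the sketch below shows one plausible reading of a least-action step for the tree architecture (an assumption on our part, not the paper's exact procedure): when a pattern is misclassified, update only the hidden unit that votes against the target with the smallest local field in magnitude, i.e., the unit whose vote is cheapest to flip.

    def least_action_step(J, xi, sigma):
        # One presentation of pattern xi with target sigma (+/-1); J as in
        # tree_committee above. Returns True if a weight update was made.
        K, n = J.shape
        x = xi.reshape(K, n)
        fields = np.einsum('ki,ki->k', J, x)
        if np.sign(np.sign(fields).sum()) == sigma:
            return False                                  # pattern already stored
        wrong = np.flatnonzero(np.sign(fields) != sigma)  # units voting against sigma
        k = wrong[np.argmin(np.abs(fields[wrong]))]       # least action: cheapest flip
        J[k] += sigma * x[k]                              # perceptron update on that unit
        return True

Sweeping such a step repeatedly over the p = αN patterns until a full pass makes no update is one way solutions can be constructed explicitly in simulations of the kind reported above.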