Syntactic systematicity in sentence processing with a recurrent self-organizing network
Related papers
Systematicity in sentence processing with a recursive self-organizing neural network
… of the 15th European Symposium on …, 2007
To be viable as models of human cognition, connectionist models of sentence processing must learn to behave systematically by generalizing from a small training set. It was recently shown that Elman networks and, to a greater extent, echo state networks (ESNs) possess a limited ability to generalize in artificial language learning tasks. We study this capacity for the recently introduced recursive self-organizing neural network model and show that its performance is comparable to that of ESNs.
Learn more by training less: Systematicity in sentence processing by recurrent networks
Connectionist models of sentence processing must learn to behave systematically by generalizing from a small training set. To what extent recurrent neural networks manage this generalization task is investigated. In contrast to Van der Velde et al. (2004), it is found that simple recurrent networks do show so-called weak combinatorial systematicity, although their performance remains limited. It is argued that these limitations arise from overfitting in large networks. Generalization can be improved by increasing the size of the recurrent layer without training its connections, thereby combining a large short-term memory with a small long-term memory capacity. Performance can be improved further by increasing the number of word types in the training set.
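A minimal sketch of the architectural idea described above, under illustrative assumptions (one-hot word inputs, an Elman-style tanh update, invented sizes; this is not the authors' implementation): the recurrent layer is made large but its input and recurrent weights stay fixed at their random initialization, and only a small readout to the next-word targets is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 26        # word types (illustrative)
reservoir_size = 500   # large, untrained recurrent layer

# Fixed random input and recurrent weights: a large short-term memory
# with no trained (long-term) parameters in the recurrent dynamics.
W_in = rng.uniform(-0.1, 0.1, (reservoir_size, vocab_size))
W_rec = rng.uniform(-0.5, 0.5, (reservoir_size, reservoir_size))
W_rec *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_rec)))  # keep dynamics stable

def run_states(word_ids):
    """Collect hidden states for a sequence of word indices."""
    h = np.zeros(reservoir_size)
    states = []
    for w in word_ids:
        x = np.zeros(vocab_size)
        x[w] = 1.0
        h = np.tanh(W_in @ x + W_rec @ h)
        states.append(h)
    return np.array(states)

def train_readout(states, next_word_ids, ridge=1e-2):
    """Train only the readout (the small long-term memory) by ridge regression
    from hidden states to one-hot next-word targets."""
    targets = np.eye(vocab_size)[np.asarray(next_word_ids)]
    A = states.T @ states + ridge * np.eye(reservoir_size)
    return np.linalg.solve(A, states.T @ targets)  # shape: (reservoir_size, vocab_size)
```

Only `train_readout` touches trainable parameters, which is the sense in which such a network "learns more by training less".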
Recurrent networks and natural language: exploiting self-organization
Proc. of CogSci, 2006
Prediction is believed to be an important cognitive component in natural language processing. Within connectionist approaches, Elman's simple recurrent network has been used for this task with considerable success, especially on small-scale problems. However, it has been appreciated for some time that supervised gradient-based learning models have difficulty scaling up, because their learning becomes very time-consuming for larger data sets. In this paper, we explore an alternative neural network architecture that exploits self-organization. The prediction task is effectively split into separate stages of self-organized context representation and subsequent association with the next-word target distribution. We compare various prediction models and show, on the task of learning a language generated by a stochastic context-free grammar, that self-organization can lead to higher accuracy, faster training, greater robustness and more transparent internal representations than Elman's network.
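A minimal sketch of the two-stage idea, under illustrative assumptions (a plain SOM over context-state vectors and add-one next-word counts, rather than the paper's specific recursive self-organizing architecture): first quantize context representations by self-organization, then associate each prototype with an empirical next-word distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_som(states, n_units=49, epochs=10, lr=0.3, sigma=2.0):
    """Stage 1: self-organize prototypes over context-state vectors (1-D SOM)."""
    dim = states.shape[1]
    protos = rng.normal(0, 0.1, (n_units, dim))
    grid = np.arange(n_units)
    for _ in range(epochs):
        for s in states:
            bmu = np.argmin(np.linalg.norm(protos - s, axis=1))   # best-matching unit
            neigh = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
            protos += lr * neigh[:, None] * (s - protos)
        lr *= 0.9
        sigma *= 0.9
    return protos

def associate_next_words(protos, states, next_word_ids, vocab_size):
    """Stage 2: per-prototype next-word counts -> predictive distribution."""
    counts = np.ones((len(protos), vocab_size))  # add-one smoothing
    for s, w in zip(states, next_word_ids):
        bmu = np.argmin(np.linalg.norm(protos - s, axis=1))
        counts[bmu, w] += 1
    return counts / counts.sum(axis=1, keepdims=True)
```

At prediction time a new context state is mapped to its best-matching unit and the stored distribution is read off, so no gradient-based training of the recurrent part is required.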
Strong systematicity in sentence processing by simple recurrent networks
Providing explanations of language comprehension requires models that describe language processing and display strong systematicity. Although various extensions of connectionist models have been suggested in order to account for this phenomenon, we found that even a simple recurrent network trained in a way that can be considered 'standard' could display strong systematicity. This finding was investigated in further detail by looking at the internal word representations of the best performing network.
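One common way to inspect internal word representations of this kind (a sketch of the general technique, not necessarily the authors' exact analysis) is to average the hidden-layer activations a trained network produces for each word type and then compare the averages, e.g. by cosine similarity or clustering.

```python
import numpy as np

def word_representations(hidden_states, word_ids, vocab_size):
    """Average hidden state per word type, as in Elman-style analyses."""
    dim = hidden_states.shape[1]
    sums = np.zeros((vocab_size, dim))
    counts = np.zeros(vocab_size)
    for h, w in zip(hidden_states, word_ids):
        sums[w] += h
        counts[w] += 1
    return sums / np.maximum(counts[:, None], 1)

def cosine_similarity_matrix(reps):
    """Pairwise cosine similarities between the averaged word representations."""
    unit = reps / np.maximum(np.linalg.norm(reps, axis=1, keepdims=True), 1e-12)
    return unit @ unit.T
```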
Strong systematicity in sentence processing by an Echo State Network
For neural networks to be considered as realistic models of human linguistic behavior, they must be able to display the level of systematicity that is present in language. This paper investigates the systematic capacities of a sentence-processing Echo State Network. The network is trained on sentences in which particular nouns occur only as subjects and others only as objects. It is then tested on novel sentences in which these roles are reversed. Results show that the network displays so-called strong systematicity.
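The training regime can be illustrated by how the data are generated (a sketch with an invented toy lexicon; the paper's actual grammar and vocabulary will differ): some nouns occur only in subject position during training, and the test sentences put those same nouns in object position, and vice versa.

```python
import random

random.seed(0)

subject_only = ["boy", "girl"]   # illustrative nouns seen only as subjects in training
object_only = ["dog", "cat"]     # illustrative nouns seen only as objects in training
verbs = ["sees", "chases"]

def training_sentences(n=100):
    return [f"{random.choice(subject_only)} {random.choice(verbs)} {random.choice(object_only)}"
            for _ in range(n)]

def test_sentences(n=20):
    # Roles reversed: formerly object-only nouns now appear as subjects, and vice versa.
    return [f"{random.choice(object_only)} {random.choice(verbs)} {random.choice(subject_only)}"
            for _ in range(n)]
```

Strong systematicity then amounts to the network handling the reversed-role test sentences despite never having seen those nouns in those positions.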
Self-organizing word representations for fast sentence processing
Several psycholinguistic models represent words as vectors in a high-dimensional state space, such that distances between vectors encode the strengths of paradigmatic relations between the represented words. This chapter argues that such an organization develops because it facilitates fast sentence processing. A model is presented in which sentences, in the form of word-vector sequences, serve as input to a recurrent neural network that provides random dynamics. The word vectors are adjusted by a process of self-organization, aimed at reducing fluctuations in the dynamics. As it turns out, the resulting word vectors are organized paradigmatically. Keywords: Word representation; Sentence processing; Self-organization; Recurrent neural network; Reservoir computing.
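A minimal sketch of the adaptation idea, under stated assumptions (fluctuation measured as the squared difference between consecutive hidden states and reduced by a one-step gradient update of the current word's vector; the chapter's actual self-organization rule may differ):

```python
import numpy as np

rng = np.random.default_rng(2)

vocab_size, vec_dim, res_dim = 20, 10, 200

# Word vectors to be organized, and a fixed random recurrent network ("random dynamics").
word_vecs = rng.normal(0, 0.1, (vocab_size, vec_dim))
W_in = rng.uniform(-0.2, 0.2, (res_dim, vec_dim))
W_rec = rng.uniform(-0.5, 0.5, (res_dim, res_dim))
W_rec *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_rec)))

def adapt_word_vectors(sentences, lr=0.05, epochs=5):
    """Nudge word vectors to reduce step-to-step fluctuation of the hidden state.

    Fluctuation at step t is ||h_t - h_{t-1}||^2; the update follows its
    one-step gradient with respect to the current word's vector.
    """
    for _ in range(epochs):
        for word_ids in sentences:   # each sentence is a list of word indices
            h_prev = np.zeros(res_dim)
            for w in word_ids:
                h = np.tanh(W_in @ word_vecs[w] + W_rec @ h_prev)
                grad = W_in.T @ ((h - h_prev) * (1 - h ** 2))
                word_vecs[w] -= lr * grad
                h_prev = h
    return word_vecs
```

Words that occur in similar contexts receive similar updates, which is one route by which a paradigmatic organization of the vectors can emerge.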
Neural Networks, 2007
We study possible neural mechanisms of language processing with the aim of aiding the development of artificial language processing systems. We used data sets containing recursive linguistic structures and trained the Elman simple recurrent network (SRN) on the next-symbol prediction task. Concentrating on neuron activation clusters in the recurrent layer of the SRN, we investigate the network's state-space organization before and after training. Given an SRN and a training stream, we construct predictive models, called neural prediction machines, that directly employ the state-space dynamics of the network. We demonstrate two important properties of representations of recursive symbol series in the SRN. First, the clusters of recurrent activations emerging before training are meaningful and correspond to Markov prediction contexts. We show that the prediction states that naturally arise in an SRN initialized with small random weights approximately correspond to the states of Variable Memory Length Markov Models (VLMMs) based on individual symbols (i.e. words). Second, we demonstrate that during training the SRN reorganizes its state space according to word categories and their grammatical subcategories, and the next-symbol prediction is again based on the VLMM strategy. However, after training, prediction is based on word categories and their grammatical subcategories rather than on individual words. Our conclusions hold for small depths of recursion, comparable to human performance. The methods of SRN training and state-space analysis introduced in this paper are general and can be used to investigate how an SRN processes any other symbol time series.
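A minimal sketch of a neural-prediction-machine-style readout (using plain k-means in place of whatever clustering procedure the paper employs; all names and parameters here are illustrative): cluster the recurrent-layer activations, treat each cluster as a prediction context, and estimate a next-symbol distribution for each context from the training stream.

```python
import numpy as np

def kmeans(states, k, iters=50, seed=0):
    """Plain k-means over recurrent activation vectors."""
    rng = np.random.default_rng(seed)
    centers = states[rng.choice(len(states), k, replace=False)]
    for _ in range(iters):
        dists = ((states[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):          # skip empty clusters
                centers[j] = states[labels == j].mean(axis=0)
    return centers, labels

def prediction_machine(labels, next_symbols, k, alphabet_size):
    """Per-cluster next-symbol distributions with add-one smoothing."""
    counts = np.ones((k, alphabet_size))
    for c, s in zip(labels, next_symbols):
        counts[c, s] += 1
    return counts / counts.sum(axis=1, keepdims=True)
```

Running the same construction on the untrained and the trained network makes it possible to compare the resulting prediction contexts with those of a VLMM, as the paper describes.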
2007 Special Issue: Learning grammatical structure with Echo State Networks
2007
Echo State Networks (ESNs) have been shown to be effective for a number of tasks, including motor control, dynamic time series prediction, and memorizing musical sequences. However, their performance on natural language tasks has been largely unexplored until now. Simple Recurrent Networks (SRNs) have a long history in language modeling and show a striking similarity in architecture to ESNs. A comparison of SRNs and ESNs on a natural language task is therefore a natural choice for experimentation. Elman applied SRNs to a standard task in statistical NLP: predicting the next word in a corpus, given the previous words. Using a simple context-free grammar and an SRN with backpropagation through time (BPTT), Elman showed that the network was able to learn internal representations that were sensitive to linguistic processes and useful for the prediction task. Here, using ESNs, we show that training such internal representations is unnecessary to achieve levels of performance comparable to SRNs. We also compare the processing capabilities of ESNs to bigrams and trigrams. Due to some unexpected regularities of Elman's grammar, these statistical techniques are capable of maintaining dependencies over greater distances than might be initially expected. However, we show that the memory of ESNs in this word-prediction task, although noisy, extends significantly beyond that of bigrams and trigrams, enabling ESNs to make good predictions of verb agreement at distances over which these methods operate at chance. Overall, our results indicate a surprising ability of ESNs to learn a grammar, suggesting that they form useful internal representations without learning them.
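The bigram/trigram baselines mentioned here are straightforward to reproduce in outline (a sketch with simple counting; the corpus and smoothing details are illustrative): they predict the next word from the preceding one or two words only, so any dependency longer than the n-gram window, such as agreement across an intervening relative clause, can only be resolved at chance.

```python
from collections import Counter, defaultdict

def train_ngram(corpus_words, n=3):
    """Counts of the next word given the previous n-1 words."""
    counts = defaultdict(Counter)
    for i in range(len(corpus_words) - n + 1):
        context = tuple(corpus_words[i:i + n - 1])
        counts[context][corpus_words[i + n - 1]] += 1
    return counts

def predict(counts, context):
    """Most likely next word for a context, or None if the context is unseen."""
    context = tuple(context)
    if context not in counts:
        return None
    return counts[context].most_common(1)[0][0]

# Usage (illustrative): for a trigram model the subject noun that governs
# agreement often lies outside the two-word context window.
corpus = "boys who the girl sees walk".split()
trigram = train_ngram(corpus, n=3)
print(predict(trigram, ["girl", "sees"]))  # 'walk' only because this exact trigram was seen
```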