Connectionist Models of Speech Processing
Psycholinguistics refers to the empirical study of the human language processing system, typically using behavioral experiments. This chapter considers attempts to capture psycholinguistic data using connectionist models. We primarily focus on relatively 'early' aspects of speech processing: speech segmentation and word recognition.
Human language processing: Connectionist models
Connectionist psycholinguistics is an emerging approach to modeling empirical data on human language processing using connectionist computational architectures. Over the last 20 years, a wide range of psycholinguistic phenomena have been modeled, such as speech processing, impaired and normal reading, aphasic word production, and structural priming in sentence production.
On the proper treatment of connectionism
A set of hypotheses is formulated for a connectionist approach to cognitive modeling. These hypotheses are shown to be incompatible with the hypotheses underlying traditional cognitive models. The connectionist models considered are massively parallel numerical computational systems that are a kind of continuous dynamical system. The numerical variables in the system correspond semantically to fine-grained features below the level of the concepts consciously used to describe the task domain. The level of analysis is intermediate between those of symbolic cognitive models and neural models. The explanations of behavior provided are like those traditional in the physical sciences, unlike the explanations provided by symbolic models. Higher-level analyses of these connectionist models reveal subtle relations to symbolic models. Parallel connectionist memory and linguistic processes are hypothesized to give rise to processes that are describable at a higher level as sequential rule appli...
Neural network models of language acquisition and processing
Artificial neural network models (also known as Parallel Distributed Processing or Connectionist models) have been highly influential in cognitive science since the mid-1980s. The original inspiration for these systems comes from information processing in the brain, which emerges from a large number of (nearly) identical, simple processing units (neurons) that are interconnected into a network. Each unit receives activation from other units or by stimulation from the external world, and generates an output activation that is a function of the total input activation received. The unit then feeds the output activation onward to the units to which it is connected. Information processing is thus implemented in terms of activation flowing through this network. Each connection between two units has a weight that determines how strongly the first unit affects the second. These weights can be adapted, which constitutes learning, or "training" as it is commonly known in the neural network literature. Algorithms for network training can be roughly divided into supervised and unsupervised methods. Supervised training is applied when a specific and known input-to-output mapping is required (e.g., learning to transform orthographic to phonological representations). To accomplish this, the network is provided with a representative set of "training examples" of inputs and the corresponding target outputs. It then processes each example, and the difference between the network's actual output and the target output leads to an update of the connection weights such that, next time, the output error will be smaller. By far the best known and most used method for supervised training is the Backpropagation algorithm (Rumelhart, Hinton, & Williams, 1986), which makes the network's output activations for the training examples gradually converge toward the target outputs.
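The unit computation and the error-driven weight update described above can be sketched in a few lines. This is a minimal illustration, not the Backpropagation algorithm itself: it trains a single sigmoid unit with the delta rule, which is the one-layer special case of gradient-based supervised training. The function names, the learning rate, the bias term, and the toy logical-OR training set are all assumptions made for the example.

```python
import math

def sigmoid(x):
    """Squash total input into an output activation between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, bias, inputs):
    """Output activation as a function of the total weighted input."""
    total_input = sum(w * a for w, a in zip(weights, inputs)) + bias
    return sigmoid(total_input)

def train_step(weights, bias, inputs, target, learning_rate=0.5):
    """One supervised update (delta rule) for a single sigmoid unit."""
    output = unit_output(weights, bias, inputs)
    error = target - output
    # Gradient of the squared error passed back through the sigmoid.
    delta = error * output * (1.0 - output)
    new_weights = [w + learning_rate * delta * a
                   for w, a in zip(weights, inputs)]
    new_bias = bias + learning_rate * delta
    return new_weights, new_bias

# Hypothetical training examples: inputs paired with target outputs (logical OR).
examples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

def total_error(weights, bias):
    """Summed squared difference between actual and target outputs."""
    return sum((t - unit_output(weights, bias, x)) ** 2 for x, t in examples)

weights, bias = [0.0, 0.0], 0.0
error_before = total_error(weights, bias)
for _ in range(2000):
    for inputs, target in examples:
        weights, bias = train_step(weights, bias, inputs, target)
error_after = total_error(weights, bias)
predictions = [unit_output(weights, bias, x) for x, _ in examples]
```

After training, the summed error is smaller than before and each prediction falls on the correct side of 0.5. A multi-layer network would additionally need the error signal to be propagated back through the hidden layers, which is what Backpropagation adds.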
Unsupervised training, in contrast, makes the network adapt to (aspects of) the statistical structure of input examples without mapping to target outputs (e.g., discovery of regularities in the phonological structure of language). These networks are well-suited to uncovering statistical structure present in the environment without requiring the modeller to be aware of what the structure is. One well-known example of an unsupervised training method is the learning rule proposed by Hebb (1949): Strengthen connections between units that are simultaneously active and weaken the connections between two units if only one is active. In spite of the superficial similarities between artificial and biological neural networks (i.e., interconnectivity and stimulation passing between neurons to determine their activation, and learning by adaptation of connection strengths), these cognitive models are not usually claimed to simulate processing at the level of biological neurons. Rather, neural network models form a description at Marr's (1982) algorithmic level, that is, they specify cognitive representations and operations while ignoring the biological implementation. Neural networks underwent a surge of popularity in the 1990s, but from the early 21st century they were somewhat overshadowed by symbolic probabilistic models. However, neural networks have enjoyed a recent revival partly due to the success of "deep learning" models, which display state-of-the-art performance on a wide range of artificial intelligence tasks (LeCun, Bengio, & Hinton, 2015). For the most part, the field of cognitive modelling is still to catch up with these novel developments. Consequently, the currently most influential connectionist cognitive models are of the more traditional variety. We return to this issue in the Conclusion.
1.1. Feedforward and recurrent networks
Connectionist models are not amorphous networks in which everything is connected to everything else.
Rather, a particular structure is imposed, for example by grouping units into a number of layers and allowing activation to flow only from each layer to the next. The first layer receives inputs from the environment, the final layer produces the corresponding output, and any intermediate layer is known as "hidden". Although this so-called "feedforward" architecture can (at least in theory) approximate any computable input-to-output function, it is unable to handle input that comes in over time. This is because the network has no working memory: Each input is immediately overwritten by the next. Hence, the feedforward network is not the most appropriate model for simulating language processing, which is a fundamentally temporal phenomenon. Elman (1990), in his seminal paper "Discovering structure in time", proposed a solution: Include a set of recurrent connections with trainable weights that link each unit of the single hidden layer to all hidden-layer units. Consequently, the hidden layer receives both the current environmental input and its own previous activation state which, in turn, depends on the state before that, etc. In this manner, the model is equipped with a working memory and can therefore encode sequential information, or "structure in time", making it well-suited to processing language as it unfolds over time. This particular architecture became known as the Simple Recurrent Network (SRN) but forms part of a larger class of Recurrent Neural Networks (RNNs) that have connections through which (part of) the network's current activation feeds back to the network itself.
1.2. Neural network models and linguistic theory
Connectionist models of language acquisition and processing offer a view of the human language system that is very different from traditional, symbolic models in cognition. For one, neural networks do not distinguish competence (i.e., language knowledge) from performance (i.e., language behaviour).
Instead, knowledge becomes instantiated in network connection weights in order for the network to display particular performance. In a sense, it forms procedural rather than declarative knowledge: it is know-how, not know-that. Hence, there is no way for the network to assess its own knowledge. As Clark and Karmiloff-Smith (1993, p. 495) put it: "it is knowledge in the system, but it is not yet knowledge to the system." Second, language researchers from the nativist tradition have famously argued that infants must possess innate, language-specific knowledge or learning mechanisms, because otherwise language acquisition in the absence of negative evidence would be impossible (e.g., Chomsky, 1965; Gold, 1967; Pinker, 1989; among many others). In contrast, empiricists claim that language acquisition requires only domain-general mechanisms. Connectionism falls squarely into the empiricist camp because the representations and learning mechanisms built into neural networks are not specific to language and the networks receive no negative evidence during training. Hence, successful neural network learning of (relevant aspects of) syntax would undermine the nativist position. A third major difference with traditional linguistic thinking is that neural networks do not represent discrete categories (be it phonemes, words, parts-of-speech, or any other category) unless these are explicitly assigned to the network's units a priori. However, in most (and, arguably, the most insightful) models, representations are learned in the hidden layer(s) rather than assigned,
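Two mechanisms described in this excerpt, Hebb's (1949) learning rule and the hidden-state update of the Simple Recurrent Network, can be sketched as follows. This is an illustrative sketch only: the binary treatment of unit activity, the fixed step size, the tanh activation, and the small hand-picked weight matrices are assumptions for the example, and no training of the recurrent weights is shown.

```python
import math

def hebbian_update(weight, pre_active, post_active, rate=0.1):
    """Hebb-style update for one connection between two binary units."""
    if pre_active and post_active:
        return weight + rate      # both active: strengthen
    if pre_active != post_active:
        return weight - rate      # only one active: weaken
    return weight                 # neither active: unchanged

def srn_step(x, h_prev, w_in, w_rec, w_out):
    """One time step: the hidden layer sees the input and its own previous state."""
    hidden = []
    for j in range(len(w_rec)):
        total = sum(w_in[j][i] * x[i] for i in range(len(x)))
        total += sum(w_rec[j][k] * h_prev[k] for k in range(len(h_prev)))
        hidden.append(math.tanh(total))
    output = [sum(w_out[o][j] * hidden[j] for j in range(len(hidden)))
              for o in range(len(w_out))]
    return hidden, output

def run_sequence(sequence, w_in, w_rec, w_out):
    """Feed a sequence of input vectors through the network, one per time step."""
    hidden = [0.0] * len(w_rec)
    output = []
    for x in sequence:
        hidden, output = srn_step(x, hidden, w_in, w_rec, w_out)
    return hidden, output

# Hypothetical fixed weights: 2 inputs, 2 hidden units, 1 output.
W_IN = [[0.5, -0.3], [0.2, 0.8]]
W_REC = [[0.1, 0.4], [-0.6, 0.3]]
W_OUT = [[1.0, -1.0]]
NO_REC = [[0.0, 0.0], [0.0, 0.0]]  # recurrent weights zeroed: no memory

# Two sequences that end in the same input but differ in what came before.
seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[0.0, 1.0], [0.0, 1.0]]

hidden_a, _ = run_sequence(seq_a, W_IN, W_REC, W_OUT)
hidden_b, _ = run_sequence(seq_b, W_IN, W_REC, W_OUT)
flat_a, _ = run_sequence(seq_a, W_IN, NO_REC, W_OUT)
flat_b, _ = run_sequence(seq_b, W_IN, NO_REC, W_OUT)
```

With recurrent weights in place, the final hidden states for the two sequences differ even though the last input is identical, which is the sense in which the hidden layer acts as a working memory. With the recurrent weights zeroed, the network reduces to a feedforward pass and the two final states coincide.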
Biological and Cognitive Plausibility in Connectionist Networks for Language Modeling
If we want to explain cognitive processes by means of connectionist networks, these networks have to correspond in several respects to cognitive systems and their underlying biological mechanisms. The question of the biological and cognitive plausibility of connectionist models has two sides. On the side of biology, one needs a fair understanding of the biological and cognitive mechanisms in order to represent them in a model; on the side of modeling, one needs to know how to construct a model that represents precisely what is intended. Computing power and modeling techniques have improved dramatically in the last 20 years, so the plausibility problem is now being addressed more adequately. Connectionist models are often used to represent different aspects of natural language, and their biological plausibility has sometimes been questioned. Today, the field of computational neuroscience offers several acceptable ways of modeling higher cognitive functions, language among them. This paper presents some existing connectionist networks that model natural language and discusses their explanatory power and their plausibility with respect to the biological and cognitive systems they represent.
The Localist and the Distributed Models of Connectionism (2014). Montazeri & Hamidi
Connectionism is the theory that sees the brain in terms of neural or parallel distributed processing networks of interconnected units. The present paper reviewed the basic assumptions of connectionism and explained the two main types of connectionist models: the localist model and the distributed model. The drawbacks of localist connectionism were mentioned, and the properties of distributed connectionist networks were delineated. In the end, general problems with connectionist models were discussed. The major drawback identified, one that would cast doubt on the usefulness of a connectionist approach, was that the approach is based on mathematics and physics, while the brains of human beings, or language learners, are biological entities. This seems to mar the usefulness of the approach to language learning, since it can hardly be assumed that mathematical principles extend to biological ones. Language learners and language teachers, as well as neurologists and psychologists, may find the discussions of the present study useful for understanding the process of language acquisition.
Analysis of Distributed Representation of Constituent Structure in Connectionist Systems
A general method, the tensor product representation, is described for the distributed representation of value/variable bindings. The method allows the fully distributed representation of symbolic structures: the roles in the structures, as well as the fillers for those roles, can be arbitrarily non-local. Fully and partially localized special cases reduce to existing cases of connectionist representations of structured data; the tensor product representation generalizes these and the few existing examples of fully distributed representations of structures. The representation saturates gracefully as larger structures are represented; it permits recursive construction of complex representations from simpler ones; it respects the independence of the capacities to generate and maintain multiple bindings in parallel; it extends naturally to continuous structures and continuous representational patterns; it permits values to also serve as variables; it enables analysis of the interference of symbolic structures stored in associative memories; and it leads to characterization of optimal distributed representations of roles and a recirculation algorithm for learning them.
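The binding scheme summarized above can be illustrated with a small sketch. It assumes that each filler/role binding is the outer product of a filler vector and a role vector, that a structure is the sum of its bindings, and that the role vectors are orthonormal so a filler can be recovered by contracting the structure with its role vector. The specific vectors and function names are made up for the example.

```python
def outer(filler, role):
    """Bind one filler to one role as an outer (tensor) product."""
    return [[f * r for r in role] for f in filler]

def superimpose(bindings):
    """Represent a whole structure as the element-wise sum of its bindings."""
    rows, cols = len(bindings[0]), len(bindings[0][0])
    return [[sum(b[i][j] for b in bindings) for j in range(cols)]
            for i in range(rows)]

def unbind(structure, role):
    """Recover the filler bound to `role` (exact when roles are orthonormal)."""
    return [sum(t * r for t, r in zip(row, role)) for row in structure]

# Hypothetical distributed role vectors, chosen to be orthonormal.
ROLE_FIRST = [0.6, 0.8]
ROLE_SECOND = [0.8, -0.6]

# Hypothetical filler vectors for two symbols.
FILLER_A = [1.0, 0.0, 0.5]
FILLER_B = [0.0, 1.0, -0.5]

# The structure "A in first position, B in second position".
structure = superimpose([outer(FILLER_A, ROLE_FIRST),
                         outer(FILLER_B, ROLE_SECOND)])

recovered_first = unbind(structure, ROLE_FIRST)
recovered_second = unbind(structure, ROLE_SECOND)
```

Up to floating-point rounding, `recovered_first` matches `FILLER_A` and `recovered_second` matches `FILLER_B`, even though every entry of `structure` mixes contributions from both bindings. If the role vectors were not orthogonal, the recovered fillers would interfere with one another.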