From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics | Behavioral and Brain Sciences | Cambridge Core

Abstract:

The article analyzes the neural and functional grounding of language skills as well as their emergence in hominid evolution, hypothesizing stages leading from abilities known to exist in monkeys and apes and presumed to exist in our hominid ancestors right through to modern spoken and signed languages. The starting point is the observation that both premotor area F5 in monkeys and Broca's area in humans contain a “mirror system” active for both execution and observation of manual actions, and that F5 and Broca's area are homologous brain regions. This grounded the mirror system hypothesis of Rizzolatti and Arbib (1998) which offers the mirror system for grasping as a key neural “missing link” between the abilities of our nonhuman ancestors of 20 million years ago and modern human language, with manual gestures rather than a system for vocal communication providing the initial seed for this evolutionary process. The present article, however, goes “beyond the mirror” to offer hypotheses on evolutionary changes within and outside the mirror systems which may have occurred to equip Homo sapiens with a language-ready brain. Crucial to the early stages of this progression is the mirror system for grasping and its extension to permit imitation. Imitation is seen as evolving via a so-called simple system such as that found in chimpanzees (which allows imitation of complex “object-oriented” sequences but only as the result of extensive practice) to a so-called complex system found in humans (which allows rapid imitation even of complex sequences, under appropriate conditions) which supports pantomime. This is hypothesized to have provided the substrate for the development of protosign, a combinatorially open repertoire of manual gestures, which then provides the scaffolding for the emergence of protospeech (which thus owes little to nonhuman vocalizations), with protosign and protospeech then developing in an expanding spiral. 
It is argued that these stages involve biological evolution of both brain and body. By contrast, it is argued that the progression from protosign and protospeech to languages with full-blown syntax and compositional semantics was a historical phenomenon in the development of Homo sapiens, involving few if any further biological changes.

Notes

  1. Bickerton (1995) views infant language, pidgins, and the “language” taught to apes as protolanguages in the sense of a form of communication whose users can only string together a small handful of words at a time with little if any syntax. Bickerton hypothesizes that the protolanguage (in my sense) of Homo erectus was a protolanguage in his sense, in which a few words much like those of today's language are uttered a few at a time to convey meaning without the aid of syntax. I do not assume (or agree with) this hypothesis.

  2. Today's signed languages are fully expressive human languages with a rich syntax and semantics, and are not to be confused with the posited systems of protosign communication. By the same token, protospeech is a primitive form of communication based on vocal gestures but without the richness of modern human spoken languages.

  3. Since we will be concerned in what follows with sign language as well as spoken language, the “speaker” and “hearer” may be using hand and face gestures rather than vocal gestures for communication.

  4. However, I shall offer below the view that early forms of protosign provided a scaffolding for the initial development of protospeech, rather than holding that protosign was “completed” before protospeech was “initiated.”

  5. I would welcome commentaries on “language-like” aspects of communication in nonprimates, but the present article is purely about changes within the primates that led to the human language-ready brain.

  6. It could be objected that monkey calls are not “involuntary communication” because, for example, vervet alarm calls are given usually in the presence of conspecifics who would react to them. However, I would still call this involuntary – this just shows that two conditions, rather than one, are required to trigger the call. This is distinct from the human use of language to conduct a conversation that may have little or no connection to the current situation.

  7. When I speak of a “stage” in phylogeny, I do not have in mind an all-or-none switch in the genotype that yields a discontinuous change in the phenotype, but rather the coalescence of a variety of changes that can be characterized as forming a global pattern that may emerge over the course of tens or even hundreds of millennia.

  8. Let me stress that complex imitation involves both the recognition of an action as a certain combination of actions and the ability to replicate (something like) that combination. Both skills play a role in the human child's acquisition of language; the former remains important in the adult's language comprehension.

  9. The attainment of complex imitation was seen as a crucial stage of the evolution of language readiness in Arbib (2002), but was not listed there as a condition for language readiness. I now see this as a mistake.

  10. Unfortunately, space does not permit development of an argument for this controversial claim. Commentaries pro or con the hypothesis will be most welcome.

  11. I wonder at times whether properties LR1 through LR7 do indeed support LA1 or whether LA1 should itself be seen as part of the biological equipment of language readiness. I would welcome commentaries in support of either of these alternatives. However, I remain convinced that LR1 through LR7 coupled with LA1 provide all that is needed for a brain to support LA2, LA3, and LA4.

  12. The pairs (LR6: Beyond the here-and-now 1; LA3: Beyond the here-and-now 2) and (LR7: Paedomorphy and sociality; LA4: Learnability) do not appear in Table 1 because the rest of the paper will not add to their brief treatment in section 2.2.

  13. Figure 2 provides only a partial overview of the model. The full model (see Fagg & Arbib 1998 for more details) includes a number of brain regions, offering schematic models for some and detailed neural-network models for others. The model has been implemented on the computer so that simulations can demonstrate how the activities of different populations vary to explain the linkage between visual affordance and manual grasp.

  14. To keep the exposition compact, in what follows I will use without further explanation the abbreviations for the brain regions not yet discussed. The reader wanting to see the abbreviations spelled out, as well as a brief exposition of data related to the hypothesized linkage of schemas to brain structures, is referred to Oztop and Arbib (2002).

  15. Estimates for the timetable for hominid evolution (I use here those given by Gamble 1994, see his Fig. 4.2) are 20 million years ago for the divergence of monkeys from the line that led to humans and apes, and 5 million years ago for the divergence of the hominid line from the line that led to modern apes.

  16. For more on “chimpanzee culture,” see Whiten et al. (2001) and the Chimpanzee Cultures Web site: http://culture.st-and.ac.uk:16080/chimp/, which gives access to an online database that describes the cultural variations in chimpanzee behavior and shows behavior distributions across the sites in Africa where long-term studies of chimpanzees have been conducted in the wild.

  17. Recall the observation (Note 8) that both the recognition of an action as a certain combination of actions and the ability to replicate (something like) that combination play a role in the human child's acquisition of language, while the former remains important in the adult's language comprehension. But note, too, that stage S4 only takes us to complex imitation of praxic actions; Sections 5 and 6 address the transition to an open system of communicative actions.

  18. As ahistorical support for this, note that airplane is signed in American Sign Language (ASL) with tiny repeated movements of a specific handshape, whereas fly is signed by moving the same handshape along an extended trajectory (Supalla & Newport 1978). I say “ahistorical” because such signs are part of a modern human language rather than holdovers from protosign. Nonetheless, they exemplify the mixture of iconicity and convention that, I claim, distinguishes protosign from pantomime.

  19. Of course, relatively few Chinese characters are so pictographic in origin. For a fuller account of the integration of semantic and phonetic elements in Chinese characters (and a comparison with Sumerian logograms) see Chapter 3 of Coulmas 2003.

  20. Of course, those signs that most clearly resemble pantomimes will be easier for the nonsigner to recognize, just as certain Chinese characters are easier for the novice to recognize. Shannon Casey (personal communication) notes that moving the hands in space to represent actions involving people interacting with people, animals, or other objects is found in signed languages in verbs called “spatial verbs” or “verbs of motion and location.” These verbs can be used with handshapes to represent people or objects called “semantic classifiers” and “size and shape specifiers” (Supalla 1986; see p. 196 for a description of these classifiers and p. 211 for figures of them). Hence, to describe giving someone a cup, the ASL signer may either use the standard give handshape (palm up with fingertips and thumb-tip touching) or use an open, curved handshape with the fingertips and thumb-tip apart and the palm to the side (as if holding a cup). Similarly, to describe giving someone a thick book, the signer can use a handshape with the palm facing up, fingertips pointing outward and thumb also pointing outward with about an inch of space between the thumb and fingertips (as if holding a book). In her own research Casey (2003) has found that hearing subjects with no knowledge of a signed language do produce gestures resembling classifiers. Stokoe (2001, pp. 188–91) relates the use of shape classifiers in ASL to the use of shape classifiers in spoken Native American languages.

  21. Added in proof: Hurford notes that this suggestion was made and discarded prior to publication of Hurford (2004).

  22. Such developments and inventions may have occurred very slowly over the course of many (perhaps even thousands of) generations during which expansion of the proto-vocabulary was piecemeal; it may then have been a major turning point in human history when it was realized that symbols could be created ad libitum and this realization was passed on to future generations. See also Note 25.

  23. Where Corballis focuses on the FOXP2 gene, Crow (2002a) links lateralization and human speciation to a key mutation which may have speciated on a change in a homologous region of the X and Y chromosomes.

  24. I use the word “genius” advisedly. I believe that much work on language evolution has been crippled by the inability to imagine that things we take for granted were in no way a priori obvious, or to see that current generalities were by no means easy to discern in the particularities that they embrace. Consider, for example, that Archimedes (c. 287–212 bce) had the essential idea of the integral calculus, but it took almost 2,000 years before Newton (1642–1727) and Leibniz (1646–1716) found notations that could express the generality implicit in his specific examples and hence unleash an explosion of mathematical innovation. I contend that language, like mathematics, has evolved culturally by such fits and starts. See also Note 23.

  25. Indeed, adjectives are not the “natural category” they may appear to be. As Dixon (1997, pp. 142 et seq.) observes, there are two kinds of adjective classes across human languages: (1) an open class with hundreds of members (as in English); (2) a small closed class. Languages with small adjective classes are found in every continent except Europe. Igbo, from west Africa, has just eight adjectives: large and small; black/dark and white/light; new and old; and good and bad. Concepts that refer to physical properties tend to be placed in the verb class (e.g., “the stone heavies”) and words referring to human propensities tend to be nouns (e.g., “she has cleverness”).

  26. Leaving aside the fact that the monkey probably does not know that Leo's name is “Leo.”

  27. Not all the symbols need be meaningless; some signs of a signed language can be recognized as conventionalized pantomime, and some Chinese characters can be recognized as conventionalized pictures. But we have already noted that relatively few Chinese characters are pictographic in origin. Similarly, many signs have no link to pantomime. As Coulmas (2003) shows us in analyzing writing systems – but the point holds equally well for speech and sign – the mixture of economy of expression and increasing range of expression leads to more and more of a symbol being built up from meaningless components.