Buz, E., Jaeger, T. F., and Tanenhaus, M. K. 2014. Contextual confusability leads to targeted hyperarticulation. In TBA (eds.) Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci14), TBA. Austin, TX: Cognitive Science Society.

Contextual confusability leads to targeted hyperarticulation

A central question in the field of language production is the extent to which the speech production system is organized for robust communication. One view holds that speakers' decisions to produce clearer or less clear signals, or to speak faster or slower, are primarily or even exclusively driven by the demands inherent to production planning. The opposing view holds that these demands are balanced against the goal of being understood. We investigate the degree of hyperarticulation in the presence of easily confusable minimal-pair neighbors (e.g., saying pill when bill is contextually co-present and thus a plausible alternative). We directly test whether production difficulty alone can explain such hyperarticulation. The results argue against production-centered accounts. We also investigate how specific hyperarticulation is to the segment that distinguishes the target from the contextually plausible alternative. Our evidence comes from a novel web-based speech recording paradigm.

Lexical Confusability and Degree of Coarticulation

Proceedings of the Annual Meeting of the Berkeley Linguistics Society, 2003

Listener-motivated accommodations in speech

Speech is inherently communicative: "we speak to be heard in order to be understood" (Jakobson, Fant, & Halle, 1952). Thus, it stands to reason that speech production should be influenced by a speaker's desire to be understood. A speaker must produce a signal from which his listener can recover the intended message. If he is not sufficiently careful, communication will be unsuccessful. On the other hand, as long as a speaker remains intelligible to his listener, there is no reason that he cannot adjust his production to reduce the amount of effort he must expend as speaker. Lindblom (1990) has proposed a model of the interaction among the forces that shape speech, characterizing speech communication as a dynamic balance between speaker-oriented and listener-oriented forces, regulated by the communicative context. In other words, as factors in the communicative situation place extra demands on the listener, decreasing his chances of recovering the message, the speaker must adjust his pronunciation in order to produce clearer speech (referred to in Lindblom's model as "hyper-speech"). However, when conditions are favorable for communication, the speaker is free to conserve articulatory effort, producing reduced speech (or "hypo-speech").

Research has shown that speakers are sensitive to a number of different types of listener difficulties and make corresponding acoustic-phonetic accommodations. One early and easily confirmable observation of a listener-motivated speech accommodation was that people talk more loudly (Lombard, 1911) and more slowly (Hanley & Steer, 1949) in noisy environments than in quiet ones. Similarly, speech directed toward hearing-impaired listeners is slower and less phonologically reduced than normal conversational speech (Picheny et al., 1986). Importantly, these ostensibly listener-motivated accommodations (or at least the speech containing them) have also been shown to have an observable positive effect on intelligibility for listeners. Experiments have verified, for example, that speech produced in noise is more intelligible when presented at a constant speech-to-noise ratio than speech produced in quiet (e.g., Summers et al., 1988). And Picheny et al. (1985) present evidence of a substantial improvement in intelligibility for clear speech relative to normal conversational speech for hearing-impaired listeners.

Factors internal to the structure of an interaction may also motivate speaker accommodations. Words that are less predictable from the conversational context are more intelligible when removed from their context and presented in isolation than are more predictable words (Lieberman, 1963), and the first occurrence of a word in a discourse is longer and more intelligible than subsequent occurrences of that word (Fowler & Housum, 1987).

Dynamically adapted context-specific hyper-articulation: Feedback from interlocutors affects speakers’ subsequent pronunciations

Journal of Memory and Language, 2016

We investigate whether speakers adapt their pronunciations when feedback from interlocutors suggests that previous productions were perceptually confusable. To address this question, we use a novel web-based task-oriented paradigm for speech recording, in which participants produce instructions to a (simulated) partner with naturalistic response times. We manipulate (1) whether a target word with a voiceless plosive (e.g., pill) occurs in the presence of a voiced competitor (bill) or an unrelated word (food) and (2) whether or not the simulated partner occasionally misunderstands the target word. Speakers hyper-articulated the target word when a voiced competitor was present. Moreover, the size of the hyper-articulation effect nearly doubled when partners occasionally misunderstood the instruction. A novel type of distributional analysis further suggests that hyper-articulation did not change the target of production, but rather reduced the probability of perceptually ambiguous or confusable productions. These results were obtained in the absence of explicit clarification requests, and persisted across words and over trials. Our findings suggest that speakers adapt their pronunciations based on the perceived communicative success of their previous productions in the current environment. We discuss why speakers make adaptive changes to their speech and what mechanisms might underlie speakers' ability to do so.

Keywords: language production; hyper-articulation; communication; interlocutor feedback; perceptual confusability
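
As a toy illustration of what such a distributional analysis could look like (the sketch and all numbers below are my own construction, not the paper's method): if hyper-articulation selectively removed confusable tokens, the center of the voice onset time (VOT) distribution would stay roughly put while the mass in the short-VOT, /b/-like tail shrinks.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical VOT samples (in ms) for a word like "pill".
    # Assumption (mine, not the paper's): hyper-articulation trims the
    # short-VOT tail rather than shifting the whole distribution.
    baseline = rng.normal(loc=60, scale=15, size=2000)
    kept = baseline[baseline > 40]  # unambiguous tokens left unchanged
    redone = rng.normal(loc=65, scale=8, size=len(baseline) - len(kept))
    hyper = np.concatenate([kept, redone])  # risky tokens produced longer

    THRESHOLD = 35  # ms below which /p/ risks being heard as /b/ (assumed value)

    for name, vot in [("baseline", baseline), ("competitor present", hyper)]:
        print(f"{name}: median VOT = {np.median(vot):.1f} ms, "
              f"P(VOT < {THRESHOLD} ms) = {np.mean(vot < THRESHOLD):.3f}")

On data like these, the median barely moves while the probability of a confusable token drops sharply, which is the signature the abstract describes.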

A restricted interaction account (RIA) of spoken word production: The best of both worlds

Aphasiology, 2002

Theories of spoken word production generally assume that mapping from conceptual representations (e.g., [furry, feline, domestic]) to phonemes (e.g., /k/, /æ/, /t/) involves both a meaning-based process and a sound-based process. A central question in this framework is how these two processes interact with one another. Two theories that occupy extreme positions on the continuum of interactivity are reviewed: a highly discrete position (e.g., Levelt, Roelofs, & Meyer, 1999), in which the two processes occur virtually independently; and a highly interactive position (e.g., Dell et al., 1997), in which the two processes exert considerable mutual influence over one another. Critical examination of the empirical data reveals that neither position can account for the full range of findings. An alternative position, the restricted interaction account (RIA), is described. By combining aspects of both highly discrete and highly interactive accounts, RIA can account for the existing empirical data, as well as for more recent challenges to interactive accounts.

Theories of single word production generally assume that two cognitive processes are required for mapping from a conceptual representation (e.g., [furry, feline, domestic]) to the set of phonemes used to communicate that concept (/k/, /æ/, /t/). The first process is meaning-based (semantic) and involves the selection of a particular word to express a nonverbal concept. The second is sound-based (phonological) and involves retrieving the phonemes that correspond to the selected word (Butterworth, 1989; Garrett, 1982). Although there is agreement on this general characterisation of the two processes, there is considerable controversy regarding how the two processes relate to one another. Some researchers claim that they are independent, with semantic processing strictly preceding phonological processing (e.g., Levelt et al., 1999). These are referred to as "discrete" or "componential" theories of spoken word production. In contrast, "interactive" theories propose that semantic and phonological processes overlap in time and can influence one another (e.g., Dell et al., 1997). A good deal of the literature in spoken production has focused on the contrast between these two types of architectures. But because each of these theories has been able to account for a number, but not all, of the relevant empirical findings, this debate has yet to be resolved.
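
The architectural contrast at issue can be caricatured in a few lines of code (a toy of my own devising, not a model from the paper): in a discrete architecture, phonological activation begins only after a single word has been selected, whereas in a cascaded or interactive architecture all semantically activated candidates pass graded activation down to their phonemes.

    # Toy contrast between discrete and cascaded lexical access (illustrative only).
    SEMANTIC_ACTIVATION = {"cat": 0.9, "dog": 0.4}  # concept activates candidates
    PHONEMES = {"cat": ["k", "ae", "t"], "dog": ["d", "o", "g"]}

    def discrete(candidates):
        # Discrete: select the winner first; only its phonemes become active.
        winner = max(candidates, key=candidates.get)
        return {ph: 1.0 for ph in PHONEMES[winner]}

    def cascaded(candidates):
        # Cascaded: every active candidate sends graded activation downstream,
        # so phonemes of non-selected competitors (e.g., /d/ of "dog") are active too.
        activation = {}
        for word, act in candidates.items():
            for ph in PHONEMES[word]:
                activation[ph] = activation.get(ph, 0.0) + act
        return activation

    print("discrete:", discrete(SEMANTIC_ACTIVATION))
    print("cascaded:", cascaded(SEMANTIC_ACTIVATION))

Fully interactive models additionally feed phoneme activation back up to words; RIA's middle-ground claim is that only a restricted amount of such two-way flow is needed to fit the data.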

Distant context effects in language production: A reply to Motley et al.

Journal of Psycholinguistic Research, 1983

Motley, Baars, and Camden (1981), continuing earlier work, report on several experiments that are interpreted as evidence for output editing in language production, editing that is sensitive to lexical, phonological, semantic, and syntactic factors. Their results can be accounted for equally well if context affects the formation, rather than the output, of the subjects' responses. Similar distant context effects on utterance formation have also been observed in naturally occurring speech errors.

The complexity-cost factor. Commentary on Pickering & Garrod’s theory of language perception and production.

Currently, production and comprehension are regarded as quite distinct in accounts of language processing. In rejecting this dichotomy, we instead assert that producing and understanding are interwoven, and that this interweaving is what enables people to predict themselves and each other. We start by noting that production and comprehension are forms of action and action perception. We then consider the evidence for interweaving in action, action perception, and joint action, and explain such evidence in terms of prediction. Specifically, we assume that actors construct forward models of their actions before they execute those actions, and that perceivers of others' actions covertly imitate those actions, then construct forward models of those actions. We use these accounts of action, action perception, and joint action to develop accounts of production, comprehension, and interactive language. Importantly, they incorporate well-defined levels of linguistic representation (such as semantics, syntax, and phonology). We show (a) how speakers and comprehenders use covert imitation and forward modeling to make predictions at these levels of representation, (b) how they interweave production and comprehension processes, and (c) how they use these predictions to monitor upcoming utterances. We show how these accounts explain a range of behavioral and neuroscientific data on language processing and discuss some of the implications of our proposal.
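
The forward-model idea in this account can be sketched in miniature (my own caricature of the general predict-and-compare scheme, not Pickering and Garrod's implemented proposal): the production system issues a command, a forward model predicts its sensory consequences before they arrive, and a comparator flags mismatches for monitoring.

    # Toy forward-model monitoring loop (illustrative caricature only).
    def forward_model(command):
        # Predict the sensory consequence of a motor command before execution;
        # a trivial lookup stands in for the learned predictive mapping.
        return {"say pill": "hear pill"}.get(command, "unknown")

    def execute(command):
        # The actual (noisy) outcome; imagine articulation occasionally errs.
        return "hear bill"

    command = "say pill"
    predicted = forward_model(command)  # available early, before feedback arrives
    actual = execute(command)           # arrives later, via perception
    if predicted != actual:
        print(f"prediction error: expected {predicted!r}, got {actual!r}; repair")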

Constraint, Word Frequency, and the Relationship between Lexical Processing Levels in Spoken Word Production

Journal of Memory and Language, 1998

Producing a word to express a meaning requires the processes of lexical selection and phonological encoding. We argue that lexical selection is influenced by contextual constraint and phonological encoding by word frequency, and we use these variables to assess the processing relations between selection and encoding. In two experiments we examined latencies to name pictures presented within sentences. The sentences varied in degree of constraint, whereas the target picture-names varied in frequency. In both experiments, targets that followed constraining sentences showed substantially reduced frequency effects. When the targets followed incongruent sentence frames, the frequency effect returned. The results offer new support for the predictions of cascade theories of word production.

Articulatory Planning Is Continuous and Sensitive to Informational Redundancy

Phonetica, 2005

Pluymaekers, M., M. Ernestus, and R. H. Baayen

This study investigates the relationship between word repetition, predictability from neighbouring words, and articulatory reduction in Dutch. For the seven most frequent words ending in the adjectival suffix -lijk, 40 occurrences were randomly selected from a large database of face-to-face conversations. Analysis of the selected tokens showed that the degree of articulatory reduction (as measured by duration and number of realized segments) was affected by repetition, predictability from the previous word, and predictability from the following word. Interestingly, not all of these effects were significant across morphemes and target words. Repetition effects were limited to suffixes, while effects of predictability from the previous word were restricted to the stems of two of the seven target words. Predictability from the following word affected the stems of all target words equally, but not all suffixes. The implications of these findings for models of speech production are discussed.
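
For concreteness, predictability of this kind is commonly operationalized (an assumption on my part; the abstract does not spell it out) as a conditional probability estimated from corpus counts:

    $$P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\,w_i)}{C(w_{i-1})},
    \qquad
    P(w_i \mid w_{i+1}) = \frac{C(w_i\,w_{i+1})}{C(w_{i+1})}$$

where C(·) counts occurrences of a word or word pair in the corpus; the higher these probabilities, the shorter and more reduced the realization is expected to be.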

Frank, A. and Jaeger, T. F. 2008. Speaking Rationally: Uniform Information Density as an Optimal Strategy for Language Production. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society (CogSci08), 939-944.

We provide evidence for a rational account of language production, Uniform Information Density (UID; Jaeger, 2006; Levy & Jaeger, 2007). Under the assumption that communication can usefully be understood as information transmission over a capacity-limited noisy channel, an optimal strategy in language production is to maintain a uniform rate of information transmission close to the channel capacity. This theory predicts that speakers will make strategic use of the flexibility allowed by their languages. Speakers should plan their utterances so that elements with high information are lengthened, and elements with low information are shortened, making the amount of information transmitted per time more uniform (and hence closer to the optimum). In three corpus studies, we show that American English speakers' use of contractions ("you are" → "you're") follows the predictions of UID. We then explore further implications of UID for production planning.

Keywords: language production; utterance planning; information theory; morphological reduction; contractions
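
In rough formal terms (my notation and objective, a gloss on the cited formulation rather than the paper's own equations), the information carried by a word is its surprisal given the preceding context,

    $$I(w_i) = -\log_2 P(w_i \mid w_1, \ldots, w_{i-1}),$$

and UID favors, among meaning-equivalent formulations, the one that keeps information per word (or per unit time) closest to the channel capacity C, for instance by minimizing

    $$\sum_i \bigl(I(w_i) - C\bigr)^2.$$

On this reading, the full form "you are" spreads information over more time and is preferred when the contractible material is informative (high surprisal), while the contraction "you're" is preferred when it is predictable (low surprisal).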