Jaeger, T. F. (2006) Redundancy and syntactic reduction. Ph.D. thesis, Stanford University.
Comprehenders are sensitive to probabilistic distributions of linguistic events (Garnsey et al., 1997; Kamide et al., 2003; Konieczny, 2000; Staub and Clifton, 2006, inter alia). Expected words and structures are processed faster than unexpected ones. This raises the question of whether syntactic production, too, is sensitive to the probabilities of upcoming material. This thesis investigates cases of syntactic reduction, as in “I think (that) the commercial break is over”, where the word “that” can be omitted. I present evidence from corpus studies of spontaneous speech that syntactic reduction is more likely if the reduced constituent is predictable. Modern statistical regression models are used to guard against common challenges to corpus-based studies (such as clusters and multicollinearity). Taken together with evidence from phonetic reduction (in duration, formant quality, etc.; see, e.g., Aylett and Turk, 2004; Bell et al., 2003; van Son and Pols, 2003), phoneme omission (e.g., t/d deletion; see Bell et al., 2003; Gahl and Garnsey, 2004), and argument drop (Resnik, 1996), the evidence from syntactic reduction supports the Probabilistic Reduction Hypothesis (PRH): “Where grammar allows it, form reduction is more likely, the more redundant the information conveyed by the omitted form details is”. The PRH is compatible with both production pressure accounts (Ferreira and Dell, 2000, inter alia) and audience design accounts of reduction (Hawkins, 2004, inter alia). The results are also compatible with ‘uniform information density’ accounts (Aylett, 1999; Aylett and Turk, 2004; Genzel and Charniak, 2002): speakers may insert or omit “that” to avoid peaks or troughs in information density. Uniform information density accounts integrate both production and audience design accounts, as a uniform amount of information per time/unit optimizes successful information transfer while minimizing production effort.
The same probabilities that I show to influence production are known to influence comprehension (Garnsey et al., 1997). Hence, the results may be taken to argue that language users’ representations of constituents contain probabilistic information. In short, knowledge of grammar may imply knowledge of probabilities (see Gahl and Garnsey, 2004). The final part of the thesis presents a new approach to studying what information speakers use to keep track of probabilistic distributions. In a first step, several predictability estimates of the same event are derived, using different sets of cues or slightly different assumptions about which structure speakers keep track of. In a second step, these predictability estimates are compared by how much of the variation in syntactic reduction each accounts for. The results suggest that both structural and surface cues are used to keep track of the probability of structures and that speakers keep track of rather specific structures.
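The two-step comparison described above can be illustrated with a minimal sketch. It is not the thesis's actual analysis, which uses full regression models on corpus data. Everything here is an assumption made for illustration: the eight observations and both sets of probabilities are invented toy values, the names `estimate_a` and `estimate_b` are hypothetical, and a single-predictor logistic model fit by plain gradient ascent stands in for the real models. Each estimate is converted to information content (surprisal, in bits), and the estimates are then compared by the log-likelihood they give to the observed omissions of “that”.

```python
import math

def surprisal(p):
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, steps=5000, rate=0.1):
    """Fit P(y = 1) = sigmoid(a + b * x) by gradient ascent on the mean log-likelihood."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * x)
            grad_a += err
            grad_b += err * x
        a += rate * grad_a / n
        b += rate * grad_b / n
    return a, b

def log_likelihood(xs, ys, a, b):
    """Log-likelihood of the binary outcomes ys under the fitted model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(a + b * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Invented toy data: 1 = "that" omitted, 0 = "that" produced.
omitted = [1, 1, 1, 0, 1, 0, 0, 0]

# Step 1: two hypothetical predictability estimates of the same event
# (the probability of a complement clause), derived from different cue sets.
estimate_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.3, 0.2, 0.1]
estimate_b = [0.5, 0.6, 0.4, 0.5, 0.6, 0.5, 0.4, 0.5]

# Step 2: compare how well each estimate accounts for the omissions.
xs_a = [surprisal(p) for p in estimate_a]
xs_b = [surprisal(p) for p in estimate_b]
ll_a = log_likelihood(xs_a, omitted, *fit_logistic(xs_a, omitted))
ll_b = log_likelihood(xs_b, omitted, *fit_logistic(xs_b, omitted))

# Baseline: a model that assigns probability 0.5 to every outcome.
ll_null = len(omitted) * math.log(0.5)

print("log-likelihood, estimate A:", ll_a)
print("log-likelihood, estimate B:", ll_b)
print("log-likelihood, baseline:  ", ll_null)
```

In this toy setting, the estimate with the higher log-likelihood is the one that accounts for more of the variation in reduction. With real corpus data, the comparison would also need the controls mentioned in the abstract, such as handling clustered observations and multicollinearity.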