Theresa Biberauer | University of Cambridge

Papers by Theresa Biberauer

Syntactic architecture and its consequences III: Inside syntax

History happens only once. This seems to set up an impenetrable barrier for social sciences, like historical linguistics, that concern themselves with change over time. We have the historical record to go on with no convincing way to generate alternative histories that could be used for hypothesis testing. Nevertheless, it is of some interest to ask whether what we see in the historical record is due to particular forces or whether the time series we see could be the result of random drift. In this paper, I will spell out some simple principles of random drift that can be used to construct null hypotheses against which we can study particular cases of language change. The study of random drift allows us to sharpen our analyses of language change and develop more constrained theories of language variation and change.
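The idea of a drift-based null hypothesis can be illustrated with a minimal sketch. The abstract does not say which principles of random drift the paper uses, so the sampling scheme below (each learner in a fixed-size population copies a variant with probability equal to its current frequency) is an assumption made purely for illustration, and the names `simulate_drift` and `drift_p_value` are hypothetical.

```python
import random


def simulate_drift(p0, population, generations, rng):
    """Return the variant's frequency per generation under drift alone.

    Assumption: every learner independently adopts the variant with
    probability equal to its frequency in the previous generation.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        adopters = sum(1 for _ in range(population) if rng.random() < p)
        p = adopters / population
        freqs.append(p)
    return freqs


def drift_p_value(observed_change, p0, population, generations,
                  runs=1000, seed=0):
    """Share of drift-only runs whose net change is at least as large
    (in absolute value) as the observed change."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(runs):
        final = simulate_drift(p0, population, generations, rng)[-1]
        if abs(final - p0) >= abs(observed_change):
            extreme += 1
    return extreme / runs


if __name__ == "__main__":
    # Hypothetical numbers: a variant rising from 50% to 90% usage.
    print(drift_p_value(0.4, p0=0.5, population=200, generations=10))
```

A small value suggests that drift alone rarely produces a change of the observed size under these assumed settings. The result depends heavily on the assumed population size and number of generations.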

Introduction: ’n Klein ietsie for Johan Oosthuizen

Stellenbosch Papers in Linguistics, 2017

Automatic Language Identification in Code-Switched Hindi-English Social Media Text

Journal of Open Humanities Data

Natural Language Processing (NLP) tools typically struggle to process code-switched data and so linguists are commonly forced to annotate such data manually. As this data becomes more readily available, automatic tools are increasingly needed to help speed up the annotation process and improve consistency. Last year, such a toolkit was developed to semi-automatically annotate transcribed bilingual code-switched Vietnamese-English speech data with token-based language information and POS tags (hereafter the CanVEC toolkit, L. Nguyen & Bryant, 2020). In this work, we extend this methodology to another language pair, Hindi-English, to explore the extent to which we can standardise the automation process. Specifically, we applied the principles behind the CanVEC toolkit to data from the International Conference on Natural Language Processing (ICON) 2016 shared task, which consists of social media posts (Facebook, Twitter and WhatsApp) that have been annotated with language and POS tags (Molina et al., 2016). We used the ICON-2016 annotations as the gold-standard labels in the language identification task. Ultimately, our tool achieved an F1 score of 87.99% on the ICON-2016 data. We then evaluated the first 500 tokens of each social media subset manually, and found that almost 40% of all errors were caused entirely by problems with the gold standard, i.e., our system was correct. It is thus likely that the overall accuracy of our system is higher than reported. This shows great potential for effectively automating the annotation of code-switched corpora, on different language combinations, and in different genres. We finally discuss some limitations of our approach and release our code and human evaluation together with this paper.
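For readers unfamiliar with the metric, the sketch below shows how a token-level F1 score for language identification can be computed from gold and predicted tags. It is not the released evaluation code: the abstract does not say how the reported score was averaged, so the per-label and macro-averaged variants here are assumptions, and the tags `hi` and `en` and the function names are hypothetical.

```python
def f1_per_label(gold, predicted):
    """Return {label: F1} for two equally long sequences of token tags."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of the per-label F1 scores."""
    scores = f1_per_label(gold, predicted)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical four-token example with one tagging error.
    gold = ["hi", "en", "en", "hi"]
    predicted = ["hi", "en", "hi", "hi"]
    print(f1_per_label(gold, predicted))
    print(macro_f1(gold, predicted))
```

Because the score is measured against the gold labels, any errors in those labels count against the system even when its prediction is right, which is the effect the manual evaluation in the abstract describes.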

(University of Cambridge) 1. Jäger on types of indefinite DPs

Jäger (2008) presents an analysis of the different types of indefinite DPs and their diachronic development, focussing on German but, in Section 3, extending it to a wide range of languages. She distinguishes three types of indefinite: first, “normal” or positive polarity item/PPI indefinites (e.g. something); second, negative polarity item/NPI indefinites (e.g. NPI anything) and, third, negative indefinites (e.g. nothing). She captures the similarities and differences between the three types with a two-feature system combined with a general notion of underspecification. The two features are [+affective] (in essentially the sense introduced in Klima 1964) and [+negative]. Jäger assumes that [+negative] entails [+affective], with the result that there are only three possible well-formed feature combinations. Adopting underspecification theory, so that negative values are not specified (cf. Wunderlich/Fabri 1995, Blevins 2000, Eisenbeiss 2002), these feature combinations characterise...

Syntactic architecture and its consequences I: Syntax inside the grammar

You say you want a revolution Well you know We all want to change the world You tell me that it's evolution Well you know We all want to change the world Don't you know it's gonna be alright (The Beatles, Revolution 1)

Negative exclamatives in Afrikaans: some initial thoughts

Stellenbosch Papers in Linguistics, 2018

Introduction: Changing views of syntactic change

Lexical, Morphological, and Information-Structural Interactions, 2015

Syntactic doubling and the encoding of voice in Eastern Abruzzese

Proceedings of the 25th west coast …, 2006

Eastern Abruzzese (EA), a southern Italian dialect spoken in Central Italy, exhibits an auxiliary-selection pattern that commonly surfaces in Central and Southern varieties of Italian: as illustrated in (1), it is person- rather than argument-structure-driven:

The Final-Over-Final Constraint and Predictions for Diachronic Change

Toronto Working Papers …, 2009

Following Biberauer, Holmberg & Roberts (2007, 2008), we examine the predictions of the Final-over-Final Constraint (FOFC) for grammatical change and borrowing. As an invariant syntactic principle (cf. Chomsky 1981 and following), FOFC rules out the synchronic possibility of a ...

Chapter 4. Conditional inversion and types of parametric change

Cascading Parameter Changes

Die rol van Afrikaans as identiteitsfaktor by SA ekspatriate in die Verenigde Koninkryk [The role of Afrikaans as an identity factor among SA expatriates in the United Kingdom]

This paper investigates the role played by Afrikaans in the cultural and personal identity of (formerly) Afrikaans-speaking South Africans living in the United Kingdom. It has two major objectives: to set up a general profile of the composition of this relatively recently established Afrikaans-speaking expatriate community, and to identify the domains in which Afrikaans is used by its members; and to use this information to consider a more complex issue, namely the precise nature of the interplay between language and identity in the UK diaspora community and whether this can be understood in terms of scenarios previously proposed by Fishman (1994) and Kotze (1994). The findings presented are based on a quantitative, questionnaire-based study which required respondents to supply biographical data and information about their language usage patterns and attitudes, particularly as these relate to Afrikaans. In demographic terms, the Afrikaans diaspora community was found to be relativel...

The Final-Over-Final Condition

Into adpositions

'Nie sommer nie': Sociohistorical and formal comparative considerations in the rise and maintenance of the modern Afrikaans negation system

Stellenbosch Papers in Linguistics Plus, 2015

This article has three major objectives. Firstly, it aims to describe and account for the peculiarity of the modern Afrikaans negative concord marker nie₂ in the familiar Western European context. I appeal to Roberge's (2000) diachronic proposals as the initial starting point for this oddness, showing how nie₂'s putative origins as a discourse-oriented particle are synchronically reflected in the modern language, producing, among other things, what appears to be inertness in the context of Jespersen's Cycle. This inertness leads to the interface-driven hypothesis that systems in which a structurally very high element becomes grammaticalised as a sentential Negative Concord element will not progress to the next stage of Jespersen's Cycle, i.e. a structurally very high Negative Concord element will never take over as the "real" negation element. The article's second objective is to demonstrate, on the basis of data from Brazilian Portuguese, Santomé, and a subset of Bantu languages, that the predictions of this hypothesis appear to be correct. At the same time, I show how crucial it is to distinguish the cyclic negation-reinforcing developments associated with Jespersen's Cycle from non-cyclic reinforcement developments; as they may draw on the same lexical resources, this can be a challenging task, particularly where less well-studied languages are the object of investigation. The final part of the article broadens the focus, considering Afrikaans's overall negation profile in the context of negation typology and learnability. The conclusion drawn here is that this system, which owes some of its properties to prescriptive stipulations, is a highly unusual and possibly not even naturally acquirable one.

Linearization and the architecture of grammar: a view from the Final-over-Final Constraint

StiL-Studies in Linguistics (Proceedings of …, 2009

This paper addresses the issue of the locus of linearization information in the context of a minimalist grammar. Contrary to what is arguably the dominant view in minimalist theorizing today, it is argued that linearization information must in fact be specified Narrow Syntax-internally. The imperative underlying this conclusion is an empirical skewing in the domain of word-order variation, in terms of which head-initial structures associated with a given projection line may only be (harmonically) dominated by head-initial structures, while head-final structures may be dominated either by head-initial or head-final structures in the same context: the so-called Final-over-Final Constraint (FOFC). It is argued that attested FOFC effects suggest that linearization information is in fact encoded in such a way (namely, by harnessing an already-required movement diacritic in accordance with Relativized Minimality, arguably a third-factor-imposed principle) that its NS-internal presence does not violate the Strong Minimalist Thesis of Chomsky (2001 et seq.). We also consider the question of syntactic categories and their formal status against this background.

* The research reported here is funded by AHRC Grant AH/E009239/1: "Structure and Linearization in Disharmonic Word Orders". For valuable input on the ideas central to this paper we thank in particular Angel Gallego, John Hawkins, Neil Myler, and Michelle Sheehan, and also the IGG35 and GLOW32 audiences. All usual disclaimers apply.
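The word-order skewing described in this abstract can be stated as a simple check over a complement spine: a head-initial phrase may not be immediately dominated by a head-final phrase. The sketch below is a toy illustration only. It assumes that every phrase on the spine belongs to the same projection line, and the `Phrase` class, its fields, and `violates_fofc` are hypothetical names, not the paper's formalism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phrase:
    label: str
    head_final: bool                      # True: complement precedes the head
    complement: Optional["Phrase"] = None


def violates_fofc(phrase):
    """Return True if a head-final phrase immediately dominates a
    head-initial complement anywhere along the complement spine.

    Simplifying assumption: all phrases share one projection line.
    """
    node = phrase
    while node is not None and node.complement is not None:
        if node.head_final and not node.complement.head_final:
            return True
        node = node.complement
    return False


if __name__ == "__main__":
    # Hypothetical configurations: an auxiliary phrase over a verb phrase.
    initial_vp = Phrase("VP", head_final=False)
    final_vp = Phrase("VP", head_final=True)
    print(violates_fofc(Phrase("AuxP", True, initial_vp)))   # final over initial
    print(violates_fofc(Phrase("AuxP", False, final_vp)))    # initial over final
```

Of the four combinations of head-initial and head-final orders for two adjacent phrases, only final-over-initial is flagged, matching the asymmetry stated in the abstract.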

Syntactic Doubling and the Encoding of Voice in Eastern Abruzzese

This work is partially supported by an EU Marie Curie award No. 006833 ("Abruzzese Syntax") under Framework 6 to D'Alessandro and AHRC project No. AR14458 ("Null Subjects and the Structure of Parametric Variation") to Biberauer. We gratefully thank Adam Ledgeway, Nigel Vincent and Ian Roberts, and also the audiences at WCCFL25 and the Bristol Italian Dialectology Meeting for their comments and suggestions. All the usual disclaimers apply.

2. We will only address the variety of EA spoken on the coast in this paper. The western variety is a central dialect which exhibits completely different features.

3. The data contained in this paper are from the dialect spoken in Arielli (Chieti).

Evidence That V2 Involves Two Movements: a Reply to Müller

Cambridge Occasional Papers in …, 2004

This paper considers a recent proposal by Müller (forthcoming) that the traditional two-movement analysis of verb second (V2) constructions can usefully be replaced by a single movement analysis. Specifically, Müller proposes that V2 structures involve a fronted vP-remnant ...

Subjects, Tense and Verb-Movement In Germanic and Romance

Cambridge Occasional Papers in …, 2008

This paper takes a closer look at the attraction properties of T. It highlights an empirically attested distinction between rich agreement inflection, exhibited by null-subject languages, and rich tense inflection, found in Romance, but not Germanic, and argues for the syntactic relevance of ...

Factors 2 and 3: Towards a principled approach

This paper seeks to make progress in our understanding of the non-UG components of Chomsky's (2005) Three Factors model. In relation to the input (Factor 2), I argue for the need to formulate a suitably precise hypothesis about which aspects of the input will qualify as 'intake' and, hence, serve as the basis for grammar construction. In relation to Factor 3, I highlight a specific cognitive bias that appears well motivated outside of language, while also having wide-ranging consequences for our understanding of how I-language grammars are constructed, and why they should have the crosslinguistically comparable form that generativists have always argued human languages have. This is Maximise Minimal Means (MMM). I demonstrate how its incorporation into our model of grammar acquisition facilitates understanding of diverse facts about natural language typology, acquisition, both in "stable" and "unstable" contexts, and also the ways in which linguistic systems may change over time.
