Volker Gast | Friedrich-Schiller-Universität Jena
Papers by Volker Gast
Entropy
Research in computational textual aesthetics has shown that there are textual correlates of preference in prose texts. The present study investigates whether textual correlates of preference vary across different time periods (contemporary texts versus texts from the 19th and early 20th centuries). Preference is operationalized in different ways for the two periods: in terms of canonization for the earlier texts, and through sales figures for the contemporary texts. As potential textual correlates of preference, we measure degrees of (un)predictability in the distributions of two types of low-level observables: parts of speech and sentence length. Specifically, we calculate two entropy measures: Shannon Entropy as a global measure of unpredictability, and Approximate Entropy as a local measure of surprise (unpredictability in a specific context). Preferred texts from both periods (contemporary bestsellers and canonical earlier texts) are characterized by higher degrees of unpredictability...
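As an illustration of what a "global" measure of unpredictability means here, the following is a minimal sketch of Shannon Entropy computed over the frequency distribution of part-of-speech tags. It is not the authors' code: the function name, the choice of base-2 logarithms and the example tag sequences are assumptions made for illustration only. Because only relative frequencies enter the formula, reordering the tags leaves the value unchanged, which is what distinguishes it from a sequential measure such as Approximate Entropy.

```python
from collections import Counter
import math


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the frequency distribution of `symbols`.

    Only relative frequencies matter; the order of the symbols is ignored,
    which is what makes this a *global* measure of unpredictability.
    """
    n = len(symbols)
    if n == 0:
        raise ValueError("need at least one symbol")
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical POS-tag sequences (illustrative labels, not corpus data)
uniform = ["NOUN", "VERB", "ADJ", "DET"] * 5
skewed = ["NOUN"] * 17 + ["VERB", "ADJ", "DET"]

print(shannon_entropy(uniform))  # 2.0 bits: four equally frequent tags
print(shannon_entropy(skewed))   # lower, because one tag dominates
```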
Linguistik Online
Scalar focus operators like even, only, etc. interact with scales, i.e., ordered sets of alternatives that are referenced by focus structure. The scaling dimensions interacting with focus operators have been argued to be semantic (e.g. entailment relations, probability) in earlier work, but it has been shown that purely semantic analyses are too restrictive, and that the specific scale that a given operator interacts with is often pragmatic, in the sense of being a function of the context. If that is true, the question arises as to what exactly determines the (types of) scales interacting with focus operators. The present study addresses this question by investigating the distributional behaviour of the additive scalar particle even relative to scales whose focus alternatives are ordered in terms of evaluative attitudes (positive, negative). Our hypothesis is that such evaluative attitudinal scales are at least partially functions of the lexical material in the sentential environment. T...
This article contains some thoughts on the role of bilingual cognition in the diachronic change of morphological paradigms, with a focus on contact-induced change. In a first step, a general typology of paradigm change is proposed, based on a distinction between three levels of linguistic organization (the sign/Level 1, the category/Level 2, and the dimension/Level 3), and two types of change (neutralization and differentiation), thus distinguishing six types of paradigm change. Examples of these types (taken from the pertinent literature) are discussed, and two questions are addressed in each case: (i) To what extent does contact-induced paradigm change of a specific type differ from internal change? (ii) What are (potentially) the underlying cognitive processes motivating each type of change? The hypothesis is explored that there is a correlation between the three levels of analysis and three types of cognitive processes involved in paradigm change. It is suggested that change at ...
Entropy, 2022
Computational textual aesthetics aims at studying observable differences between aesthetic categories of text. We use Approximate Entropy to measure the (un)predictability in two aesthetic text categories, i.e., canonical fiction (‘classics’) and non-canonical fiction (with lower prestige). Approximate Entropy is determined for series derived from sentence-length values and the distribution of part-of-speech tags in text windows. For comparison, we also include a sample of non-fictional texts. Moreover, we use Shannon Entropy to estimate degrees of (un)predictability due to frequency distributions in the entire text. Our results show that Approximate Entropy differentiates canonical from non-canonical texts better than Shannon Entropy does, whereas this is not the case for the classification of fictional vs. expository prose. Canonical and non-canonical texts thus differ in sequential structure, while inter-genre differences are a matter of the overall distribution of loc...
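To make the "local" measure concrete, here is a minimal sketch of Approximate Entropy following its standard definition (Pincus 1991): ApEn(m, r) is the difference between the average log-frequency of matching templates of length m and of length m + 1, where two templates match if their maximum componentwise difference is at most r. The defaults m = 2 and r = 0.2 times the standard deviation are a common rule of thumb and an assumption here, not the parameters reported in the study; the sentence-length series in the usage example are hypothetical. A strictly periodic series yields a value close to 0, while an irregular series yields a clearly positive value.

```python
import math
import random


def approximate_entropy(series, m=2, r=None):
    """Approximate Entropy of a numeric series (standard Pincus definition).

    m: template length (embedding dimension).
    r: match tolerance; defaults to 0.2 * population standard deviation,
       a common rule of thumb (an assumption, not taken from the study).
    """
    x = [float(v) for v in series]
    n = len(x)
    if n <= m + 1:
        raise ValueError("series too short for the chosen m")
    if r is None:
        mean = sum(x) / n
        r = 0.2 * math.sqrt(sum((v - mean) ** 2 for v in x) / n)

    def phi(k):
        # All overlapping templates of length k
        templates = [x[i:i + k] for i in range(n - k + 1)]
        count = len(templates)
        total = 0.0
        for a in templates:
            # Chebyshev distance; self-matches are counted, so matches >= 1
            matches = sum(
                1 for b in templates
                if max(abs(p - q) for p, q in zip(a, b)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)


# Hypothetical sentence-length series (in tokens), for illustration only
periodic = [10, 20] * 50
rng = random.Random(42)
noisy = [max(1, round(rng.gauss(20, 8))) for _ in range(300)]

print(approximate_entropy(periodic))  # close to 0: fully predictable pattern
print(approximate_entropy(noisy))     # clearly positive: irregular pattern
```

Unlike Shannon Entropy, this value changes when the same sentence lengths are reordered, since it is computed over consecutive templates.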
We propose a way of enriching the TimeML annotations of TimeBank by adding information about the Topic Time in terms of Klein (1994). The annotations are partly automatic, partly inferential and partly manual. The corpus was converted into the native format of the annotation software GraphAnno and POS-tagged using the Stanford bidirectional dependency network tagger. On top of each finite verb, a FIN-node with tense information was created, and on top of each FIN-node, a TOPICTIME-node, in accordance with Klein's (1994) treatment of finiteness as the linguistic correlate of the Topic Time. Each TOPICTIME-node is linked to a MAKEINSTANCE-node representing an (instantiated) event in TimeML (Pustejovsky et al. 2005), the markup language used for the annotation of TimeBank. For such links we introduce a new category, ELINK. ELINKs capture the relationship between the Topic Time (TT) and the Time of Situation (TSit) and have an aspectual interpretation in Klein's (1994) theory. In...
On reciprocal and reflexive uses of anaphors in German and other European languages
Trends in Linguistics. Studies and Monographs [TiLSM]
Meaning and Grammar of Nouns and Verbs, 2021
Introduction. It is a well-known fact that the vocabularies of individual languages are structured very differently. Even if it is always possible to translate a certain utterance from one language into another, it is rarely, if ever, possible to say that all or even some lexemes making up an utterance in one language correspond perfectly and completely to the lexemes rendering that utterance in another. In most cases the content cut out from the amorphous mass of notions and ideas by one lexeme A may be similar to the content identified by some translational counterpart in another, but there is hardly ever complete identity, and what we find is partial overlap at best. The consequence of this basic observation for structuralists was that semantic analysis in one language amounts to describing the structural relations between the lexemes of a language in terms of oppositions (antonymy, complementarity, converseness, etc.), super- and subordination, meronymy, etc. (cf. Lyons 1972, Cruse 1986, Löbner 2002, etc.), and that comparative semantics or comparative lexicology is nothing more than a comparison between these networks of structural relations. More recent theorizing about semantics, specifically the idea of semantic decomposition in terms of hierarchical structures ("decompositional event seman... Footnote: In the publications of Sebastian Löbner, to whom we dedicate this article on the occasion of his 65th birthday, comparative studies on lexicology and meaning have played a considerable role (see for instance Löbner 2002: 153, ..., or Löbner 2011). We would like to thank two anonymous reviewers for their critical comments and valuable suggestions.
The use of modal particles in German has been extensively studied since the late 1960s (e.g. Weydt 1969, 1977; Thurmair 1989; Helbig & Helbig 1993; cf. also the bibliography by Weydt & Ehlers 1987), and a large number of detailed studies dealing with specific particles are available (e.g. Burckhardt 1982; Doherty 1982; Borst 1985; Hentschel 1986; Lindner 1991; Meibauer 1993; Ormelius-Sandblom 1996; Rinas 2006 on ja and doch). It is not the objective of this paper to contribute to the pool of descriptive generalizations concerning these elements. Rather, the aim of the paper is to propose a model of utterance interpretation which allows us to regard the function of modal particles as an integral part of the interpretation process. Utterances are analyzed against the background of their ability to update discourse contexts (e.g. Stalnaker 1978; Heim 1982, 1983, 1992; Groenendijk & Stokhof 1991; Chierchia 1995), and modal particles are shown to interact with the process of context upda...
GraphAnno is a configurable tool for multi-level annotation which caters for the entire workflow from corpus import to data export and thus provides a suitable environment for the manual annotation of modals in their sentential contexts. Given its generic data model, it is particularly suitable for enriching existing corpora, e.g. by adding semantic annotations to syntactic ones. In this contribution, we present the functionalities of GraphAnno and make a concrete proposal for the treatment of modals in a corpus, with a focus on scope interactions. We take no position on the specific categories to be annotated. Its generic design allows GraphAnno to be used with various annotation schemes, like those proposed by Hendrickx et al. (2012), Nissim et al. (2013) and Rubinstein et al. (2013). We will use generic category labels from theoretical linguistics for illustration purposes. After providing some background information on the tool in Section 2, we show how GraphAnno deals with...
Human Impersonal Pronoun Uses in English, Dutch and German
Leuvense Bijdragen - Leuven Contributions in Linguistics and Philology, 2012
The pronoun man derives from the homophonous noun meaning ‘man’. English had such a pronoun, but it disappeared in the 15th century (Rissanen 1997: 517–521), so Modern English does not have a ‘man’ strategy for impersonal reference. Conversely, German, at least in the written register, very rarely uses a ‘you’ strategy of the type illustrated in (1). Dutch would seem to have both a ‘man’ and a ‘you’ strategy: ...
In this paper, we propose an annotation scheme for the manual annotation of tense and aspect in natural language corpora, as well as an implementation using GraphAnno, a configurable tool for manual multilevel annotation. The annotation scheme is based on Klein’s (1994) theory of tense and aspect, arguably the most widely accepted theory in this domain (cf. also Klein and Li 2009). One of the most important features of Klein’s theory is that in addition to the time span during which a situation obtains (the ‘time of situation’/TSit), it makes use of the concept of ‘Topic Time’ (TT), which is related to, but different from, Reichenbach’s (1947) reference point ‘R’ (cf. Derczynski and Gaizauskas 2013). Given that the resulting annotations cannot be mapped one-to-one to words or constituents, and since they are partially retrieved from the context, a semantic layer of annotation is needed in addition to the structural one. The multi-level approach advocated here also allows us to annotat...
arXiv, 2020
This study investigates global properties of literary and non-literary texts. Within the literary texts, a distinction is made between canonical and non-canonical works. The central hypothesis of the study is that the three text types (non-literary, literary/canonical and literary/non-canonical) exhibit systematic differences with respect to structural design features as correlates of aesthetic responses in readers. To investigate these differences, we compiled a corpus containing texts of the three categories of interest, the Jena Textual Aesthetics Corpus. Two aspects of global structure are investigated, variability and self-similar (fractal) patterns, which reflect long-range correlations along texts. We use four types of basic observations, (i) the frequency of POS-tags per sentence, (ii) sentence length, (iii) lexical diversity in chunks of text, and (iv) the distribution of topic probabilities in chunks of texts. These basic observations are grouped into two more general cate...
Aspects of Linguistic Variation, 2018
Our study aims to explore how much information about areal patterns of colexification we can gain from lexical databases such as CLICS and ASJP. We adopt a bottom-up (rather than hypothesis-driven) approach, identifying areal patterns in three steps: (i) determining spatial autocorrelation in the data, (ii) identifying clusters as candidates for convergence areas, and (iii) testing the clusters resulting from the second step while controlling for genealogical relatedness. Moreover, we compute a (genealogical) diversity index for each cluster. This approach yields promising results, which we regard as a proof of concept, but we also point out some drawbacks of the use of major lexical databases.
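Step (i), determining spatial autocorrelation, can be illustrated with Moran's I, a widely used global autocorrelation statistic. The abstract does not say which statistic or weighting scheme was used, so Moran's I with a simple distance-threshold weight matrix (weight 1 for language pairs within `max_km` of each other, 0 otherwise) is an assumption here. The coordinates and the binary "colexifies a given concept pair" values below are hypothetical. Values near +1 indicate that nearby languages behave alike (a candidate areal pattern); negative values indicate that neighbours tend to differ.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def morans_i(values, coords, max_km):
    """Global Moran's I with binary weights: w_ij = 1 if within max_km."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0    # sum of w_ij * dev_i * dev_j
    w_sum = 0.0  # total weight W
    for i in range(n):
        for j in range(n):
            if i != j and haversine_km(coords[i], coords[j]) <= max_km:
                num += dev[i] * dev[j]
                w_sum += 1.0
    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        raise ValueError("no neighbours within max_km, or constant values")
    return (n / w_sum) * (num / denom)


# Hypothetical languages in two distant areas (lat, lon)
coords = [(50.0, 10.0), (50.5, 10.5), (51.0, 10.0), (50.0, 11.0),   # area A
          (10.0, 100.0), (10.5, 100.5), (11.0, 100.0), (10.0, 101.0)]  # area B
clustered = [1, 1, 1, 1, 0, 0, 0, 0]    # colexification confined to area A
alternating = [1, 0, 1, 0, 1, 0, 1, 0]  # neighbours tend to differ

print(morans_i(clustered, coords, max_km=500))    # 1.0 (strong clustering)
print(morans_i(alternating, coords, max_km=500))  # negative
```

Controlling for genealogical relatedness, as in step (iii), would require additional family information and is not covered by this sketch.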
Scientific Data, 2020
Advances in computer-assisted linguistic research have been greatly influential in reshaping linguistics. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, impose high demands on the rigour with which datasets are prepared and curated. Here we present the Database of Cross-Linguistic Colexifications (CLICS). CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world’s languages, and showcases best practices for preparing data for cross-linguistic research. We do this by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version, CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing stude...
Language Sciences, 2019
The history of either as a clause-final, right-periphery marker has seen little intensive research, apart from a few isolated studies such as Rullmann (2002) and Gast (2013). This is surprising, given the recent interest in parenthetical discourse items and the controversies surrounding their development (grammaticalization vs. pragmaticalization, and other debates). The present study first asks whether right-periphery either (RP-either) can be categorized as a bona fide example of a discourse marker, and second, how the hypothesis arose that 18th- and 19th-century prescriptivism motivated its sudden shift to become a post-negation, clause-final item, replacing the now non-standard right-periphery neither (e.g. Jespersen 1917, Fitzmaurice and Smith 2012). Building on previous accounts, the study suggests that the use of either as a clause-final additive focus marker grammaticalized from a resumptive quantifier, postposed in apposition, gradually renovating the former functions of clause-final neither in strong negative polarity contexts by a process of grammaticalization following co-optation (Heine 2013). The social stigmatization of right-periphery neither (RP-neither) as an example of negative concord at the time must therefore have been due simply to its resulting association with recessive, dialectal or nonstandard usage, as RP-either rapidly extended its earlier range of functions to take over those of the ousted RP-neither in strong negative polarity contexts during the 19th century.
Beyond 'Any' and 'Ever', 2013
The English dual quantifier either has an intricate history. While it is commonly regarded as an existential quantifier with a distributional restriction to nonveridical (or 'non-affirmative', 'downward entailing', etc.) contexts in the modern language, its Old English precursor æghwæðer (contracted, ægðer) was a dual distributive universal quantifier, i.e. a quantifier meaning 'each of two'. This study investigates the processes of change leading from the universal quantifier of Old English to the nonveridical existential quantifier of Modern English. It is argued that this process was set in motion by the decline of another dual quantifier, OE awðer/ME outher 'one or other of two'. This quantifier was first replaced by either in combination with clause-internal nonveridical operators, where a wide-scope universal quantifier was equivalent to a narrow-scope existential quantifier (e.g. in interaction with a modal operator). Gradually 'absorbing' outher, either then extended its distribution further and came to be used in nonveridical contexts with a clause-external nonveridical operator as well (e.g. in conditional clauses). In such contexts either, which was still interpreted as a universal quantifier in veridical contexts, could only have a universal reading when interpreted with extra-clausal, i.e. exceptional, scope. Such exceptional scope behaviour, in conjunction with the rise of a competing universal quantifier in veridical contexts (both[e]), led to the reanalysis of either as a nonveridical existential quantifier, which thus acquired the distribution that it has in present-day English. The paper is intended as a case study on the interaction of lexical content, scope and polarity properties in the genesis of a polarity-sensitive operator, as well as the role of competition between (near) equivalent expressions in diachronic change.
Scalar additive operators: Typology and historical development
Human impersonal pronouns in populist discourse
Invited talk at the workshop "Interdisciplinary perspectives on populist discourse in Germany and Poland", August 20-21, 2015
Towards a distributional typology of human impersonal pronouns, based on data from European languages
Languages Across Boundaries, 2013
Human impersonal pronouns like French on and German man are regarded as pronouns that are used to fill an argument position with a variable ranging over human referents without establishing a referential link to an entity from the universe of discourse. Such pronouns are highly context-dependent and variable in their distributional and semantic properties. Following up on work done by Anna Siewierska, we aim to capture this variability by using the semantic map methodology. We propose a mathematical (graph-theoretic) definition of ‘connectivity maps’ in general and devise a map for human impersonal pronouns or, more generally speaking, the ‘impersonalization’ of argument positions. The map is intended as a hypothesis about possible patterns of polysemy in the domain of investigation, and is tested on the basis of a small sample of European languages.
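Testing a map against language data typically involves the connectivity requirement familiar from the semantic map literature: the set of uses covered by any one form should occupy a connected region of the map graph. The sketch below checks that requirement with a breadth-first search restricted to each form's uses. It assumes this standard connectivity criterion rather than reproducing the authors' own graph-theoretic definition, and the use labels, edges and form names are hypothetical placeholders, not the map proposed in the paper.

```python
from collections import deque


def induces_connected_subgraph(edges, uses):
    """True if `uses` form a connected region of the map graph `edges`."""
    uses = set(uses)
    if not uses:
        return True
    # Keep only edges whose endpoints both belong to the form's uses
    adjacency = {u: set() for u in uses}
    for a, b in edges:
        if a in uses and b in uses:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(uses))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == uses


def violations(edges, forms):
    """Names of forms whose uses are not contiguous on the map."""
    return [name for name, uses in forms.items()
            if not induces_connected_subgraph(edges, uses)]


# Hypothetical map: a simple chain of uses
map_edges = [("universal", "generic"), ("generic", "existential"),
             ("existential", "specific")]
# Hypothetical forms and the uses attested for them
forms = {
    "form_a": {"universal", "generic", "existential"},  # contiguous
    "form_b": {"universal", "existential"},             # skips "generic"
}

print(violations(map_edges, forms))  # ['form_b']
```

A form listed in the output would count as counterevidence to the map as a hypothesis about possible polysemy patterns.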
Journal of Pragmatics, 2015
Pronominal and verbal forms of the second person singular are canonically used with personal reference, i.e., as referring (exclusively) to the addressee. In what are often called 'impersonal' uses, the range of reference is broadened from the addressee to a more comprehensive set of referents, and sometimes the relevant sentences are not, literally speaking, true, as properties are attributed to the addressee which (s)he does not actually have. The question arises whether impersonally used forms of the second person singular constitute a grammatical category of their own, or whether they exhibit the same (underlying) semantics as canonical uses of the second person. On the basis of a dynamic-inferential view of communication, we argue for a unified analysis of personal and impersonal second person forms. Effects of generalization are claimed to emerge in sentences which are generalizing independently of the occurrence of a second person form. Uses of the second person that lead to truth-conditionally false sentences are claimed to involve (an invitation to) simulation and the creation of empathy. According to this analysis, impersonal uses of the second person establish a direct referential link to the addressee, just like personal uses, and their status as 'impersonal' is a function of sentential contexts and conversational conditions.
Entropy
Research in computational textual aesthetics has shown that there are textual correlates of prefe... more Research in computational textual aesthetics has shown that there are textual correlates of preference in prose texts. The present study investigates whether textual correlates of preference vary across different time periods (contemporary texts versus texts from the 19th and early 20th centuries). Preference is operationalized in different ways for the two periods, in terms of canonization for the earlier texts, and through sales figures for the contemporary texts. As potential textual correlates of preference, we measure degrees of (un)predictability in the distributions of two types of low-level observables, parts of speech and sentence length. Specifically, we calculate two entropy measures, Shannon Entropy as a global measure of unpredictability, and Approximate Entropy as a local measure of surprise (unpredictability in a specific context). Preferred texts from both periods (contemporary bestsellers and canonical earlier texts) are characterized by higher degrees of unpredicta...
Linguistik Online
Scalar focus operators like even, only, etc. interact with scales, i. e., ordered sets of alterna... more Scalar focus operators like even, only, etc. interact with scales, i. e., ordered sets of alternatives that are referenced by focus structure. The scaling dimensions interacting with focus operators have been argued to be semantic (e. g. entailment relations, probability) in earlier work, but it has been shown that purely semantic analyses are too restrictive, and that the specific scale that a given operator interacts with is often pragmatic, in the sense of being a function of the context. If that is true, the question arises what exactly determines the (types of) scales interacting with focus operators. The present study addresses this question by investigating the distributional behaviour of the additive scalar particle even relative to scales whose focus alternatives are ordered in terms of evaluative attitudes (positive, negative). Our hypothesis is that such evaluative attitudinal scales are at least partially functions of the lexical material in the sentential environment. T...
This article contains some thoughts on the role of bilingual cognition in the diachronic change o... more This article contains some thoughts on the role of bilingual cognition in the diachronic change of morphological paradigms, with a focus on contact-induced change. In a first step, a general typology of paradigm change is proposed, based on a distinction between three levels of linguistic organization (the sign/Level 1, the category/Level 2, and the dimension/Level 3), and two types of change (neutralization and differentiation), thus distinguishing six types of paradigm change. Examples of these types (taken from the pertinent literature) are discussed, and two questions are addressed in each case: (i) To what extent does contact-induced paradigm change of a specific type differ from internal change? (ii) What are (potentially) the underlying cognitive processes motivating each type of change? The hypothesis is explored that there is a correlation between the three levels of analysis and three types of cognitive processes involved in paradigm change. It is suggested that change at ...
Entropy, 2022
Computational textual aesthetics aims at studying observable differences between aesthetic catego... more Computational textual aesthetics aims at studying observable differences between aesthetic categories of text. We use Approximate Entropy to measure the (un)predictability in two aesthetic text categories, i.e., canonical fiction (‘classics’) and non-canonical fiction (with lower prestige). Approximate Entropy is determined for series derived from sentence-length values and the distribution of part-of-speech-tags in windows of texts. For comparison, we also include a sample of non-fictional texts. Moreover, we use Shannon Entropy to estimate degrees of (un)predictability due to frequency distributions in the entire text. Our results show that the Approximate Entropy values can better differentiate canonical from non-canonical texts compared with Shannon Entropy, which is not true for the classification of fictional vs. expository prose. Canonical and non-canonical texts thus differ in sequential structure, while inter-genre differences are a matter of the overall distribution of loc...
We propose a way of enriching the TimeML annotations of TimeBank by adding information about the ... more We propose a way of enriching the TimeML annotations of TimeBank by adding information about the Topic Time in terms of Klein (1994). The annotations are partly automatic, partly inferential and partly manual. The corpus was converted into the native format of the annotation software GraphAnno and POS-tagged using the Stanford bidirectional dependency network tagger. On top of each finite verb, a FIN-node with tense information was created, and on top of any FIN-node, a TOPICTIME-node, in accordance with Klein's (1994) treatment of finiteness as the linguistic correlate of the Topic Time. Each TOPICTIME-node is linked to a MAKEINSTANCE-node representing an (instantiated) event in TimeML (Pustejovsky et al. 2005), the markup language used for the annotation of TimeBank. For such links we introduce a new category, ELINK. ELINKs capture the relationship between the Topic Time (TT) and the Time of Situation (TSit) and have an aspectual interpretation in Klein's (1994) theory. In...
On reciprocal and reflexive uses of anaphors in German and other European languages
Trends in Linguistics. Studies and Monographs [TiLSM]
Meaning and Grammar of Nouns and Verbs, 2021
IntroductionI t is a well-known fact that the vocabularies of individual languages are structured... more IntroductionI t is a well-known fact that the vocabularies of individual languages are structured very di erently. Even if it is always possible to translate a certain utterance from one language into another, it is rarely, if ever, possible to say that all or even some lexemes making up an utterance in one language correspond perfectly and completely to the lexemes rendering that utterance in another. In most cases the content cut out from the amorphous mass of notions and ideas by one lexeme A may be similar to the content identi ed by some translational counterpart in another, but there is hardly ever complete identity and what we nd is partial overlap at best. The consequence of this basic observation for structuralists was that semantic analysis in one language amounts to describing the structural relations between the lexemes of a language in terms of oppositions (antonymy, complementarity, converseness, etc.), super-and subordination, meronymy, etc. (cf. Lyons 1972, Cruse 1986, Löbner 2002, etc.), and that comparative semantics or comparative lexicology is nothing more than a comparison between these networks of structural relations. More recent theorizing about semantics, speci cally the idea of semantic decomposition in terms of hierarchical structures ("decompositional event seman-In the publications of Sebastian Löbner, to whom we dedicate this article on the occasion of his 65th birthday, comparative studies on lexicology and meaning have played a considerable role (see for instance Löbner 2002: 153,. or Löbner 2011). We would like to thank two anonymous reviewers for their critical comments and valuable suggestions.
The use of modal particles in German has been extensively studied since the late 1960s (e.g. Weyd... more The use of modal particles in German has been extensively studied since the late 1960s (e.g. Weydt 1969, 1977; Thurmair 1989; Helbig & Helbig 1993; cf. also the bibliography by Weydt & Ehlers 1987), and a large number of detailed studies dealing with specific particles are available (e.g. Burckhardt 1982; Doherty 1982; Borst 1985; Hentschel 1986; Lindner 1991; Meibauer 1993; Ormelius-Sandblom 1996; Rinas 2006 on ja and doch). It is not the objective of this paper to contribute to the pool of descriptive generalizations concerning these elements. Rather, the aim of the paper is to propose a model of utterance interpretation which allows us to regard the function of modal particles as an integral part of the interpretation process. Utterances are analyzed against the background of their ability to update discourse contexts (e.g. Stalnaker 1978; Heim 1982, 1983, 1992; Groenendijk & Stokhof 1991; Chierchia 1995), and modal particles are shown to interact with the process of context upda...
GraphAnno is a configurable tool for multi-level annotation which caters for the entire workflow ... more GraphAnno is a configurable tool for multi-level annotation which caters for the entire workflow from corpus import to data export and thus provides a suitable environment for the manual annotation of modals in their sentential contexts. Given its generic data model, it is particularly suitable for enriching existing corpora, e.g. by adding semantic annotations to syntactic ones. In this contribution, we present the functionalities of GraphAnno and make a concrete proposal for the treatment of modals in a corpus, with a focus on scope interactions. We have nothing to say about the specific categories to be annotated. Its generic design allows GraphAnno to be used with various annotation schemes, like those proposed by Hendrickx et al. (2012), Nissim et al. (2013) and Rubinstein et al. (2013). We will use generic category labels from theoretical linguistics for illustration purposes. After providing some background information on the tool in Section 2 we show how GraphAnno deals with...
Human Impersonal Pronoun Uses in English, Dutch and German
Leuvense Bijdragen - Leuven Contributions in Linguistics and Philology, 2012
The pronoun man derives from the homophonous noun meaning ‘man’. English had such a pronoun, but ... more The pronoun man derives from the homophonous noun meaning ‘man’. English had such a pronoun, but it disappeared in the 15 th century (Rissanen 1997: 517–521), so Modern English does not have a ‘man’ strategy for impersonal reference. Conversely, German, at least in the written register, very rarely uses a ‘you’ strategy of the type illustrated in (1). Dutch would seem to have both a ‘man’ and a ‘you’ strategy: 1
In this paper, we propose an annotation scheme for the manual annotation of tense and aspect in n... more In this paper, we propose an annotation scheme for the manual annotation of tense and aspect in natural language corpora, as well as an implementation using GraphAnno, a configurable tool for manual multilevel annotation. The annotation scheme is based on Klein’s (1994) theory of tense and aspect, arguably the most widely accepted theory in this domain (cf. also Klein and Li 2009). One of the most important features of Klein’s theory is that in addition to the time span during which a situation obtains (the ‘time of situation’/TSit), it makes use of the concept of ‘Topic Time’ (TT), which is related to, but different from, Reichenbach’s (1947) reference point ‘R’ (cf. Derczynski and Gaizauskas 2013). Given that the resulting annotations cannot be mapped one-to-one to words or constituents, and as they are partially retrieved from the context, a semantic layer of annotation is needed, in addition to the structural one. The multi-level approach advocated here also allows us to annotat...
arXiv, 2020
This study investigates global properties of literary and non-literary texts. Within the literary texts, a distinction is made between canonical and non-canonical works. The central hypothesis of the study is that the three text types (non-literary, literary/canonical and literary/non-canonical) exhibit systematic differences with respect to structural design features as correlates of aesthetic responses in readers. To investigate these differences, we compiled a corpus containing texts of the three categories of interest, the Jena Textual Aesthetics Corpus. Two aspects of global structure are investigated, variability and self-similar (fractal) patterns, which reflect long-range correlations along texts. We use four types of basic observations: (i) the frequency of POS tags per sentence, (ii) sentence length, (iii) lexical diversity in chunks of text, and (iv) the distribution of topic probabilities in chunks of text. These basic observations are grouped into two more general cate...
Aspects of Linguistic Variation, 2018
Our study explores how much information about areal patterns of colexification can be gained from lexical databases such as CLICS and ASJP. We adopt a bottom-up (rather than hypothesis-driven) approach, identifying areal patterns in three steps: (i) determining spatial autocorrelations in the data, (ii) identifying clusters as candidates for convergence areas, and (iii) testing the clusters resulting from the second step while controlling for genealogical relatedness. Moreover, we compute a (genealogical) diversity index for each cluster. This approach yields promising results, which we regard as a proof of concept, but we also point out some drawbacks of using major lexical databases.
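Step (i) can be illustrated with global Moran’s I, a common measure of spatial autocorrelation. The abstract does not specify which statistic or weighting scheme is used, so the inverse-distance weights, the great-circle distance and the function names below are assumptions made for illustration only. The input is a 0/1 value per language (e.g. whether it colexifies a given concept pair) together with (latitude, longitude) coordinates.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def morans_i(values, coords):
    """Global Moran's I with inverse-distance weights and w_ii = 0.

    Assumes all coordinates are distinct (a zero distance would divide by zero).
    Values well above -1/(n-1) indicate positive spatial autocorrelation.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant variable")
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / haversine_km(coords[i], coords[j])  # closer languages weigh more
            num += w * dev[i] * dev[j]
            w_sum += w
    return (n / w_sum) * (num / denom)


# Toy data: a feature present in one geographic cluster and absent in another
coords = [(0, 0), (0, 1), (0, 2), (0, 50), (0, 51), (0, 52)]
print(morans_i([1, 1, 1, 0, 0, 0], coords))
```

A significance test (e.g. by permuting the values across locations) would be needed before treating a positive value as evidence of areal clustering.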
Scientific Data, 2020
Advances in computer-assisted linguistic research have greatly reshaped the field. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, impose high demands on rigor in preparing and curating datasets. Here we present the Database of Cross-Linguistic Colexifications (CLICS). CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world’s languages, and showcases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version, CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing stude...
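The notion of colexification underlying CLICS (a single form expressing two or more concepts in a language) can be illustrated with a small counting sketch. The triple-based input format, the function name and the toy languages below are hypothetical and do not reflect the actual CLICS data model or toolchain.

```python
from collections import defaultdict
from itertools import combinations


def count_colexifications(entries):
    """Count, for each unordered concept pair, the languages that colexify it.

    entries: iterable of (language, concept, form) triples (hypothetical format).
    Two concepts count as colexified in a language if at least one
    identical form is listed for both of them.
    """
    # language -> form -> set of concepts expressed by that form
    by_form = defaultdict(lambda: defaultdict(set))
    for language, concept, form in entries:
        by_form[language][form].add(concept)

    # concept pair -> set of languages (a set, so each language counts once per pair)
    languages_per_pair = defaultdict(set)
    for language, forms in by_form.items():
        for concepts in forms.values():
            for a, b in combinations(sorted(concepts), 2):
                languages_per_pair[(a, b)].add(language)
    return {pair: len(langs) for pair, langs in languages_per_pair.items()}


toy = [
    ("lang_a", "TREE", "x1"), ("lang_a", "WOOD", "x1"),
    ("lang_b", "TREE", "y1"), ("lang_b", "WOOD", "y2"),
    ("lang_c", "TREE", "z1"), ("lang_c", "WOOD", "z1"), ("lang_c", "FIRE", "z1"),
]
print(count_colexifications(toy))
```

The resulting pair counts can be read as weighted edges of a colexification network, with concepts as nodes.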
Language Sciences, 2019
The history of either as a clause-final, right-periphery marker has seen little intensive research, apart from a few isolated studies such as Rullmann (2002) and Gast (2013). This is surprising, given the recent interest in parenthetical discourse items and the controversies surrounding their development (grammaticalization vs. pragmaticalization, and other debates). The present study first asks whether right-periphery either (RP-either) can be categorized as a bona fide discourse marker, and second, how the hypothesis emerged (e.g. Jespersen 1917, Fitzmaurice and Smith 2012) that 18th- and 19th-century prescriptivism motivated its sudden shift to a post-negation, clause-final item, replacing the now non-standard right-periphery neither (RP-neither). Building on the previous accounts, the study suggests that the use of either as a clause-final additive focus marker grammaticalized from a resumptive quantifier, postposed in apposition, which gradually assumed the former functions of clause-final neither in strong negative polarity contexts through a process of grammaticalization following co-optation (Heine 2013). The social stigmatization of RP-neither as an instance of negative concord at the time must therefore have been due simply to its resulting association with recessive, dialectal or non-standard usage, as RP-either rapidly extended its earlier range of functions to take over those of the ousted RP-neither in strong negative polarity contexts during the 19th century.
Beyond 'Any' and 'Ever', 2013
The English dual quantifier either has an intricate history. While it is commonly regarded as an existential quantifier with a distributional restriction to nonveridical (or 'non-affirmative', 'downward entailing', etc.) contexts in the modern language, its Old English precursor æghwæðer (contracted, ægðer) was a dual distributive universal quantifier, i.e. a quantifier meaning 'each of two'. This study investigates the processes of change leading from the universal quantifier of Old English to the nonveridical existential quantifier of Modern English. It is argued that this process was set in motion by the decline of another dual quantifier, OE awðer/ME outher 'one or other of two'. This quantifier was first replaced by either in combination with clause-internal nonveridical operators, where a wide-scope universal quantifier was equivalent to a narrow-scope existential quantifier (e.g. in interaction with a modal operator). Gradually 'absorbing' outher, either then extended its distribution further and came to be used in nonveridical contexts with a clause-external nonveridical operator as well (e.g. in conditional clauses). In such contexts, either, which was still interpreted as a universal quantifier in veridical contexts, could only have a universal reading when interpreted with extra-clausal, i.e. exceptional, scope. Such exceptional scope behaviour, in conjunction with the rise of a competing universal quantifier in veridical contexts (both[e]), led to the reanalysis of either as a nonveridical existential quantifier, which thus acquired the distribution that it has in present-day English. The paper is intended as a case study on the interaction of lexical content, scope and polarity properties in the genesis of a polarity-sensitive operator, as well as on the role of competition between (near-)equivalent expressions in diachronic change.
Scalar additive operators: Typology and historical development
Human impersonal pronouns in populist discourse
Invited talk at workshop "Interdisciplinary perspectives on populist discourse in Germany and Poland", August 20–21, 2015
Towards a distributional typology of human impersonal pronouns, based on data from European languages
Languages Across Boundaries, 2013
Human impersonal pronouns like French on and German man are regarded as pronouns that are used to fill an argument position with a variable ranging over human referents without establishing a referential link to an entity from the universe of discourse. Such pronouns are highly context-dependent and variable in their distributional and semantic properties. Following up on work done by Anna Siewierska, we aim to capture this variability by using the semantic map methodology. We propose a mathematical (graph-theoretic) definition of ‘connectivity maps’ in general and devise a map for human impersonal pronouns or, more generally, for the ‘impersonalization’ of argument positions. The map is intended as a hypothesis about possible patterns of polysemy in the domain of investigation, and is tested on the basis of a small sample of European languages.
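The connectivity requirement commonly assumed in semantic map work (the set of uses of each form must occupy a connected region of the map) can be checked with a breadth-first search over the induced subgraph. This is a sketch of that general requirement, not of the specific definition proposed in the paper; the use labels U1 to U4, the form names and the function names are hypothetical placeholders.

```python
from collections import deque


def is_connected_subgraph(edges, nodes):
    """Return True if `nodes` induce a connected subgraph of the undirected graph `edges`."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:  # keep only edges inside the induced subgraph
            adjacency[u].add(v)
            adjacency[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from an arbitrary node
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == nodes


def violations(edges, forms):
    """Return the forms whose set of uses is not connected on the map."""
    return [form for form, uses in forms.items()
            if not is_connected_subgraph(edges, uses)]


# A hypothetical linear map U1 - U2 - U3 - U4 and three hypothetical forms
edges = [("U1", "U2"), ("U2", "U3"), ("U3", "U4")]
forms = {"pron_x": {"U1", "U2"}, "pron_y": {"U1", "U3"}, "pron_z": {"U2", "U3", "U4"}}
print(violations(edges, forms))
```

A form flagged here (pron_y, which skips U2) would count as a counterexample to the map as a hypothesis about possible polysemy patterns.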
Journal of Pragmatics, 2015
Pronominal and verbal forms of the second person singular are canonically used with personal reference, i.e., as referring (exclusively) to the addressee. In what are often called 'impersonal' uses, the range of reference is broadened from the addressee to a more comprehensive set of referents, and sometimes the relevant sentences are not, literally speaking, true, as properties are attributed to the addressee which (s)he does not actually have. The question arises whether impersonally used forms of the second person singular constitute a grammatical category of their own, or whether they exhibit the same (underlying) semantics as canonical uses of the second person. On the basis of a dynamic-inferential view of communication, we argue for a unified analysis of personal and impersonal second person forms. Effects of generalization are claimed to emerge in sentences which are generalizing independently of the occurrence of a second person form. Uses of the second person that lead to truth-conditionally false sentences are claimed to involve (an invitation to) simulation and the creation of empathy. According to this analysis, impersonal uses of the second person establish a direct referential link to the addressee, just like personal uses, and their status as 'impersonal' is a function of sentential contexts and conversational conditions.
This contribution presents a comparative, corpus-based study of three concessive subordinators from English, German and Spanish, namely 'although', 'obwohl' and 'aunque'. It investigates differences in the distribution of these operators on the basis of richly annotated data from the Europarl corpus, taking into consideration structural properties of the relevant clauses (position and length), the type of semantic relation holding between the main clause and the concessive clause, the level of linking (propositional, illocutionary, textual), and information-structural parameters (status of the concessive clause, topic-comment structure). The study shows that there are significant differences between German obwohl on the one hand, and English although and Spanish aunque on the other. German obwohl is more restricted in its distribution and mostly occurs in 'canonical' concessives, while English although and Spanish aunque are used in a broader range of contexts, beyond the canonical distributions (e.g. in 'restrictive' uses, and at a textual level). This observation is related to the fact that German, unlike English and Spanish, uses a specific type of word order in subordinate clauses, which seems to block distributional extensions, at least in the formal register investigated in the present study (political speech).