Linguistic Theory and Grammar Implementation: Introduction to this Special Issue

German Treebanks: TIGER and TüBa-D/Z

2017

German is a language that is closely related to English but has a richer morphology and freer word order than English. Additionally, German has four existing major treebanks, which differ considerably in their syntactic annotation schemes. All treebanks use a combination of constituent structure and grammatical functions, but the decisions with regard to other phenomena differ significantly, for example in the treatment of discontinuous structures. This makes German a good choice for a comparative analysis of treebanks. This chapter presents two major treebanks of German, TIGER and TüBa-D/Z. We describe the projects in which the two treebanks were annotated, discuss the respective annotation schemes, the processes used for annotation, and the data formats. We also discuss the usage of both treebanks, as well as other German treebanks, and we present a comparison of the two annotation schemes along with their advantages and disadvantages.

A Linguistically Interpreted Corpus of German Newspaper Text

1998

In this paper, we report on the development of an annotation scheme and annotation tools for unrestricted German text. Our representation format is based on argument structure, but also permits the extraction of other kinds of representations. We discuss several methodological issues and the analysis of some phenomena. Additional focus is on the tools developed in our project and their applications.

Projecting LFG F-structures from Chunks

2003

In this paper we pursue two related goals: First, we establish a conceptual link between chunk-based syntactic structures, as typically assumed in shallow parsing approaches, and principle-based syntactic structures, as assumed in theoretical linguistics research. This conceptual link emerges from the study of configurational vs.

Treebank Conversion. Converting the NEGRA treebank to an LTAG grammar

2001

We present a method for rule-based structure conversion of existing treebanks, which aims at the extraction of linguistically sound, corpus-based grammars in a specific grammatical framework. We apply this method to the NEGRA treebank to derive an LTAG grammar of German. We describe the methodology and tools for structure conversion and LTAG extraction. The conversion and grammar extraction process imports linguistic generalisations that are missing in the original treebank.

Ancora: Multilingual and multilevel annotated corpora

2007

Having linguistic resources with morphosyntactic and semantic information at our disposal, either lexicons or tagged corpora, appears to be an obvious necessity for most, if not all, natural language processing (NLP) applications. Furthermore, annotated corpora also constitute a crucial resource for acquiring or inferring linguistic knowledge about how languages are used. In this vein, it is widely accepted that linguistically annotated corpora are a very useful resource for computational and linguistic analysis of languages.