The lexico-semantic annotation of an Italian Treebank

Building the Italian Syntactic-Semantic Treebank

The paper reports on the design and construction of a multi-layered corpus of Italian, annotated at the syntactic and lexico-semantic levels, whose development is supported by dedicated software augmented with an intelligent interface. The issue of evaluating this type of resource is also addressed.

Comparing linguistic information in treebank annotations

The paper investigates the issue of portability of methods and results over treebanks in different languages and annotation formats. In particular, it addresses the problem of converting an Italian treebank, the Turin University Treebank (TUT), developed in dependency format, into the Penn Treebank format, in order to possibly exploit the tools and methods already developed and compare the adequacy of information encoding in the two formats. We describe the procedures for converting the two annotation formats and we present an experiment that evaluates some linguistic knowledge extracted from the two formats, namely sub-categorization frames.

Enriching the Venice Italian Treebank with dependency and grammatical relations

In this paper we propose a rule-based approach to extract dependency and grammatical relations from the Venice Italian Treebank (VIT), which is annotated with bracketed tree structures. To our knowledge, the only dependency-annotated corpus available for Italian is the Turin University Treebank, which has 25,000 tokens and is about 1/10 the size of VIT. As manual corpus annotation is expensive and time-consuming, we decided to exploit an existing constituency-based treebank, the VIT, to derive dependency structures with less effort. After describing the procedure to extract heads and dependents, based on a head percolation table for Italian, we introduce the rules adopted to add grammatical relation labels. For this purpose, we manually relabeled all non-canonical arguments, which are very frequent in Italian, and then automatically labeled the remaining complements or arguments following some syntactic restrictions based on the position of the constituents with respect to parent and sibling nodes. The final section of the paper describes the evaluation, carried out in two steps, one for dependency relations and one for grammatical roles. Since the results are promising, we plan to use the dependency treebank to train a dependency-based parser and eventually a semantic role labelling system.
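
The head-percolation step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the `Node` class, the `HEAD_TABLE` entries, the category labels and the example sentence are all hypothetical, chosen only to show how a table of head preferences turns a bracketed tree into unlabeled head-dependent pairs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on leaves


# Hypothetical head percolation table: for each phrase label, a search
# direction and a priority list of child labels that may act as head.
HEAD_TABLE = {
    "S": ("left", ["VP", "NP"]),
    "VP": ("left", ["V", "VP"]),
    "NP": ("left", ["N", "NP"]),
    "PP": ("left", ["P"]),
}


def head_child(node: Node) -> Node:
    """Pick the head child of a non-leaf node using the table."""
    direction, priorities = HEAD_TABLE.get(node.label, ("left", []))
    kids = node.children if direction == "left" else node.children[::-1]
    for label in priorities:
        for kid in kids:
            if kid.label == label:
                return kid
    return kids[0]  # fallback: first child in the search direction


def lexical_head(node: Node) -> Node:
    """Follow head children down to the leaf that heads the phrase."""
    while node.children:
        node = head_child(node)
    return node


def dependencies(node: Node, deps=None) -> List[Tuple[str, str]]:
    """Collect (head word, dependent word) pairs for the whole tree."""
    if deps is None:
        deps = []
    if node.children:
        head = head_child(node)
        head_word = lexical_head(head).word
        for kid in node.children:
            if kid is not head:
                deps.append((head_word, lexical_head(kid).word))
            dependencies(kid, deps)
    return deps


# Illustrative tree for "Maria legge un libro" (labels are assumptions).
tree = Node("S", [
    Node("NP", [Node("N", word="Maria")]),
    Node("VP", [
        Node("V", word="legge"),
        Node("NP", [Node("D", word="un"), Node("N", word="libro")]),
    ]),
])

pairs = dependencies(tree)
```

The sketch identifies tokens by word form for brevity; a real conversion would use token positions so that repeated words stay distinct, and would add grammatical relation labels in a separate pass.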

Converting Italian Treebanks: Towards an Italian Stanford Dependency Treebank

2013

The paper addresses the challenge of converting MIDT, an existing dependency-based Italian treebank resulting from the harmonization and merging of smaller resources, into the Stanford Dependencies annotation formalism, with the final aim of constructing a standard-compliant resource for the Italian language. Achieved results include a methodology for converting treebank annotations belonging to the same dependency-based family, the Italian Stanford Dependency Treebank (ISDT), and an Italian localization of the Stanford Dependency scheme.

VIT Venice Italian Treebank: Syntactic and quantitative features

2007

In this paper we describe VIT (Venice Italian Treebank), created at the University of Venice. We focus on its syntactic-semantic features and on a quantitative analysis of its data, comparing them with those of other treebanks. More generally, we try to substantiate the claim that treebanking grammars or parsers is dramatically dependent on the chosen treebank, and that this process in turn seems to depend on substantial factors such as the linguistic framework adopted for structural description or, ultimately, the language being described.

Harmonization and Merging of two Italian Dependency Treebanks

2012

The paper describes the methodology which is currently being defined for the construction of a "Merged Italian Dependency Treebank" (MIDT) starting from already existing resources. In particular, it reports the results of a case study carried out on two available dependency treebanks, i.e. TUT and ISST-TANL. The issues raised during the comparison of the annotation schemes underlying the two treebanks are discussed and investigated with a particular emphasis on the definition of a set of linguistic categories to be used as a "bridge" between the specific schemes. As an encoding format, the CoNLL de facto standard is used.
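
The idea of "bridge" categories over CoNLL-encoded data can be sketched as follows. This is only an illustration under stated assumptions: the `BRIDGE` mapping, its label names and the sample line are hypothetical and are not taken from the MIDT specification. The only fixed point is the CoNLL-X layout, in which the dependency relation is the eighth tab-separated column.

```python
# Hypothetical mapping from scheme-specific relation labels to shared
# "bridge" labels; the real inventories are defined in the paper itself.
BRIDGE = {
    "tut": {"VERB-SUBJ": "SUBJ", "VERB-OBJ": "OBJ"},
    "isst": {"subj": "SUBJ", "obj": "OBJ"},
}

DEPREL = 7  # zero-based index of the DEPREL column in CoNLL-X


def to_bridge(conll_line: str, scheme: str) -> str:
    """Rewrite the DEPREL column of one CoNLL-X token line.

    Labels without an entry in the mapping are left unchanged.
    """
    cols = conll_line.rstrip("\n").split("\t")
    cols[DEPREL] = BRIDGE[scheme].get(cols[DEPREL], cols[DEPREL])
    return "\t".join(cols)


# Illustrative token lines with invented annotations.
tut_line = "1\tMaria\tMaria\tN\tN\t_\t2\tVERB-SUBJ\t_\t_"
isst_line = "1\tMaria\tMaria\tS\tSP\t_\t2\tsubj\t_\t_"

tut_bridged = to_bridge(tut_line, "tut")
isst_bridged = to_bridge(isst_line, "isst")
```

A one-to-one dictionary is the simplest possible case; where the two schemes split a relation differently, the mapping would need to inspect further columns such as the part-of-speech tag or the head.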

Dependency and relational structure in treebank annotation

… of Workshop on Recent Advances in …, 2004

Among the variety of proposals currently making the dependency perspective on grammar more concrete, there are several treebanks whose annotation exploits some form of Relational Structure that we can consider a generalization of the fundamental idea of dependency at various degrees and with reference to different types of linguistic knowledge. The paper describes the Relational Structure as the common underlying representation of treebanks which is motivated by both theoretical and task-dependent considerations. Then it presents a system for the annotation of the Relational Structure in treebanks, called Augmented Relational Structure, which allows for a systematic annotation of various components of linguistic knowledge crucial in several tasks. Finally, it shows a dependency-based annotation for an Italian treebank, i.e. the Turin University Treebank, that implements the Augmented Relational Structure.

Treebank Annotation in the Light of the Meaning-Text Theory

Linguistic Issues in Language Technology, 2012

A treebank may contain the annotation of different phenomena such as word order, morphological features, syntactic and semantic relations, etc., which are rather different in their nature. Quite often, the annotation of these phenomena is combined in a single structure, which leads to low-quality training results and is verifiably deficient from a theoretical (linguistic) perspective. We argue that the annotation of corpora requires a well-defined linguistic model which supports multi-level annotation, with one type of phenomenon per level. Our experience with dependency treebanks created or adjusted for surface-oriented natural language generation and based on the Meaning-Text Theory, a multi-level linguistic model, supports this argumentation.