A bootstrapping approach to parser development

A cascaded syntactic analyser for Basque

… and Intelligent Text …, 2004

This article presents a robust syntactic analyser for Basque and the modules it contains. Each module is organised into analysis layers, and each layer takes the output of the previous layer as its input, so the syntactic analysis deepens gradually in a cascade. The analysis is carried out using the Constraint Grammar (CG) formalism. The article also describes how the parsing formats were standardised using XML.

[Figure: architecture of the analyser. Recoverable labels: morphosyntactic parsing; syntactic tagging; chunker; dependencies; EUSLEM; Morpheus; disambiguation using linguistic information; disambiguation using statistical information; shallow syntactic parsing; named entities; postpositions; noun and verb chains; tagging of syntactic dependencies. Tools named: CG, xfst.]

bRol: The Parser of Syntactic and Semantic Dependencies for Basque

This paper presents bRol, the first fully automatic system developed for parsing syntactic and semantic dependencies in Basque. The parser was built according to the settings established for the CoNLL-2009 Shared Task (Hajič et al., 2009), so bRol can be regarded as a standard parser whose scores are comparable to those reported in the shared task. A second-order graph-based MATE parser serves as the syntactic dependency parser. The semantic model, on the other hand, uses the traditional four-stage SRL pipeline. The system achieves a labeled attachment score of 80.51%, a labeled semantic F1 of 75.10, and a labeled macro F1 of 77.80.

Syntactic parsing of unrestricted Spanish text

1998

This research focuses on the syntactic parsing of morphologically tagged corpora. This document presents a proposal for a corpus-oriented Spanish grammar. The work was developed within the framework of the ITEM project, and its main goal is to provide a multilingual background for information extraction and retrieval tasks. The main goal of the Tacat analyser is to provide a way of obtaining large amounts of bracketed and parsed corpora, both general and domain-specific. Tacat uses context-free grammars and takes as input categories from the Parole specification. The incremental methodology that we use allows us to recognise different levels of complexity in the analysis and to produce compatible outputs from all the grammars.

Towards a Dependency Parser for Basque

2004

We present Maxuxta, a dependency parser for the linguistic processing of Basque, which can serve as a representative of agglutinative languages that are also characterized by the free order of their constituents. The dependency syntactic model is applied to establish the dependency-based grammatical relations between the components within the clause. Such a deep analysis is used to improve

Application of finite-state transducers to the acquisition of verb subcategorization information

Natural Language Engineering, 2003

This paper presents the design and implementation of a finite-state syntactic grammar of Basque that has been used with the objective of extracting information about verb subcategorization instances from newspaper texts. After a partial parser has built basic syntactic units such as noun phrases, prepositional phrases, and sentential complements, a finite-state parser performs syntactic disambiguation, determination of clause boundaries and filtering of the results, in order to obtain a verb occurrence together with its associated syntactic components, either complements or adjuncts. The set of occurrences for each verb is then filtered by statistical measures that distinguish arguments from adjuncts.

Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues

Corpus Linguistics and Linguistic Theory, 2009

In this paper, we describe some theoretical and practical issues raised during the construction of the Basque Dependency Treebank (BDT): the syntactic annotation of EPEC (Reference Corpus for the Processing of Basque). EPEC is a 300,000-word corpus of standard written Basque intended as a training corpus for the development and improvement of several NLP (Natural Language Processing) tools for Basque. BDT will be the first corpus for the Basque language tagged at the syntactic level. We also present the dependency-based annotation hierarchy that we have established for the syntactic tagging. Decisions made during the design of the annotation hierarchy are based on the description of Basque grammar made by Euskaltzaindia (Academy for the Basque Language). When describing dependency relations, we consider lexical units as syntactic heads. This will open up a way for us to work with semantics.

Application of different techniques to dependency parsing of Basque

Proceedings of the NAACL HLT 2010 …, 2010

Foreword: The idea of organizing this workshop was sparked by very interesting discussions that occurred during EACL09 among various researchers working on statistical parsing of different types of languages. Indeed, an opportunity to discuss the issues that we were all experiencing was much needed, and it seemed such a good idea that we decided to take advantage of IWPT'09, which was held that year in Paris, to organize a panel on this topic. We planned to have presentations on the various issues faced by this small emerging community, which would allow us to share our sometimes similar solutions for parsing different languages.

Related papers