Two parsing algorithms by means of finite state transducers

Context-free parsing with finite-state transducers

Proceedings of the 3rd South American Workshop on String Processing, 1996

This article is a study of an algorithm designed and implemented by Roche for parsing natural language sentences according to a context-free grammar. This algorithm is based on the construction and use of a finite-state transducer. Roche successfully applied it to a context-free grammar with a very large number of rules. In contrast, the complexity of parsing words according to context-free grammars is usually considered in practice as a function of one parameter: the length of the input sequence; the size of the grammar is generally taken to be a constant of a reasonable value. In this article, we first explain why a context-free grammar with a correct lexical and grammatical coverage is bound to have a very large number of rules and we review work related to this problem. Then we exemplify the principle of Roche's algorithm on a small grammar. We provide formal definitions of the construction of the parser and of the operation of the algorithm and we prove that the parser can be built for a large class of context-free grammars, and that it outputs the set of parse trees of the input sequence.

Efficient parsing with finite-state constraint satisfaction, a Ph. D. project

2002

1. Background. My Ph. D. project in the Department of General Linguistics at the University of Helsinki started in 2002, under the title "Efficient parsing with finite-state constraint satisfaction". The research concerns a specific finite-state method for parsing, and its goal is to increase the practical value and flexibility of the method. In this article, I present a streamlined version of my Ph. D. research plan.

Compiling and Using Finite-State Syntactic Rules

1992

A language-independent framework for syntactic finite-state parsing is discussed. The article presents a framework, a formalism, a compiler and a parser for grammars written in this formalism. As a substantial example, fragments from a nontrivial finite-state grammar of English are discussed.

Refining the Design of a Contracting Finite-State Dependency Parser

2012

This work complements a parallel paper on a new finite-state dependency parser architecture (Yli-Jyrä, 2012) by a proposal for a linguistically elaborated morphology-syntax interface and its finite-state implementation. The proposed interface extends Gaifman's (1965) classical dependency rule formalism by separating lexical word forms and morphological categories from syntactic categories.

Comparing and combining finite-state and context-free parsers

Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing - HLT '05, 2005

In this paper, we look at comparing high-accuracy context-free parsers with high-accuracy finite-state (shallow) parsers on several shallow parsing tasks. We show that previously reported comparisons greatly underestimated the performance of context-free parsers for these ...

Finite-state phrase parsing by rule sequences

1996

We present a novel approach to parsing phrase grammars based on Eric Brill's notion of rule sequences. The basic framework we describe has somewhat less power than a finite-state machine, and yet achieves high accuracy on standard phrase parsing tasks. The rule language is simple, which makes it easy to write rules. Further, this simplicity enables the automatic acquisition of phrase-parsing rules through an error-reduction strategy.
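The idea of a rule sequence can be illustrated with a minimal Python sketch. It is not the rule language of the paper: the tag set, the two rules, and the function names (`mark_nouns`, `extend_left`, `apply_rule_sequence`, `phrases`) are all hypothetical. It only shows the general Brill-style pattern, in which an initial labeling is revised by an ordered list of simple rules, each seeing the output of the previous one.

```python
# Toy rule-sequence phrase finder (all names and rules are illustrative assumptions).

def mark_nouns(tags, labels):
    """Rule 1: every noun starts out inside a noun phrase."""
    return ["NP" if t == "NOUN" else l for t, l in zip(tags, labels)]

def extend_left(tags, labels):
    """Rule 2: a determiner or adjective directly before an NP token joins it.

    Scanning right to left lets a chain such as DET ADJ NOUN be absorbed
    in a single pass.
    """
    out = list(labels)
    for i in range(len(tags) - 2, -1, -1):
        if tags[i] in ("DET", "ADJ") and out[i + 1] == "NP":
            out[i] = "NP"
    return out

def apply_rule_sequence(tags, rules):
    """Start with every token outside any phrase, then apply the rules in order."""
    labels = ["O"] * len(tags)
    for rule in rules:
        labels = rule(tags, labels)
    return labels

def phrases(labels):
    """Turn per-token labels into (start, end) spans, end exclusive."""
    spans, start = [], None
    for i, label in enumerate(labels + ["O"]):
        if label == "NP" and start is None:
            start = i
        elif label != "NP" and start is not None:
            spans.append((start, i))
            start = None
    return spans

RULES = [mark_nouns, extend_left]
tags = ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"]
labels = apply_rule_sequence(tags, RULES)
spans = phrases(labels)
```

Because each rule only inspects a token and its immediate neighbor, the sequence stays easy to write by hand, which is the property the abstract credits for making automatic rule acquisition feasible.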

Introduction to Finite-State Devices in Natural Language Processing

1996

The theory of finite-state automata (FSA) is rich and finite-state automata techniques have been used in a wide range of domains, such as switching theory, pattern matching, pattern recognition, speech processing, handwriting recognition, optical character recognition, encryption algorithms, data compression, indexing and operating system analysis (Petri-net). Finite-State devices such as Finite-State Automata, Graphs and Finite-State Transducers have been known since the emergence of Computer Science and are extensively used in areas as various as program compilation, hardware modeling or database management. In Computational Linguistics, although they were known for a long time, more powerful formalisms such as context-free grammars or unification grammars have been preferred. However, recent mathematical and algorithmic results in the field of finite-state technology have had a great impact on the representation of electronic dictionaries and natural language processing. As a result, a new ...

Extended finite state models of language

1999

In spite of the wide availability of more powerful (context free, mildly context sensitive, and even Turing-equivalent) formalisms, the bulk of the applied work on language and sublanguage modeling, especially for the purposes of recognition and topic search, is still performed by various finite state methods. In fact, the use of such methods in research labs as well as in applied work actually increased in the past five years. To bring together those developing and using extended finite state methods for text analysis, speech/OCR language modeling, and related CL and NLP tasks with those in AI and CS interested in analyzing and possibly extending the domain of finite state algorithms, a workshop was held in August 1996 in Budapest as part of the European Conference on Artificial Intelligence (ECAI'96). the web. JNLE readers whose interest in the subject of finite state technologies is aroused by this issue are advised to look at these proceedings, since they contain several excellent papers that could not be included here because of space constraints or because the authors felt that their subsequent work took a direction such that they no longer consider the workshop paper fully representative of their current thinking. In particular, we call attention to the tutorial paper by Jelinek (excerpted from his forthcoming book (Jelinek 1977)), the paper by Mohri, Pereira, and Riley describing the AT&T/Bell Labs approach to language modeling using weighted transducers, and the paper by Oehrle on binding and anaphora.

Finite-State Parsing And Disambiguation

1990

A language-independent method of finite-state surface syntactic parsing and word disambiguation is discussed. Input sentences are represented as finite-state networks already containing all possible roles and interpretations of their units. Syntactic constraint rules are also represented as finite-state machines, where each constraint excludes certain types of ungrammatical readings. The whole grammar is an intersection of its constraint rules and excludes all ungrammatical possibilities, leaving the correct interpretation(s) of the sentence. The method is being tested for Finnish, Swedish and English.
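The intersection idea can be sketched in a few lines of Python. This is a simplification under stated assumptions, not the method's implementation: the lexicon, the tag names and the two constraints are invented for illustration, and the readings are enumerated explicitly as sets of tag sequences, whereas the approach described above keeps both the sentence and the constraints as finite-state machines and intersects the automata directly.

```python
from itertools import product

# Hypothetical toy lexicon: each word form with all of its possible readings.
LEXICON = {
    "the": ["DET"],
    "can": ["NOUN", "VERB", "AUX"],
    "rusts": ["VERB", "NOUN"],
}

def all_readings(words):
    """Every tag sequence encoded by the ambiguous sentence network."""
    return set(product(*(LEXICON[w] for w in words)))

def no_det_before_verb(reading):
    """Constraint 1: a determiner is never directly followed by a verb or auxiliary."""
    return all(
        not (a == "DET" and b in ("VERB", "AUX"))
        for a, b in zip(reading, reading[1:])
    )

def has_main_verb(reading):
    """Constraint 2: the sentence must contain a main verb."""
    return "VERB" in reading

def parse(words, constraints):
    """Intersect the sentence's readings with the set accepted by each constraint."""
    readings = all_readings(words)
    for accepts in constraints:
        readings &= {r for r in readings if accepts(r)}
    return readings

CONSTRAINTS = [no_det_before_verb, has_main_verb]
result = parse(["the", "can", "rusts"], CONSTRAINTS)
```

Since intersection is commutative, the constraints can be applied in any order with the same result. In an automaton-based setting the order still matters for efficiency, because intermediate machines can grow large.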