Use of Weighted Finite State Transducers in Part of Speech Tagging

Part-of-speech tagging using parallel weighted finite-state transducers

Advances in Natural Language Processing, 2010

We use parallel weighted finite-state transducers to implement a part-of-speech tagger, which obtains state-of-the-art accuracy when used to tag the Europarl corpora for Finnish, Swedish and English. Our system consists of a weighted lexicon and a guesser combined with a bigram model factored into two weighted transducers. We use both lemmas and tag sequences in the bigram model, which guarantees reliable bigram estimates.
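
As a minimal sketch of the factorisation this abstract describes, the toy below combines an emission ("lexicon") component and a tag-bigram ("transition") component by adding tropical weights and decoding with a shortest-path (Viterbi) search. All weights, words, and tags are invented, and the guesser is reduced to a comment; a real system would estimate these from a corpus.

```python
# Hypothetical weights; a real system would estimate them from a tagged corpus.
LEXICON = {  # -log P(word | tag): the emission component
    ("the", "DET"): 0.1,
    ("dog", "NOUN"): 0.6, ("dog", "VERB"): 2.5,
    ("barks", "VERB"): 0.8, ("barks", "NOUN"): 2.0,
}
BIGRAM = {  # -log P(tag | previous tag): the transition component
    ("<s>", "DET"): 0.3, ("<s>", "NOUN"): 1.2,
    ("DET", "NOUN"): 0.2, ("DET", "VERB"): 3.0,
    ("NOUN", "VERB"): 0.4, ("NOUN", "NOUN"): 1.5,
    ("VERB", "NOUN"): 1.0, ("VERB", "VERB"): 2.2,
}
TAGS = ("DET", "NOUN", "VERB")

def viterbi(words):
    """Shortest path through the composed lexicon/bigram machine."""
    best = {"<s>": (0.0, [])}  # tag -> (accumulated tropical weight, tag path)
    for word in words:
        nxt = {}
        for tag in TAGS:
            emit = LEXICON.get((word, tag))
            if emit is None:  # a guesser transducer would supply a weight here
                continue
            for prev, (w, path) in best.items():
                trans = BIGRAM.get((prev, tag))
                if trans is None:
                    continue
                cand = w + trans + emit  # tropical: multiplying probs = adding -logs
                if tag not in nxt or cand < nxt[tag][0]:
                    nxt[tag] = (cand, path + [tag])
        best = nxt
    return min(best.values())[1] if best else []

print(viterbi(["the", "dog", "barks"]))  # -> ['DET', 'NOUN', 'VERB']
```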

Deterministic Part-of-Speech Tagging with Finite-State Transducers

Computational Linguistics, 1995

Stochastic approaches to natural language processing have often been preferred to rule-based approaches because of their robustness and their automatic training capabilities. This was the case for part-of-speech tagging until Brill showed how state-of-the-art part-of-speech tagging can be achieved with a rule-based tagger by inferring rules from a training corpus. However, current implementations of the rule-based tagger run more slowly than previous approaches.
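
For illustration, here is a minimal sketch of the kind of contextual transformation rule a Brill-style tagger applies deterministically; the rules, tags, and single-pass application strategy below are all invented simplifications.

```python
# Invented transformation rules: (from_tag, to_tag, required previous tag).
RULES = [
    ("VERB", "NOUN", "DET"),   # retag VERB as NOUN after a determiner
    ("NOUN", "VERB", "NOUN"),  # retag NOUN as VERB after a noun
]

def apply_rules(tags):
    """One deterministic left-to-right pass, as a finite-state device would do it."""
    out = list(tags)
    for i in range(1, len(out)):
        for frm, to, prev in RULES:
            if out[i] == frm and out[i - 1] == prev:
                out[i] = to
    return out

# Initial (e.g. most-frequent-tag) guesses, then rule application:
print(apply_rules(["DET", "VERB", "NOUN"]))  # -> ['DET', 'NOUN', 'VERB']
```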

Finite-State Transducers in Language and Speech Processing

Computational Linguistics, 1997

Finite-state machines have been used in various domains of natural language processing. We consider here the use of a type of transducer that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential string-to-string transducers. Transducers that output weights also play an important role in language and speech processing. We give a specific study of string-to-weight transducers, including algorithms for determinizing and minimizing these transducers very efficiently, and characterizations of the transducers admitting determinization and the corresponding algorithms. Some applications of these algorithms in speech recognition are described and illustrated.
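
As a toy illustration of the central object, here is a sequential string-to-weight transducer over the tropical semiring, written as plain Python dictionaries. The states, symbols, and weights are invented; the point is that because the machine is deterministic, the weight of an input is read off a single path in time linear in its length.

```python
import math

# Invented deterministic machine: one transition per (state, input symbol).
TRANS = {  # (state, symbol) -> (next state, weight)
    (0, "a"): (1, 0.5),
    (1, "b"): (1, 0.2),
    (1, "c"): (2, 1.0),
}
FINAL = {1: 0.0, 2: 0.3}  # state -> exit weight

def weight(string, start=0):
    """Weight of `string`: transition weights plus exit weight along one path."""
    state, total = start, 0.0
    for sym in string:
        if (state, sym) not in TRANS:
            return math.inf  # no path: the tropical semiring's zero
        state, w = TRANS[(state, sym)]
        total += w
    return total + FINAL.get(state, math.inf)

print(weight("abb"))  # 0.5 + 0.2 + 0.2 + 0.0 = 0.9
print(weight("ac"))   # 0.5 + 1.0 + 0.3 = 1.8
```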

Weighted finite-state transducers in speech recognition

Computer Speech & Language, 2002

We survey the use of weighted finite-state transducers (WFSTs) in speech recognition. We show that WFSTs provide a common and natural representation for HMM models, context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs. Furthermore, general transducer operations combine these representations flexibly and efficiently. Weighted determinization and minimization algorithms optimize their time and space requirements, and a weight pushing algorithm distributes the weights along the paths of a weighted transducer optimally for speech recognition.
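
A small sketch of the weight-pushing step mentioned above, in the tropical semiring on an invented acyclic machine: compute each state's shortest distance to a final state, then reweight arcs so that weight accumulates as early as possible while every path keeps its total weight.

```python
import math

# Invented acyclic machine: arcs (src, dst, weight) and final-state weights.
EDGES = [
    (0, 1, 3.0), (0, 2, 1.0),
    (1, 3, 2.0), (2, 3, 4.0),
]
FINAL = {3: 0.0}

def push_weights(edges, final, start):
    """Move weight toward the start state without changing any path's total."""
    states = {s for e in edges for s in (e[0], e[1])} | set(final)
    # Potential v[q]: tropical shortest distance from q to a final state.
    v = {q: final.get(q, math.inf) for q in states}
    for _ in states:  # Bellman-Ford-style relaxation, enough for this toy
        for s, d, w in edges:
            v[s] = min(v[s], w + v[d])
    pushed = [(s, d, w + v[d] - v[s]) for s, d, w in edges]
    new_final = {q: fw - v[q] for q, fw in final.items()}
    return pushed, new_final, v[start]  # v[start] becomes the initial weight

pushed, new_final, init = push_weights(EDGES, FINAL, start=0)
print(init)    # 5.0: the best total path weight, now paid up front
print(pushed)  # arcs on best continuations now carry weight 0.0
```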

Combining Statistical Models for POS Tagging using Finite-State Calculus

2011

We introduce a framework for POS tagging which can incorporate a variety of different information sources such as statistical models and hand-written rules. The information sources are compiled into a set of weighted finite-state transducers and tagging is accomplished using weighted finite-state algorithms. Our aim is to develop a fast and flexible way to try out different tagger designs and combine them into hybrid systems. We test the applicability of the framework by constructing HMM taggers with augmented lexical models for English and Finnish. We compare our taggers with two existing statistical taggers, TnT and Hunpos, and find that we achieve superior accuracy.
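
A toy illustration of the combination idea, with invented scores: each information source assigns a tropical weight to a tag sequence, and intersecting the corresponding weighted machines amounts to adding those weights. The brute-force search below stands in for actual WFST composition.

```python
import itertools

# Each source scores a tag sequence with a tropical weight (lower is better).
def hmm_score(tags):   # stand-in for a compiled statistical model
    costs = {("DET", "NOUN", "VERB"): 0.5, ("DET", "VERB", "NOUN"): 1.5}
    return costs.get(tuple(tags), 5.0)

def rule_score(tags):  # stand-in for compiled hand-written rules
    # e.g. a rule penalising a VERB directly after a DET
    return 3.0 if any(a == "DET" and b == "VERB"
                      for a, b in zip(tags, tags[1:])) else 0.0

def best_tagging(length, tagset=("DET", "NOUN", "VERB")):
    """Brute-force stand-in for intersecting the two weighted machines."""
    return min(itertools.product(tagset, repeat=length),
               key=lambda t: hmm_score(t) + rule_score(t))

print(best_tagging(3))  # -> ('DET', 'NOUN', 'VERB')
```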

Language model combination and adaptation using weighted finite state transducers

2010

In speech recognition systems, language models (LMs) are often constructed by training and combining multiple n-gram models. They can be used either to represent different genres or tasks found in diverse text sources, or to capture stochastic properties of different linguistic symbol sequences, for example, syllables and words. Unsupervised LM adaptation may also be used to further improve robustness to varying styles or tasks. When using these techniques, extensive software changes are often required. In this paper an alternative and more general approach based on weighted finite state transducers (WFSTs) is investigated for LM combination and adaptation. As it is entirely based on well-defined WFST operations, minimal change to decoding tools is needed. A wide range of LM combination configurations can be flexibly supported. An efficient on-the-fly WFST decoding algorithm is also proposed. Significant error rate gains of 7.3% relative were obtained on a state-of-the-art broadcast audio recognition task using a history-dependently adapted multi-level LM modelling both syllable and word sequences.
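
As a sketch of one common combination configuration, here is linear interpolation of two invented bigram LMs, expressed as the negative-log arc weights a WFST decoder would consume. All probabilities and the mixing weight are made up; the point is that adaptation changes only the weights, not the machine's topology.

```python
import math

# Invented bigram probabilities from two text sources.
LM_NEWS = {("the", "match"): 0.02, ("the", "market"): 0.05}
LM_SPORT = {("the", "match"): 0.12, ("the", "market"): 0.01}

def interpolated_cost(bigram, lam=0.3, floor=1e-6):
    """-log of the interpolated probability: the arc weight after combination."""
    p = (lam * LM_NEWS.get(bigram, 0.0)
         + (1 - lam) * LM_SPORT.get(bigram, 0.0))
    return -math.log(max(p, floor))

# Adapting to sports audio means raising (1 - lam); the WFST structure is
# untouched, only the arc weights change.
print(interpolated_cost(("the", "match")))  # -log(0.09), roughly 2.41
```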

Two parsing algorithms by means of finite state transducers

Proceedings of the 15th Conference on Computational Linguistics, 1994

We present a new approach, illustrated by two algorithms, for parsing not only Finite State Grammars but also Context Free Grammars and their extension, by means of finite state machines. The basis is the computation of a fixed point of a finite-state function, i.e. a finite-state transducer. Using these techniques, we have built a program that parses French sentences with a grammar of more than 200,000 lexical rules with a typical response time of less than a second. The first algorithm computes a fixed point of a non-deterministic finite-state transducer and the second computes a fixed point of a deterministic bidirectional device called a bimachine. These two algorithms point out a new connection between the theory of parsing and the theory of representation of rational transductions.
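
The fixed-point idea can be sketched in a few lines: apply a finite-state rewriting pass repeatedly until the output stops changing. The toy reductions below are invented and far simpler than the paper's grammars.

```python
# Invented reductions: a symbol pattern rewrites to a constituent label.
REDUCTIONS = [
    (("DET", "NOUN"), "NP"),
    (("NP", "VP"), "S"),
    (("VERB",), "VP"),
]

def rewrite_once(symbols):
    """One left-to-right pass of the rewriting transducer."""
    out, i = [], 0
    while i < len(symbols):
        for pattern, result in REDUCTIONS:
            if tuple(symbols[i:i + len(pattern)]) == pattern:
                out.append(result)
                i += len(pattern)
                break
        else:
            out.append(symbols[i])
            i += 1
    return out

def parse(symbols):
    """Iterate to a fixed point: stop when f(x) == x."""
    nxt = rewrite_once(symbols)
    return symbols if nxt == symbols else parse(nxt)

# ['DET','NOUN','VERB'] -> ['NP','VP'] -> ['S'] -> fixed point
print(parse(["DET", "NOUN", "VERB"]))  # -> ['S']
```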

A Layered, Transducer-Based Model for Speech and Language Processing

2008

This paper presents a transducer-based model for speech and language processing. The proposed model consists of a series of layers of interconnected transducers. At the lowest layer there is a Finite-State Transducer (FST) containing the lexicon of the system. At the next layer another FST represents the language model. Then a word-to-POS transducer is used to provide a link between the graphemic form of the word and the part-of-speech tag associated with it. The upper layer is composed of a transducer which utilises the POS information of the previous layer to form syntactic structures based on context-free grammatical rules. Transition probabilities are also considered, thus forming Weighted Finite-State Transducers (WFSTs). Keeping these probabilities outside the grammatical information allows their independent composition. The grammatical component can be composed from existing, carefully built lexicons, language models and syntactic rules, while the probabilities can be derived automatically from corpora afterwards and applied as additional information to the existing structure. A set of on-line algorithms for rapid update of automata and transducers has been developed to support this approach.
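
A minimal sketch of the layered cascade, with invented dictionary entries standing in for the lexicon, word-to-POS, and syntactic transducers; real layers would be composed as WFSTs rather than applied as Python lookups.

```python
# Invented placeholder layers.
LEXICON = {"cats": "cat", "sleep": "sleep"}      # surface form -> lemma
WORD_TO_POS = {"cat": "NOUN", "sleep": "VERB"}   # lemma -> POS tag
SYNTAX = {("NOUN", "VERB"): "S"}                 # POS pattern -> constituent

def analyse(tokens):
    """Apply the three layers in sequence, like a composition of transducers."""
    lemmas = [LEXICON[t] for t in tokens]        # layer 1: lexicon FST
    tags = [WORD_TO_POS[l] for l in lemmas]      # layer 2: word-to-POS FST
    constituent = SYNTAX.get(tuple(tags))        # layer 3: syntactic layer
    return lemmas, tags, constituent

print(analyse(["cats", "sleep"]))  # (['cat', 'sleep'], ['NOUN', 'VERB'], 'S')
```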