Michael B Maxwell - Academia.edu

Papers by Michael B Maxwell

The subject and infinitival complementation in English

Invited Talk: Building Language Resources: Ways to move forward

There are perhaps seven thousand languages in the world, ranging from the largest with hundreds of millions of speakers, to the smallest, with one speaker. On a different axis, languages can be ranked according to the quantity and quality of computational resources. Not surprisingly, there are correlations between these two axes: languages like English and Mandarin have substantial resources, while many of the smallest languages are completely undocumented. Nevertheless, the correlation is not perfect; there are languages with a million speakers which are more or less unwritten, and there are very large languages – some of the languages of India, for example – which are relatively resource-poor. Unfortunately, what counts as resource-rich (or even resource-adequate) in computational linguistics is a moving target. For languages to move in the direction of resource richness, considerable effort (people and money) has to be provided over a prolonged period of time. One can sit back a...

Morphological interfaces to dictionaries

Proceedings of the Workshop on Enhancing and Using Electronic Dictionaries - ElectricDict '04, 2004

Languages with complex morphologies present difficulties for dictionary users. One solution to this problem is to use a morphological parser for lookup of morphologically complex words, including fully inflected words, without the user needing to explicitly know the morphology. We discuss the sorts of morphologies which cause the greatest need for such an interface.
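
The lookup idea in the abstract above can be illustrated with a minimal sketch. It is not the interface described in the paper: the lexicon, the suffix list, and the function name `lookup` are all hypothetical, and plain suffix stripping stands in for a real morphological parser, which would also handle prefixes, stem changes, and phonological alternations.

```python
# Hypothetical toy lexicon: headword -> gloss.
LEXICON = {
    "walk": "to move on foot",
    "cat": "a small domesticated feline",
}

# Hypothetical inflectional suffixes; "" covers the uninflected form.
SUFFIXES = ["", "s", "ed", "ing"]


def lookup(word):
    """Return (headword, suffix, gloss) for every analysis found in LEXICON."""
    hits = []
    for suffix in SUFFIXES:
        if suffix and not word.endswith(suffix):
            continue
        # Strip the candidate suffix and check whether a headword remains.
        stem = word[: len(word) - len(suffix)]
        if stem in LEXICON:
            hits.append((stem, suffix, LEXICON[stem]))
    return hits


if __name__ == "__main__":
    print(lookup("walked"))
    print(lookup("cats"))
```

With this toy data, a user who types an inflected form such as "walked" is taken to the entry for "walk" without having to identify the suffix themselves.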

STREAMLInED Challenges: Aligning Research Interests with Shared Tasks

Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages, 2017

While there have been significant improvements in speech and language processing, it remains difficult to bring these new tools to bear on challenges in endangered language documentation. We describe an effort to bridge this gap through Shared Task Evaluation Campaigns (STECs) by designing tasks that are compelling to speech and natural language processing researchers while addressing technical challenges in language documentation and exploiting growing archives of endangered language data. Based on discussions at a recent NSF-funded workshop, we present overarching design principles for these tasks, including realistic settings, diversity of data, accessibility of data and systems, and extensibility, that aim to ensure the utility of the resulting systems. Three planned tasks embodying these principles are highlighted, spanning audio processing, orthographic regularization, and automatic production of interlinear glossed text. The planned data and evaluation methodologies are also presented, motivating each task by its potential to accelerate the work of researchers and archivists working with endangered languages. Finally, we articulate the interest of the tasks to both speech and NLP researchers and speaker communities.

Interoperable Grammars

For languages with significant inflectional morphology, development of a morphological parser is often a prerequisite to further computational linguistic capabilities. We focus on two difficulties for this development: the short lifetime of software such as parsing engines, and the difficulty of porting grammars to new parsing engines. We describe a methodology we have developed to promote portability, using a formal declarative grammar written in XML, which we supplement with a traditional descriptive grammar. The two grammars are combined into a single document using Literate Programming. The formal grammar is designed to be independent of a particular parsing engine’s programming language, thus helping solve the software lifetime and portability problems.

Interoperable Grammars

For languages with significant inflectional morphology, development of a morphological parser is often a prerequisite to further computational linguistic capabilities. We focus on two difficulties for this development: the short lifetime of software such as parsing engines, and the difficulty of porting grammars to new parsing engines. We describe a methodology we have developed to promote portability, using a formal declarative grammar written in XML, which we supplement with a traditional descriptive grammar. The two grammars are combined into a single document using

Interoperable Grammars

For languages with significant inflectional morphology, development of a morphological parser is often a prerequisite to further computational linguistic capabilities. We focus on two difficulties for this development: the short lifetime of software such as parsing engines, and the difficulty of porting grammars to new parsing engines. We describe a methodology we have developed to promote portability, using a formal declarative grammar written in XML, which we supplement with a traditional descriptive grammar. The two grammars are combined into a single document using Literate Programming. The formal grammar is designed to be independent of a particular parsing engine’s programming language, thus helping solve the software lifetime and portability problems. 1 Grammar Development After decades of widespread effort in computational linguistics, it is clear that progress has been made in areas ranging from the building of computational lexical resources, to applications such as machine t...

Accounting for Allomorphy in Finite-state Transducers

Building morphological parsers with existing finite state toolkits can result in something of a mismatch between the programming language of the toolkit and the linguistic concepts familiar to the average linguist. We illustrate this mismatch with a particular linguistic construct, suppletive allomorphy, and discuss ways to encode suppletive allomorphy in the Stuttgart Finite State tools (SFST). The complexity of the general solution motivates our work in providing an alternative formalism for morphology and phonology, one which can be translated automatically into SFST or other morphological parsing engines.

Book Reviews: A Grammar Writer's Cookbook

Grammar writers sometimes approach grammar writing as if the language being described were the only language in the world. In contrast, this book reports on the parallel development of computational grammars for three languages: English, French, and German. At the time the book was written, the "ParGram" (Parallel Grammars) project included researchers from the Xerox Palo Alto Research Center (California), the Xerox Research Centre Europe (Grenoble), and the Institut für Maschinelle Sprachverarbeitung (University of Stuttgart). The theoretical approach is Lexical-Functional Grammar (LFG), a theory well suited to parallel development, in that it assumes two levels of grammatical representation: "c(onstituent)-structure" is the traditional phrase-structure analysis, while "f(unctional)-structure" is a representation of argument structure (a surfacy kind of semantic representation). While the c-structures of analogous sentences in two languages may differ substantially,

Parsing Using Linearly Ordered Phonological Rules

A generate-and-test algorithm is described which parses a surface form into one or more lexical entries using linearly ordered phonological rules. This algorithm avoids the exponential expansion of search space which a naive parsing algorithm would face by encoding into the form being parsed the ambiguities which arise during parsing. The algorithm has been implemented and tested on real language data, and its speed compares favorably with that of a KIMMO-type parser. * I have benefited from comments on previous versions of this paper by Alan Buseman and several anonymous referees. Errors remain my own.

A System for Archivable Grammar Documentation

Joint Grammar Development by Linguists and Computer Scientists

For languages with inflectional morphology, development of a morphological parser can be a bottleneck to further development. We focus on two difficulties: first, finding people with expertise in both computer programming and the linguistics of a particular language, and second, the short lifetime of software such as parsers. We describe a methodology to split parser building into two tasks: descriptive grammar development, and formal grammar development. The two grammars are combined into a single document using Literate Programming. The formal grammar is designed to be independent of a particular parsing engine's programming language, so that it can be readily ported to a new parsing engine, thus helping solve the software lifetime problem.

Electronic Grammars and Reproducible Research

Frontiers in linguistic annotation for lower-density languages

Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006 - LAC '06, 2006

The languages that are most commonly subject to linguistic annotation on a large scale tend to be those with the largest populations or with recent histories of linguistic scholarship. In this paper we discuss the problems associated with lower-density languages in the context of the development of linguistically annotated resources. We frame our work with three key questions regarding the definition of lower-density languages, increasing available resources, and reducing data requirements. A number of steps forward are identified for increasing the number of lower-density language corpora with linguistic annotations.

Spoken Language Characterization

Springer Handbook of Speech Processing, 2008

This chapter describes the types of information that can be used to characterize spoken languages. Automatic spoken language identification (LID) systems, which are tasked with determining the identity of the language of speech samples, can utilize a variety of ...

Review of A grammar writer's cookbook by Miriam Butt, Tracy Holloway King, María-Eugenia Niño, and Frédérique Segond. Cambridge University Press 1999

Computational Linguistics, 2000

Grammar writers sometimes approach grammar writing as if the language being described were the only language in the world. In contrast, this book reports on the parallel development of computational grammars for three languages: English, French, and ...

Phonological analysis and opaque rule orders

Proceedings of the Second International Workshop on …, 1991

... Michael Maxwell. ... Phonological rules are written in the "standard" way with distinctive features but without any abbreviatory conventions (parentheses, curly braces, angled brackets, alpha variables, etc.); the rules apply in linear order, the output of each serving as the input to the ...
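
As a rough illustration of rules applying in linear order, with the output of each serving as the input to the next, here is a minimal sketch. It is not the system described in the paper: the two rules and the form "pan" are hypothetical, and regular expressions over plain segments stand in for rules stated in distinctive features.

```python
import re

NASAL_A = "\u00e3"  # the character "ã", used here for a nasalized /a/

# Hypothetical toy rules as (name, pattern, replacement), in application order.
RULES = [
    ("vowel nasalization", r"a(?=n)", NASAL_A),  # a becomes nasalized before n
    ("final n deletion", r"n$", ""),             # word-final n is deleted
]


def derive(underlying, rules=RULES):
    """Apply each rule once, in order, feeding its output to the next rule.

    Returns the surface form and the list of intermediate forms.
    """
    form = underlying
    steps = [form]
    for _name, pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
        steps.append(form)
    return form, steps


if __name__ == "__main__":
    print(derive("pan"))               # nasalization first, then deletion
    print(derive("pan", RULES[::-1]))  # same rules in the opposite order
```

In the first order the surface form keeps a nasalized vowel even though the conditioning n has been deleted, which is one kind of opaque interaction. With the order reversed, deletion applies first and nasalization never gets a chance to apply.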

Phonological Knowledge: Conceptual and Empirical Issues (review)

Language, 2002

... This is again an empirical question, as the authors point out in the end. JENNIFER FITZPATRICK and LINDA R. WHEELDON propose a model of speech perception, claiming that (contrastive) underspecification makes for more 'robust' recognition. ...

Morphological Analysis in Comparison (review)

Language, 2003

... These inverse forms are homophonous with antipassives, which Spencer takes to be a syncretism, supporting the need for rules of referral. ... Vol. 2: History, theory, and policy. Pp. 414. ISBN 0805840540. $28.95. Ed. by ROSEANN DUEÑAS GONZÁLEZ, with ILDIKÓ MELIS. ...

Trubetzkoy's Orphan: Proceedings of the Montreal Roundtable 'Morphonology: Contemporary Responses'
