Tracy Holloway King | Adobe Systems

Papers by Tracy Holloway King

eCom'22: The SIGIR 2022 Workshop on eCommerce

Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

eCommerce Information Retrieval (IR) is receiving increasing attention in the academic literature and is an essential component of some of the world's largest web sites (e.g. Airbnb, Alibaba, Amazon, eBay, Facebook, Flipkart, Lowe's, Taobao, and Target). SIGIR has seen sponsorship from eCommerce organisations for the past several years, reflecting the importance of IR research to them. The purpose of this workshop is (1) to bring together researchers and practitioners of eCommerce IR to discuss topics unique to it, (2) to determine how to use eCommerce's unique combination of free text, structured data, and customer behavioral data to improve search relevance, and (3) to examine how to build datasets and evaluate algorithms in this domain. Since eCommerce customers often do not know exactly what they want to buy (i.e. navigational and spearfishing queries are rare), recommendations are valuable for inspiration and serendipitous discovery as well as basket building. The theme of this year's eCommerce IR workshop is Bridging IR Metrics and Business Metrics and Multi-objective Optimization. The workshop includes papers on this topic as well as a panel focused on this area (see Section 3). In addition, Farfetch is sponsoring a recommendation challenge focused on outfit completion: as part of the event, Farfetch will release to the research community a novel, large dataset containing multi-modal information and extensive labels curated by fashion experts. The data challenge reflects themes from prior SIGIR workshops

Challenges and research opportunities in eCommerce search and recommendations

ACM SIGIR Forum, 2020

With the rapid adoption of online shopping, academic research in the eCommerce domain has gained traction. However, significant research challenges remain, spanning from classic eCommerce search problems such as matching textual queries to multi-modal documents and ranking optimization for two-sided marketplaces to human-computer interaction and recommender systems for discovery and browsing. These research areas are important for understanding customer behavior, driving engagement, and improving product discoverability and conversion. In this article we identify the challenges and highlight research opportunities to improve the eCommerce customer experience.

The Proceedings of the LFG'13 Conference

An LFG treatment is proposed for 'Displaced Dependents' in English, including dependents of Degree Words, such as 'so', 'too', etc. and certain adjectives (e.g. 'difficult'), as in examples like 'too complex for anyone to understand', 'a difficult problem for anyone to understand' where 'for anyone to understand' is dependent on 'too' and 'difficult' respectively, but is not adjacent to them. The specific topic of this paper is the analysis of 'Displaced Dependents' (DDs) in English, exemplified in (1); the term 'Displaced Dependents' is from Kay and Sag (2012), who provide an impressive constructional analysis in the framework of Sign-Based Construction Grammar (SBCG). We attempt to provide an LFG analysis with comparable empirical coverage and theoretical appeal, without significant extension to the theoretical apparatus of the framework. A secondary, but broader, issue is comparison of the technical apparatus of LFG with Head-Driven Phrase Structure Grammar (HPSG) and SBCG: specifically, the discussion will bring out differences in the apparatus for lexical selection, and in the methods for controlling interactions among long-distance dependencies. The DD construction occurs with degree words (too, so, as, enough, more, and synthetic comparatives); some degree-denoting adverbs (e.g. sufficiently, as in (1f)); and some adjectives (e.g. difficult, impossible, fun), e.g. in (1g): (1) a. This problem is too complex ...

ParGramBank: The ParGram Parallel Treebank

This paper discusses the construction of a parallel treebank currently involving ten languages from six language families. The treebank is based on deep LFG (Lexical-Functional Grammar) grammars that were developed within the framework of the ParGram (Parallel Grammar) effort. The grammars produce output that is maximally parallelized across languages and language families. This output forms the basis of a parallel treebank covering a diverse set of phenomena. The treebank is publicly available via the INESS treebanking environment, which also allows for the alignment of language pairs. We thus present a unique, multilayered parallel treebank that represents more and different types of languages than are available in other treebanks, that represents deep linguistic knowledge and that allows for the alignment of sentences at several levels: dependency structures, constituency structures and POS information.

Writing Large-Scale Parallel Grammars For English, French, And German

This paper discusses issues relevant to writing large-scale parallel grammars. It is a direct result of our experiences with ParGram, a parallel grammar project involving Xerox PARC (English), XRCE (French), IMS Stuttgart (German), and University of Bergen (Norwegian). The basic goal of the ParGram project is to write large-scale LFG grammars with parallel analyses. In this introduction, we define what we mean by parallel analyses and by large scale, and briefly discuss the system which we use. There are three basic aspects to parallel grammars: similar analyses for similar phenomena; same basic coverage; and common features, values, node names, etc. Section 1.1 discusses the first of these, namely what it means to have parallel analyses. The second issue is covered in section 1.2. The third point, that the grammars have common features, values, and node names, is not

XLE: Grammar Development Platform, Parser/Generator/Rewrite System

Productive encoding of Urdu complex predicates in the ParGram Project

Complex Predicates are a crosslinguistically general phenomenon, but are more pervasive in South Asian than in European languages. This paper describes an LFG solution for Urdu/Hindi complex predication in terms of a RESTRICTION OPERATOR. The solution is theoretically well motivated and can be extended straightforwardly to related phenomena in European languages such as German, Norwegian, and French. 1. The ParGram Project. In this paper, we report on the implementation of complex predicates (CP) for Urdu in the Parallel Grammar (ParGram) project (Butt et al., 1999; Butt et al., 2002). The ParGram project originally focused on three European languages: English, French, and German. Three other languages were added later: Japanese, Norwegian, and Urdu. The ParGram project uses the XLE parser and grammar development platform (Maxwell and Kaplan, 1993) to develop deep grammars, i.e., grammars which provide an in-depth analysis of a given sentence (as opposed to shallow parsing or chunk pa...

A Note-taking Appliance for Intelligence Analysts

Note-taking is a very simple and quite common activity of intelligence analysts, especially all-source analysts. Common as this activity is, there is little or no technology specifically aimed at making it more effective and efficient: it is mostly carried out by cumbersome copy-paste interactions with standard applications (such as Internet Explorer and Microsoft Word). This paper describes how sophisticated natural language processing technologies, user-interest specifications, and human-interface design have been integrated to produce a lightweight, fail-soft appliance aimed at reducing the cognitive load of note-taking. This appliance condenses user-selected source passages and adds them to a note-file. The condensations are grammatical, preserve relations of interest to the user, and avoid distortions of meaning.

Porting Grammars between Typologically Similar Languages: Japanese to Korean

The Parallel Grammar project (ParGram) is an international collaboration aimed at producing broad-coverage computational grammars for a variety of languages (Butt et al., 1999; Butt et al., 2002). The grammars (currently of English, French, German, Japanese, Norwegian, and Urdu) are written in the framework of Lexical Functional Grammar (LFG) (Kaplan and Bresnan, 1982; Dalrymple, 2001), and they are constructed using a common engineering and high-speed processing platform for LFG grammars, the XLE (Maxwell and Kaplan, 1993). These grammars, as do all LFG grammars, assign two levels of syntactic representation to the sentences of a language: a superficial phrase structure tree (called a constituent structure or c-structure) and an underlying matrix of features and values (the functional structure or f-structure). The c-structure records the order of words in a sentence and their hierarchical grouping into phrases. The f-structure encodes the grammatical functions, syntactic features, ...

Optimality Theory Style Constraint Ranking in Large-scale LFG

Urdu and the Parallel Grammar project

We report on the role of the Urdu grammar in the Parallel Grammar (ParGram) project (Butt et al., 1999; Butt et al., 2002). The ParGram project was designed to use a single grammar development platform and a unified methodology of grammar writing to develop large-scale grammars for typologically different languages. At the beginning of the project, three typologically similar European grammars were implemented. The addition of two Asian languages, Urdu and Japanese, has shown that the basic analysis decisions made for the European languages can be applied to typologically distinct languages. However, the Asian languages required the addition of a small number of new standard analyses to cover constructions and analysis techniques not found in the European languages. With these additional standards, the ParGram project can now be applied to other typologically distinct languages.

Intelligent Linguistic Architectures: Variations on Themes by Ronald M. Kaplan

Ronald M. Kaplan has made foundational contributions to the development of computational linguistic research and linguistic theory, particularly within Lexical-Functional Grammar. Intelligent Linguistic Architectures, a tribute to Kaplan’s cutting-edge work, collects computational and theoretical linguistics papers in his research areas. From machine translation to grammar engineering, from formal issues to semantic theory, this ambitious volume represents the newest developments in linguistic scholarship.

Linguistic Extraction of Temporal and Location Information for a Recommender System

ECOM'21: The SIGIR 2021 Workshop on eCommerce

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

On the Nature of the Syntax-Phonology Interface

(Xx*-)Linguistics: Because We Love Language

Linguistic Issues in Language Technology

Overall, although there is not the intense overlap between theoretical and computational linguistics that many seem to wish, I do not see this as a problem from a practical perspective. Everyone is working towards a common goal to understand language and to unearth its structure. By approaching the problem from different angles and with different immediate goals, we gain a broader perspective on the problem, and language is complex enough to need this broad perspective. As with all fields and as with the (Xx*-)linguistic fields, there are some researchers who more naturally reach outside their immediate domain and who will see where there are convergences and contradictions. Forcing everyone into this role would slow the growth of the field as a whole given the vast amount of work still before us.

Theoretical perspectives on word order in South Asian languages

Copyright © 1994 Center for the Study of Language and Information, Leland Stanford Junior University. Printed in the United States. Library of Congress Cataloging-in-Publication Data: Theoretical perspectives on word order in South Asian languages / edited by Miriam ...

Exploiting f-structure input for sentence condensation

In this paper, we describe in detail the types of sentence condensation rules used in the sentence condensation system of Riezler et al. (2003). We show how the distinctions made in LFG f-structures as to grammatical functions and features make it possible to state simple but accurate rules to create smaller, well-formed f-structures from which the condensed sentence can be generated.

Method and system for grammatical text condensation

Techniques are provided for determining grammatical condensed text structures. Packed structures are determined for text structures based on a parsing grammar. Reduced packed structures are determined by applying transformation rules to packed and/or unpacked elements of the packed structures. A disambiguation model is applied to the reduced packed structure to determine candidate structures. A grammatically correct generation grammar is applied to the candidate structures to determine grammatical ...

Overlay Mechanisms for Multi-level Deep Processing Applications

Deep grammars that include tokenization, morphology, syntax, and semantic layers have obtained broad coverage in conjunction with high efficiency. This allows them to play a crucial role in applications. However, these grammars are often developed as general-purpose grammars, expecting "standard" input, and have to be specialized for the application domain. This paper discusses some engineering tools that are used in the XLE grammar development platform to allow for domain specialization. It provides examples of techniques used to allow specialization via overlay grammars at the level of tokenization, morphology, syntax, the lexicon, and semantics. As an example, the paper focuses on the use of the broad-coverage, general-purpose ParGram English grammar and semantics in the context of an Intelligent Document Security Solutions (IDSS) system. Within this system, the grammar is used to automatically identify sensitive entities and relations among entities, which can then be redacted to protect the content.
