Ewan Klein | University of Edinburgh
Papers by Ewan Klein
ACORD Deliverable T2.1, Feb 9, 1986
Abstract Unification Categorial Grammar (UCG) combines the syntactic insights of Categorial Grammar with the semantic insights of Discourse Representation Theory. The addition of unification to these two frameworks allows a simple account of interaction between different linguistic levels within a constraining, monostratal theory. The resulting computationally efficient system provides an explicit formal framework for linguistic description, within which large fragments of grammars for French and English have already been developed. We present the formal basis of UCG, with independent definitions of well-formedness for syntactic and semantic dimensions. We will also focus on the concept of modifier within the theory.
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), 2014
Abstract Spoken dialogue systems would be more acceptable if they were able to produce backchannel continuers such as mm-hmm in naturalistic locations during the user's utterances. Using the HCRC Map Task Corpus as our data source, we describe models for predicting these locations using only limited processing and features of the user's speech that are commonly available, and which therefore could be used as a low-cost improvement for current systems.
Abstract This paper examines certain aspects of phonological structure from the viewpoint of abstract data types. Our immediate goal is to find a format for phonological representation which will be reasonably faithful to the concerns of theoretical phonology while being rigorous enough to admit a computational interpretation. The longer term goal is to incorporate such representations into an appropriate general framework for natural language processing.
How is it that people manage to communicate even when they implicitly differ on the meaning of the terms they use? Take an innocent-sounding expression such as tomorrow morning. What counts as morning? There is a surprising amount of variation across different people.
Abstract We demonstrate a proof-of-concept system that uses a shallow chunking-based technique for knowledge extraction from natural language text, in particular looking at the task of story understanding. This technique is extended with a reasoning engine that borrows techniques from dynamic ontology refinement to discover the semantic similarity of stories and to merge them together.
This book offers a highly accessible introduction to Natural Language Processing, the field that underpins a variety of language technologies, ranging from predictive text and email filtering to automatic summarization and translation. With Natural Language Processing with Python, you'll learn how to write Python programs to work with large collections of unstructured text. You'll access richly-annotated datasets using a comprehensive range of linguistic data structures.
Abstract The DIPPER architecture is a collection of software agents for prototyping spoken dialogue systems. Implemented on top of the Open Agent Architecture (OAA), it comprises agents for speech input and output, dialogue management, and further supporting agents. We define a formal syntax and semantics for the DIPPER information state update language. The language is independent of particular programming languages, and incorporates procedural attachments for access to external resources using OAA.
Abstract Research within the framework of constraint-based grammar formalisms such as Head-driven Phrase Structure Grammar (HPSG) has focussed on syntax and semantics, largely to the exclusion of phonology. In return, current developments in phonology have generally ignored the technical and linguistic innovations offered by constraint-based grammar formalisms.
Abstract The vision of Grid technology has motivated a huge effort in developing a Grid architecture which will support large-scale sharing of data and computing resources. Most attention so far has focused either on building infrastructure or on applications in disciplines such as physics, biology, medicine and astronomy. Over the last five years or so, modern Natural Language Processing has evolved into a large-scale endeavour.
Abstract To verify hardware designs by model checking, circuit specifications are commonly expressed in the temporal logic CTL. Automatic conversion of English to CTL requires the definition of an appropriately restricted subset of English. We show how the limited semantic expressibility of CTL can be exploited to derive a hierarchy of subsets. Our strategy potentially avoids difficulties with approaches that take existing computational semantic analyses of English as their starting point.
Abstract The paper develops a constraint-based theory of prosodic phrasing and prominence, based on an HPSG framework, with an implementation in ALE. Prominence and juncture are represented by n-ary branching metrical trees. The general aim is to define prosodic structures recursively, in parallel with the definition of syntactic structures. We address a number of prima facie problems arising from the discrepancy between syntactic and prosodic structure.
Could interaction design learn or benefit from looking at human interaction? Ewan Klein believes research into the fine-grain of human-human interaction could offer potentially valuable insights, and in this article he explains why. He also raises questions about trust and accountability in dealing with 'invisible' artefacts.
Abstract In this paper we focus on the software for computational semantics provided by the Python-based Natural Language Toolkit (nltk). The semantics modules in nltk are inspired in large part by the approach developed in Blackburn and Bos (2005) (henceforth referred to as B&B). Since Blackburn and Bos have also provided a software suite to accompany their excellent textbook, one might ask what the justification is for the nltk offering, which is similarly slanted towards teaching computational semantics.
Abstract This paper describes a predominantly shallow approach to the RTE-4 Challenge. We focus our attention on the non-entailing Text and Hypothesis pairs in the dataset. The system uses a Maximum Entropy framework to classify each pair of Text and Hypothesis as either yes or no, using a range of different feature sets based on an analysis of the existing non-entailing pairs in RTE training data.
Abstract Large databases of annotated text and speech are widely used for developing and testing language technologies. However, the size of these corpora and associated language models is outpacing the growth of processing power and network bandwidth available to most researchers.
Abstract The Natural Language Toolkit (NLTK) is widely used for teaching natural language processing to students majoring in linguistics or computer science. This paper describes the design of NLTK, and reports on how it has been used effectively in classes that involve different mixes of linguistics and computer science students. We focus on three key issues: getting started with a course, delivering interactive demonstrations in the classroom, and organizing assignments and projects.