Ewan Klein | University of Edinburgh

Papers by Ewan Klein

Problems of Dialogue Parsing

ACORD Deliverable T2.1, Feb 9, 1986

Unification categorial grammar

Unification Categorial Grammar (UCG) combines the syntactic insights of Categorial Grammar with the semantic insights of Discourse Representation Theory. The addition of unification to these two frameworks allows a simple account of interaction between different linguistic levels within a constraining, monostratal theory. The resulting, computationally efficient system provides an explicit formal framework for linguistic description, within which large fragments of grammars for French and English have already been developed. We present the formal basis of UCG, with independent definitions of well-formedness for the syntactic and semantic dimensions. We also focus on the concept of modifier within the theory.

Bootstrapping a historical commodities lexicon with SKOS and DBpedia

Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), 2014

A shallow model of backchannel continuers in spoken dialogue

Spoken dialogue systems would be more acceptable if they were able to produce backchannel continuers such as mm-hmm in naturalistic locations during the user's utterances. Using the HCRC Map Task Corpus as our data source, we describe models for predicting these locations using only limited processing and features of the user's speech that are commonly available, and which therefore could be used as a low-cost improvement for current systems.

Data types in computational phonology

This paper examines certain aspects of phonological structure from the viewpoint of abstract data types. Our immediate goal is to find a format for phonological representation which will be reasonably faithful to the concerns of theoretical phonology while being rigorous enough to admit a computational interpretation. The longer-term goal is to incorporate such representations into an appropriate general framework for natural language processing.

Temporal vagueness, coordination and communication

How is it that people manage to communicate even when they implicitly differ on the meaning of the terms they use? Take an innocent-sounding expression such as 'tomorrow morning'. What counts as 'morning'? There is a surprising amount of variation across different people.

Merging stories with shallow semantics

We demonstrate a proof-of-concept system that uses a shallow chunking-based technique for knowledge extraction from natural language text, in particular looking at the task of story understanding. This technique is extended with a reasoning engine that borrows techniques from dynamic ontology refinement to discover the semantic similarity of stories and to merge them together.

Natural language processing with Python

This book offers a highly accessible introduction to Natural Language Processing, the field that underpins a variety of language technologies, ranging from predictive text and email filtering to automatic summarization and translation. With Natural Language Processing with Python, you'll learn how to write Python programs to work with large collections of unstructured text. You'll access richly-annotated datasets using a comprehensive range of linguistic data structures.

Shape conditions and phonological context

Prosodic constituency in HPSG

DIPPER: Description and formalisation of an information-state update dialogue system architecture

The DIPPER architecture is a collection of software agents for prototyping spoken dialogue systems. Implemented on top of the Open Agent Architecture (OAA), it comprises agents for speech input and output, dialogue management, and further supporting agents. We define a formal syntax and semantics for the DIPPER information state update language. The language is independent of particular programming languages, and incorporates procedural attachments for access to external resources using OAA.

Enriching HPSG phonology

Research within the framework of constraint-based grammar formalisms such as Head-driven Phrase Structure Grammar (HPSG) has focussed on syntax and semantics, largely to the exclusion of phonology. In return, current developments in phonology have generally ignored the technical and linguistic innovations offered by constraint-based grammar formalisms.

Version 1.3, March 18, 2003

The vision of Grid technology has motivated a huge effort in developing a Grid architecture which will support large-scale sharing of data and computing resources. Most attention so far has focused either on building infrastructure or on applications in disciplines such as physics, biology, medicine and astronomy. Over the last five years or so, modern Natural Language Processing has evolved into a large-scale endeavour.

Description of restricted natural language

To verify hardware designs by model checking, circuit specifications are commonly expressed in the temporal logic CTL. Automatic conversion of English to CTL requires the definition of an appropriately restricted subset of English. We show how the limited semantic expressibility of CTL can be exploited to derive a hierarchy of subsets. Our strategy potentially avoids difficulties with approaches that take existing computational semantic analyses of English as their starting point.

A constraint-based approach to English prosodic constituents

The paper develops a constraint-based theory of prosodic phrasing and prominence, based on an HPSG framework, with an implementation in ALE. Prominence and juncture are represented by n-ary branching metrical trees. The general aim is to define prosodic structures recursively, in parallel with the definition of syntactic structures. We address a number of prima facie problems arising from the discrepancy between syntactic and prosodic structure.

Why should a computer be anything like a human being?

Could interaction design learn from, or benefit from, looking at human interaction? Ewan Klein believes research into the fine grain of human-human interaction could offer valuable insights, and in this article he explains why. He also raises questions about trust and accountability in dealing with 'invisible' artefacts.

An extensible toolkit for computational semantics

In this paper we focus on the software for computational semantics provided by the Python-based Natural Language Toolkit (NLTK). The semantics modules in NLTK are inspired in large part by the approach developed in Blackburn and Bos (2005) (henceforth referred to as B&B). Since Blackburn and Bos have also provided a software suite to accompany their excellent textbook, one might ask what the justification is for the NLTK offering, which is similarly slanted towards teaching computational semantics.

Recognising Textual Entailment Focusing on Non-Entailing Text and Hypothesis

This paper describes a predominantly shallow approach to the RTE-4 Challenge. We focus our attention on the non-entailing Text and Hypothesis pairs in the dataset. The system uses a Maximum Entropy framework to classify each Text and Hypothesis pair as either yes or no, using a range of different feature sets based on an analysis of the existing non-entailing pairs in the RTE training data.

Experiments with data-intensive NLP on a computational grid

Large databases of annotated text and speech are widely used for developing and testing language technologies. However, the sizes of these corpora and their associated language models are outpacing the growth in processing power and network bandwidth available to most researchers.

Multidisciplinary instruction with the natural language toolkit

The Natural Language Toolkit (NLTK) is widely used for teaching natural language processing to students majoring in linguistics or computer science. This paper describes the design of NLTK, and reports on how it has been used effectively in classes that involve different mixes of linguistics and computer science students. We focus on three key issues: getting started with a course, delivering interactive demonstrations in the classroom, and organizing assignments and projects.
