Oren Glickman | Carnegie Mellon University

Papers by Oren Glickman

Using context in machine translation of spoken language

… Issues in Machine …, Jan 1, 1995

We report on techniques for using discourse context to reduce ambiguity and improve translation accuracy in a multi-lingual (Spanish, German, and English) spoken language translation system. The techniques involve statistical models as well as knowledge-based models including discourse plan inference. This work is carried out in the context of the Janus project at Carnegie Mellon University and the University of Karlsruhe.

Translation of conversational speech with JANUS-II

Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996

In this paper we investigate the possibility of translating continuous spoken conversations in a cross-talk environment. This is a task known to be difficult for human translators due to several factors. It is characterized by rapid and even overlapping turn-taking, a high degree of co-articulation, and fragmentary language. We describe experiments using both push-to-talk and cross-talk recording conditions. Our results indicate that conversational speech recognition and translation are possible, even in a free cross-talk environment. To date, our system has achieved performances of over 80% acceptable translations on transcribed input, and over 70% acceptable translations on speech input recognized with a 70-80% word accuracy. The system's performance on spontaneous conversations recorded in a cross-talk environment is shown to be as good as, and even slightly superior to, its performance in the simpler and easier push-to-talk scenario.

A Probabilistic Lexical Approach to Textual Entailment

International Joint Conference on Artificial Intelligence, 2005

The textual entailment problem is to determine if a given text entails a given hypothesis. This paper describes first a general generative probabilistic setting for textual entailment. We then focus on the sub-task of recognizing whether the lexical concepts present in the hypothesis are entailed from the text. This problem is recast as one of text categorization in which the

Web Based Textual Entailment

Examining Machine Learning for Adaptable End-to-End Information Extraction Systems

All components of a typical IE system have been the object of some machine learning research, motivated by the need to improve time taken to transfer to new domains. In this paper we survey such methods and assess to what extent they can help create a complete IE system that can be easily adapted to new domains. We also lay

Acquiring lexical paraphrases from a single corpus

Computing Research Repository, 2003

This paper studies the potential of identifying lexical paraphrases within a single corpus, focusing on the extraction of verb paraphrases. Most previous approaches detect individual paraphrase instances within a pair (or set) of comparable corpora, each of them containing roughly the same information, and rely on the substantial level of correspondence of such corpora. We present a novel method that

A Probabilistic Setting and Lexical Cooccurrence Model for Textual Entailment

This paper proposes a general probabilistic setting that formalizes a probabilistic notion of textual entailment. We further describe a particular preliminary model for lexical-level entailment, based on document cooccurrence probabilities, which follows the general setting. The model was evaluated on two application independent datasets, suggesting the relevance of such probabilistic approaches for entailment modeling.

IDENTIFYING LEXICAL PARAPHRASES FROM A SINGLE CORPUS: A CASE STUDY FOR VERBS

Recent Advances in Natural Language Processing - RANLP, 2003

This paper studies the potential of identifying lexical paraphrases within a single corpus, focusing on the extraction of verb paraphrases. Most previous approaches detect individual paraphrase instances within a pair (or set) of "comparable" corpora, each of them containing roughly the same information, and rely on the substantial level of correspondence of such corpora. We present a novel

Web Based Probabilistic Textual Entailment

This paper proposes a general probabilistic setting that formalizes the notion of textual entailment. In addition we describe a concrete model for lexical entailment based on web co-occurrence statistics in a bag of words representation.

Investigating Lexical Substitution Scoring for Subtitle Generation

This paper investigates an isolated setting of the lexical substitution task of replacing words with their synonyms. In particular, we examine this problem in the setting of subtitle generation and evaluate state of the art scoring methods that predict the validity of a given substitution. The paper evaluates two context independent models and two contextual models.

A Lexical Alignment Model for Probabilistic Textual Entailment

Lecture Notes in Computer Science, 2006

This paper describes the Bar-Ilan system participating in the Recognising Textual Entailment Challenge. The paper proposes first a general probabilistic setting that formalizes the notion of textual entailment. We then describe a concrete alignment-based model for lexical entailment, which utilizes web co-occurrence statistics in a bag of words representation. Finally, we report the results of the model on the Recognising Textual Entailment challenge dataset along with some analysis.

PROBABILISTIC TEXTUAL ENTAILMENT: GENERIC APPLIED MODELING OF LANGUAGE VARIABILITY

A most prominent phenomenon of natural languages is variability: stating the same meaning in various ways. Robust language processing applications, like Information Retrieval (IR), Question Answering (QA), Information Extraction (IE), text summarization and machine translation, must recognize the different forms in which their inputs and requested outputs might be expressed. Today, inferences about language variability are often performed by practical systems at a

The PASCAL Recognising Textual Entailment Challenge

Lecture Notes in Computer Science, 2006

This paper describes the Second PASCAL Recognising Textual Entailment Challenge (RTE-2). We describe the RTE-2 dataset and overview the submissions for the challenge. One of the main goals for this year's dataset was to provide more "realistic" text-hypothesis examples, based mostly on outputs of actual systems. The 23 submissions for the challenge present diverse approaches and research directions, and the best results achieved this year are considerably higher than last year's state of the art.

Investigating lexical substitution scoring for subtitle generation

Proceedings of the Tenth Conference on Computational Natural Language Learning - CoNLL-X '06, 2006

CoNLL has turned ten! With a mix of pride and amazement over how time flies, we now celebrate the tenth time that ACL's special interest group on natural language learning, SIGNLL, holds its yearly conference.

Definition and analysis of intermediate entailment levels

Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment - EMSEE '05, 2005

Lexical reference: a semantic matching subtask

… of the 2006 Conference on Empirical …, 2006

Semantic lexical matching is a prominent subtask within text understanding applications. Yet, it is rarely evaluated in a direct manner. This paper proposes a definition for lexical reference which captures the common goals of lexical matching. Based on this ...

A recursive algorithm for the forced alignment of very long audio segments

… Language Processing, 1998

In this paper we address the problem of aligning very long (often more than one hour) audio files to their corresponding textual transcripts in an effective manner. We present an efficient recursive technique to solve this problem that works well even on noisy speech signals. The key idea of this algorithm is to turn the forced alignment problem

Driving synthetic mouth gestures: Phonetic recognition for faceme!

The goal of this work is to use phonetic recognition to drive a synthetic image with speech. Phonetic units are identified by the phonetic recognition engine and mapped to mouth gestures, known as visemes, the visual counterpart of phonemes. The acoustic waveform and visemes are then sent to a synthetic image player, called FaceMe!, where they are rendered synchronously. This paper provides background for the core technologies

A probabilistic classification approach for lexical textual entailment

PROCEEDINGS OF THE NATIONAL …, 2005

The textual entailment task, determining if a given text entails a given hypothesis, provides an abstraction of applied semantic inference. This paper describes first a general generative probabilistic setting for textual entailment. We then focus on the sub-task of recognizing whether the lexical concepts present in the hypothesis are entailed from the text. This problem is recast as one of text categorization in which the classes are the vocabulary words. We make novel use of Naïve Bayes to model the problem in an entirely unsupervised fashion. Empirical tests suggest that the method is effective and compares favorably with state-of-the-art heuristic scoring approaches.

Using context in machine translation of spoken language

Theoretical and Methodological Issues in Machine Translation, Jul 5, 1995

Abstract: We report on techniques for using discourse context to reduce ambiguity and improve translation accuracy in a multi-lingual (Spanish, German, and English) spoken language translation system. The techniques involve statistical models as well as knowledge-based models including discourse plan inference. This work is carried out in the context of the Janus project at Carnegie Mellon University and the University of Karlsruhe.
