Oren Glickman | Carnegie Mellon University
Papers by Oren Glickman
Theoretical and Methodological Issues in Machine Translation, Jul 5, 1995
We report on techniques for using discourse context to reduce ambiguity and improve translation accuracy in a multi-lingual (Spanish, German, and English) spoken language translation system. The techniques involve statistical models as well as knowledge-based models, including discourse plan inference. This work is carried out in the context of the Janus project at Carnegie Mellon University and the University of Karlsruhe.
Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96), 1996
In this paper we investigate the possibility of translating continuous spoken conversations in a cross-talk environment. This is a task known to be difficult for human translators due to several factors: it is characterized by rapid and even overlapping turn-taking, a high degree of co-articulation, and fragmentary language. We describe experiments using both push-to-talk and cross-talk recording conditions. Our results indicate that conversational speech recognition and translation is possible, even in a free cross-talk environment. To date, our system has achieved over 80% acceptable translations on transcribed input, and over 70% acceptable translations on speech input recognized with 70-80% word accuracy. The system's performance on spontaneous conversations recorded in a cross-talk environment is shown to be as good as, and even slightly superior to, its performance in the simpler and easier push-to-talk scenario.
International Joint Conference on Artificial Intelligence, 2005
The textual entailment problem is to determine if a given text entails a given hypothesis. This paper first describes a general generative probabilistic setting for textual entailment. We then focus on the sub-task of recognizing whether the lexical concepts present in the hypothesis are entailed from the text. This problem is recast as one of text categorization in which the classes are the vocabulary words.
All components of a typical IE system have been the object of some machine learning research, motivated by the need to reduce the time taken to transfer to new domains. In this paper we survey such methods and assess to what extent they can help create a complete IE system that can be easily adapted to new domains. We also lay ...
Computing Research Repository, 2003
This paper studies the potential of identifying lexical paraphrases within a single corpus, focusing on the extraction of verb paraphrases. Most previous approaches detect individual paraphrase instances within a pair (or set) of comparable corpora, each of them containing roughly the same information, and rely on the substantial level of correspondence of such corpora. We present a novel method that ...
This paper proposes a general probabilistic setting that formalizes a probabilistic notion of textual entailment. We further describe a particular preliminary model for lexical-level entailment, based on document co-occurrence probabilities, which follows the general setting. The model was evaluated on two application-independent datasets, suggesting the relevance of such probabilistic approaches for entailment modeling.
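The kind of model named in this abstract can be illustrated with a small, hypothetical simplification (not the paper's actual model): estimate P(v | u) for a hypothesis word v and text word u from document co-occurrence counts, and score the hypothesis by the product, over its words, of the best such probability. All names below (`cooccurrence_stats`, `entailment_score`) are illustrative.

```python
# Hypothetical sketch of co-occurrence-based lexical entailment scoring.
# This is a simplified illustration, not the model from the paper.
from collections import defaultdict

def cooccurrence_stats(documents):
    """Document frequency of each word, and document-level co-frequency
    of word pairs, from a list of plain-text documents."""
    df = defaultdict(int)   # df[u]      = number of docs containing u
    co = defaultdict(int)   # co[(u, v)] = number of docs containing both
    for doc in documents:
        words = set(doc.lower().split())
        for u in words:
            df[u] += 1
            for v in words:
                if u != v:
                    co[(u, v)] += 1
    return df, co

def entailment_score(text, hypothesis, df, co):
    """Product over hypothesis words v of the best P(v | u) over text words u."""
    t_words = set(text.lower().split())
    score = 1.0
    for v in hypothesis.lower().split():
        if v in t_words:
            p = 1.0  # the word itself appears in the text
        else:
            p = max((co[(u, v)] / df[u] for u in t_words if df[u] > 0),
                    default=0.0)
        score *= p
    return score
```

On a toy corpus, a hypothesis word that appears in the text scores 1.0, and otherwise is scored by its strongest co-occurrence with any text word.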
Recent Advances in Natural Language Processing - RANLP, 2003
This paper studies the potential of identifying lexical paraphrases within a single corpus, focusing on the extraction of verb paraphrases. Most previous approaches detect individual paraphrase instances within a pair (or set) of "comparable" corpora, each of them containing roughly the same information, and rely on the substantial level of correspondence of such corpora. We present a novel ...
This paper proposes a general probabilistic setting that formalizes the notion of textual entailment. In addition we describe a concrete model for lexical entailment based on web co-occurrence statistics in a bag-of-words representation.
This paper investigates an isolated setting of the lexical substitution task of replacing words with their synonyms. In particular, we examine this problem in the setting of subtitle generation and evaluate state-of-the-art scoring methods that predict the validity of a given substitution. The paper evaluates two context-independent models and two contextual models.
Lecture Notes in Computer Science, 2006
This paper describes the Bar-Ilan system participating in the Recognising Textual Entailment Challenge. The paper first proposes a general probabilistic setting that formalizes the notion of textual entailment. We then describe a concrete alignment-based model for lexical entailment, which utilizes web co-occurrence statistics in a bag-of-words representation. Finally, we report the results of the model on the Recognising Textual Entailment challenge dataset along with some analysis.
A most prominent phenomenon of natural languages is variability: stating the same meaning in various ways. Robust language processing applications, such as Information Retrieval (IR), Question Answering (QA), Information Extraction (IE), text summarization, and machine translation, must recognize the different forms in which their inputs and requested outputs might be expressed. Today, inferences about language variability are often performed by practical systems at a ...
Lecture Notes in Computer Science, 2006
This paper describes the Second PASCAL Recognising Textual Entailment Challenge (RTE-2). We describe the RTE-2 dataset and overview the submissions for the challenge. One of the main goals for this year's dataset was to provide more "realistic" text-hypothesis examples, based mostly on outputs of actual systems. The 23 submissions for the challenge present diverse approaches and research directions, and the best results achieved this year are considerably higher than last year's state of the art.
Proceedings of the Tenth Conference on Computational Natural Language Learning - CoNLL-X '06, 2006
CoNLL has turned ten! With a mix of pride and amazement over how time flies, we now celebrate the tenth time that ACL's special interest group on natural language learning, SIGNLL, holds its yearly conference.
Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment - EMSEE '05, 2005
… of the 2006 Conference on Empirical …, 2006
Semantic lexical matching is a prominent subtask within text understanding applications. Yet, it is rarely evaluated in a direct manner. This paper proposes a definition for lexical reference which captures the common goals of lexical matching. Based on this ...
… Language Processing, 1998
In this paper we address the problem of aligning very long (often more than one hour) audio files to their corresponding textual transcripts in an effective manner. We present an efficient recursive technique to solve this problem that works well even on noisy speech signals. The key idea of this algorithm is to turn the forced alignment problem ...
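The recursive idea can be sketched in a simplified, hypothetical form. The paper aligns audio to text; as a stand-in, the sketch below aligns a noisy word sequence to a reference transcript: find a reliable anchor word that occurs exactly once in both sequences, fix it as an alignment point, and recurse on the two halves. The function name and anchor heuristic are illustrative, not the paper's algorithm.

```python
# Hypothetical skeleton of recursive anchor-based alignment between a
# reference word list and a noisy hypothesis word list (a text-to-text
# stand-in for the audio-to-text problem described above).

def recursive_align(ref, hyp, ref_off=0, hyp_off=0):
    """Return (ref_index, hyp_index) anchor pairs, in increasing order."""
    # Candidate anchors: words occurring exactly once in both subsequences.
    anchors = [w for w in set(ref) & set(hyp)
               if ref.count(w) == 1 and hyp.count(w) == 1]
    if not anchors:
        return []
    # Pick the anchor closest to the middle of the reference subsequence,
    # then recurse independently on the material before and after it.
    w = min(anchors, key=lambda a: abs(ref.index(a) - len(ref) // 2))
    i, j = ref.index(w), hyp.index(w)
    return (recursive_align(ref[:i], hyp[:j], ref_off, hyp_off)
            + [(ref_off + i, hyp_off + j)]
            + recursive_align(ref[i + 1:], hyp[j + 1:],
                              ref_off + i + 1, hyp_off + j + 1))
```

Each recursion level only searches within the span delimited by the anchors already fixed, which is what keeps the divide-and-conquer approach efficient on long inputs.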
The goal of this work is to use phonetic recognition to drive a synthetic image with speech. Phonetic units are identified by the phonetic recognition engine and mapped to mouth gestures, known as visemes, the visual counterpart of phonemes. The acoustic waveform and visemes are then sent to a synthetic image player, called FaceMe!, where they are rendered synchronously. This paper provides background for the core technologies ...
PROCEEDINGS OF THE NATIONAL …, 2005
The textual entailment task, determining if a given text entails a given hypothesis, provides an abstraction of applied semantic inference. This paper first describes a general generative probabilistic setting for textual entailment. We then focus on the sub-task of recognizing whether the lexical concepts present in the hypothesis are entailed from the text. This problem is recast as one of text categorization in which the classes are the vocabulary words. We make novel use of Naïve Bayes to model the problem in an entirely unsupervised fashion. Empirical tests suggest that the method is effective and compares favorably with state-of-the-art heuristic scoring approaches.
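The recasting described here can be illustrated with a minimal, hypothetical sketch (not the paper's implementation): treat each vocabulary word v as a class, use unlabeled documents containing v as its pseudo-positive examples, and compute a smoothed Naïve Bayes posterior that v is entailed by the text's words. The function name and smoothing choice are assumptions made for the example.

```python
# Hypothetical sketch of unsupervised Naive Bayes lexical entailment:
# the "class" for word v is {v present / v absent}, and an unlabeled
# corpus supplies the pseudo-labeled training documents.
import math

def p_entailed(v, text_words, docs):
    """Naive Bayes posterior that word v is entailed by the text words.

    docs: list of sets of words (an unlabeled corpus). Add-one smoothing
    keeps every probability strictly positive."""
    pos = [d for d in docs if v in d]       # pseudo-positive examples
    neg = [d for d in docs if v not in d]   # pseudo-negative examples
    log_pos = math.log((len(pos) + 1) / (len(docs) + 2))  # smoothed prior
    log_neg = math.log((len(neg) + 1) / (len(docs) + 2))
    for w in text_words:                    # naive independence over words
        log_pos += math.log((sum(w in d for d in pos) + 1) / (len(pos) + 2))
        log_neg += math.log((sum(w in d for d in neg) + 1) / (len(neg) + 2))
    return 1.0 / (1.0 + math.exp(log_neg - log_pos))
```

On a toy corpus, a word whose documents share vocabulary with the text gets a higher posterior than an unrelated word, which is the behavior the unsupervised recasting relies on.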