Vladimir Eidelman | Other - Academia.edu
Papers by Vladimir Eidelman
Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies Student Research Workshop - HLT '08, 2008
Many applications in NLP, such as question-answering and summarization, either require or would greatly benefit from the knowledge of when an event occurred. Creating an effective algorithm for identifying the activity time of an event in news is difficult in part because of the sparsity of explicit temporal expressions. This paper describes a domain-independent machine-learning based ...
... English News monolingual training data as well as the English side of the parallel corpus using the ... While BLEU is usually calculated at the corpus level, we need to approximate the metric at ... selection (M±C). On both sets, (M±C) performs better, but the results are comparable. ...
MonoTrans2 is a translation system that combines machine translation (MT) with human computation using two crowds of monolingual source (Haitian Creole) and target (English) speakers. We report on its use in the WMT 2011 Haitian Creole to English translation task, showing that MonoTrans2 translated 38% of the sentences well compared to Google Translate's 25%.
Fuzzy algebraic structures are a useful and flexible tool for modeling cognitive agents and their societies. In this paper we propose a fuzzy algebraic framework where the valuating sets are other than the unit interval (lattices, partially ordered sets or relational structures). This provides for a flexible organization of the information gathered by the agent (via interactions with the environment and/or other agents) and enables its selected use when different drives are active. Agents (Petitagé, ANNA, POPSICLE and Izbushka), which are instantiations of our model, are also given in order to illustrate the use of this framework, as well as its possible extensions.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014
Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies Demo Session - HLT '08, 2008
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers on - NAACL '09, 2009
In this paper, we describe and evaluate a bigram part-of-speech (POS) tagger that uses latent annotations and then investigate using additional genre-matched unlabeled data for self-training the tagger. The use of latent annotations substantially improves the performance of a baseline HMM bigram tagger, outperforming a trigram HMM tagger with sophisticated smoothing. The performance of the latent tagger is further enhanced by self-training with a large set of unlabeled data, even in situations where standard bigram or trigram taggers do not benefit from self-training when trained on greater amounts of labeled training data. Our best model obtains a state-of-the-art Chinese tagging accuracy of 94.78% when evaluated on a representative test set of the Penn Chinese Treebank 6.0.
Proceedings of the Fourth Workshop on Statistical Machine Translation - StatMT '09, 2009
This paper describes the system we developed to improve German-English translation of News text for the shared task of the Fifth Workshop on Statistical Machine Translation. Working within cdec, an open source modular framework for machine translation, we explore the benefits of several modifications to our hierarchical phrase-based model, including segmentation lattices, minimum Bayes Risk decoding, grammar extraction methods, and varying language models. Furthermore, we analyze decoder speed and memory performance across our set of models and show there is an important trade-off that needs to be made.
International Journal of Agent Technologies and Systems, 2000
Fuzzy algebraic structures are a useful and flexible tool for modeling cognitive agents and their societies. In this article we propose a fuzzy algebraic framework where the valuating sets are other than the unit interval (lattices, partially ordered sets or relational structures). This ...
This paper examines tagging models for spontaneous English speech transcripts. We analyze the performance of state-of-the-art tagging models, either generative or discriminative, left-to-right or bidirectional, with or without latent annotations, together with the use of ToBI break indexes and several methods for segmenting the speech transcripts (i.e., conversation side, speaker turn, or human-annotated sentence). Based on these studies, we observe that: (1) bidirectional models tend to achieve better accuracy levels than left-to-right models, (2) generative models seem to perform somewhat better than discriminative models on this task, and (3) prosody improves tagging performance of models on conversation sides, but has much less impact on smaller segments. We conclude that, although the use of break indexes can indeed significantly improve performance over baseline models without them on conversation sides, tagging accuracy improves more by using smaller segments, for which the impact of the break indexes is marginal.
This paper presents the system we developed for the 2011 WMT Haitian Creole-English SMS featured translation task. Applying standard statistical machine translation methods to noisy real-world SMS data in a low-density language setting such as Haitian Creole poses a ...
... on par with or better than GTM, and provides significant gains even in the NIST data setting, showing that ... We can also incorporate large quantities of additional data (whether parallel or not) in the source language to infer better ... Yik-Cheung Tam, Ian Lane, and Tanja Schultz. ...
old-site.clsp.jhu.edu
The last decade of research in Statistical Machine Translation (SMT) has seen rapid progress. Unfortunately, these successes have not been uniform; closely related language pairs can be translated with a high degree of precision, while for distant pairs the result is far from acceptable. Models which have been most successful for translating between structurally divergent language pairs have been based on synchronous grammars. A critical component of these translation models is their grammar which encodes translational ...
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
@Book{WMT:2010, editor = {Chris Callison-Burch and Philipp Koehn and Christof Monz and Kay Peterson and Omar Zaidan}, title = {Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR}, month = {July}, year = {2010}, address = {Uppsala, Sweden}, publisher = {Association for Computational Linguistics}, url = {http://www.aclweb.org/anthology/W10-17} } @InProceedings{gao-bach-vogel:2010:WMT, author = {Gao, Qin and Bach, Nguyen and Vogel, Stephan}, title = {A Semi-Supervised Word Alignment Algorithm ...
Proceedings of the Sixth Workshop on Statistical Machine Translation, Jul 30, 2011
MonoTrans2 is a translation system that combines machine translation (MT) with human computation using two crowds of monolingual source (Haitian Creole) and target (English) speakers. We report on its use in the WMT 2011 Haitian Creole to English translation task, showing that MonoTrans2 translated 38% of the sentences well compared to Google Translate's 25%.