Ondřej Dušek | Charles University, Prague

Papers by Ondřej Dušek

Zum Vergleich der tschechischen und deutschen Valenzwörterbücher

Verbal Valency Frame Detection and Selection in Czech and English

Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation, 2014

Alex: Bootstrapping a Spoken Dialogue System for a New Domain by Real Users

Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2014

Towards a Truly Statistical Natural Language Generator for Spoken Dialogues

Natural Language Generation

MTMonkey

Vystadial 2013–English data

Vystadial 2013–Czech data

Vystadial 2013–scripts

Robust Multilingual Statistical Morphological Generation Models

Czech-English Parallel Corpus 1.0 (CzEng 1.0)

Alex: A Statistical Dialogue Systems Framework

Lecture Notes in Computer Science, 2014

A Factored Discriminative Spoken Language Understanding for Spoken Dialogue Systems

Lecture Notes in Computer Science, 2014

MTMonkey: A Scalable Infrastructure for a Machine Translation Web Service

The Prague Bulletin of Mathematical Linguistics, 2013

DEPFIX: a system for automatic correction of Czech MT outputs

We present an improved version of DEPFIX (Mareček et al., 2011), a system for automatic rule-based post-processing of English-to-Czech MT outputs designed to increase their fluency. We enhanced the rule set used by the original DEPFIX system and measured the performance of the individual rules. We also modified the dependency parser of McDonald et al. (2005) in two ways to adjust it for the parsing of MT outputs. We show that our system is able to improve the output quality of state-of-the-art MT systems.

The Joy of Parallelism with CzEng 1.0

mt-archive.info, 2012

CzEng 1.0 is an updated release of our Czech-English parallel corpus, freely available for non-commercial research or educational purposes. In this release, we approximately doubled the corpus size, reaching 15 million sentence pairs (about 200 million tokens per language). More importantly, we carefully filtered the data to reduce the number of non-matching sentence pairs. CzEng 1.0 is automatically aligned at the level of sentences as well as words. We provide not only the plain text representation, but also automatic morphological tags, surface syntactic as well as deep syntactic dependency parse trees, and automatic co-reference links in both English and Czech. This paper describes key properties of the released resource, including the distribution of text domains, the corpus data formats, and a toolkit to handle the provided rich annotation. We also summarize the procedure of the rich annotation (incl. co-reference resolution) and of the automatic filtering. Finally, we provide some suggestions on exploiting such an automatically annotated sentence-parallel corpus.

Formemes in English-Czech Deep Syntactic MT

acl.eldoc.ub.rug.nl, 2012

One of the most notable recent improvements of the TectoMT English-to-Czech translation is a systematic and theoretically supported revision of formemes, the annotation of morpho-syntactic features of content words in deep dependency syntactic structures based on the Prague tectogrammatics theory. Our modifications aim at reducing data sparsity, increasing consistency across languages, and widening the usage area of this markup. Formemes can be used not only in MT, but also in various other NLP tasks.

Using Parallel Features in Parsing of Machine-Translated Sentences for Correction of Grammatical Errors

SSST-6, 2012

In this paper, we present two dependency parser training methods appropriate for parsing outputs of statistical machine translation (SMT), which pose problems for standard parsers due to their frequent ungrammaticality. We adapt the MST parser by exploiting additional features from the source language, and by introducing artificial grammatical errors in the parser training data, so that the training sentences resemble SMT output. We evaluate the modified parser on DEPFIX, a system that improves English-Czech SMT outputs using automatic rule-based corrections of grammatical mistakes and requires parsed SMT output sentences as its input. Both parser modifications led to improvements in BLEU score; their combination was evaluated manually, showing a statistically significant improvement of the translation quality.

Additional German-Czech reference translations of the WMT'11 test set

Semi-automatic Detection of Multiword Expressions in the Slovak Dependency Treebank
