Postech's System Description for Medical Text Translation Task

Adaptation of machine translation for multilingual information retrieval in the medical domain

Artificial Intelligence in Medicine, 2014

Objective. We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve effectiveness of cross-lingual IR. Methods and Data. Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system. The IR system is based on the BM25 retrieval model implemented in the Lucene search engine. The MT techniques employed in this work include in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, compound splitting, and exploiting synonyms as translation variants. The IR methods include morphological normalization and using multiple translation variants for query expansion. The experiments are performed and thoroughly evaluated on three language pairs: Czech-English, German-English, and French-English. MT quality is evaluated on data sets created within the Khresmoi project and IR effectiveness is tested on the CLEF eHealth 2013 data sets. Results. The search query translation results achieved in our experiments are outstanding: our systems outperform not only our strong baselines, but also Google Translate and Microsoft Bing Translator in a direct comparison carried out on all the language pairs. The baseline BLEU scores increased from 26.59 to 41.45 for Czech-English, from 23.03 to 40.82 for German-English, and from 32.67 to 40.82 for French-English. This is a 55% improvement on average. In terms of the IR performance on this particular test collection, a significant improvement over the baseline is achieved only for French-English. For Czech-English and German-English, the increased MT quality does not lead to better IR results. Conclusions. Most of the MT techniques employed in our experiments improve MT of medical search queries. Intelligent training data selection in particular proves to be very successful for domain adaptation of MT. Certain improvements are also obtained from German compound splitting on the source language side. Translation quality, however, does not appear to correlate with the IR performance: better translation does not necessarily yield better retrieval. We discuss in detail the contribution of the individual techniques and state-of-the-art features and provide future research directions.
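
The abstract above names two IR ingredients: BM25 ranking and query expansion with multiple translation variants. A minimal sketch of how they can fit together is given below. It is an illustration only, not the authors' Lucene setup: the function names `bm25_scores` and `expand_query`, the toy documents, and the parameter values k1 = 1.2 and b = 0.75 are all assumptions, and the idf formula is one common non-negative variant.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Non-negative idf variant (assumed here for illustration).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def expand_query(variants):
    """Merge several translation variants into one list of unique query terms."""
    terms = []
    for variant in variants:
        for term in variant.lower().split():
            if term not in terms:
                terms.append(term)
    return terms


# Toy collection (hypothetical data, not from the CLEF eHealth test set).
docs = [
    "heart attack symptoms and treatment".split(),
    "myocardial infarction is treated with aspirin".split(),
    "seasonal flu vaccine schedule".split(),
]

single = bm25_scores(expand_query(["heart attack"]), docs)
expanded = bm25_scores(expand_query(["heart attack", "myocardial infarction"]), docs)
```

With a single translation, the second document receives no score because it shares no term with the query; adding the second variant makes it retrievable while leaving the first document's score unchanged.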

Machine Translation of Medical Texts in the Khresmoi Project

Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014

This paper presents the participation of the Charles University team in the WMT 2014 Medical Translation Task. Our systems are developed within the Khresmoi project, a large integrated project aiming to deliver a multilingual multi-modal search and access system for biomedical information and documents. Because we are involved in the organization of the Medical Translation Task, our primary goal is to set up a baseline for both its subtasks (summary translation and query translation) and for all translation directions. Our systems are based on the phrase-based Moses system and standard methods for domain adaptation. The constrained and unconstrained systems differ in the training data only.

DCU Terminology Translation System for Medical Query Subtask at WMT14

Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014

This paper describes the Dublin City University terminology translation system used for our participation in the query translation subtask of the medical translation task at the Workshop on Statistical Machine Translation (WMT14). We deployed six different kinds of terminology extraction methods, and participated in three different tasks: the FR-EN and EN-FR query tasks, and the CLIR task. We obtained 36.2 BLEU points absolute for the FR-EN task and 28.8 BLEU points absolute for the EN-FR task, taking first place in both. We obtained 51.8 BLEU points absolute for the CLIR task.

Experiments in Medical Translation Shared Task at WMT 2014

Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014

This paper describes Dublin City University's (DCU) submission to the WMT 2014 Medical Summary task. We report our results on the test data set in the French-to-English translation direction. We also report statistics collected from the corpora used to train our translation system. We conducted our experiments on the Moses 1.0 phrase-based translation system framework. We performed a variety of experiments on translation models, reordering models, the operation sequence model, and the language model. We also experimented with data selection and with removing the length constraint for phrase-pair extraction.

Improving statistical machine translation in the medical domain using the Unified Medical Language System

… of the 20th international conference on …, 2004

Processing texts from the medical domain is an important task for natural language processing. This paper investigates the usefulness of a large medical database (the Unified Medical Language System) for the translation of dialogues between doctors and patients using a statistical machine translation system. We are able to show that the extraction of a large dictionary and the usage of semantic type information to generalize the training data significantly improve the translation performance.

IXA Biomedical Translation System at WMT16 Biomedical Translation Task

Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, 2016

In this paper we present the system developed at the IXA NLP Group of the University of the Basque Country for the Biomedical Translation Task in the First Conference on Machine Translation (WMT16). For the adaptation of a statistical machine translation system to the biomedical domain, we developed three approaches based on a baseline system for the English-Spanish and Spanish-English language pairs. The lack of terminology and the variation of the prominent sense of the words are the issues we have addressed with these approaches. The best of our systems reached the average of all the systems submitted in the challenge in most of the evaluation sets.

Polish-English Statistical Machine Translation of Medical Texts

Advances in Intelligent Systems and Computing, 2015

This new research explores the effects of various training methods on a Polish-to-English statistical machine translation system for medical texts. Various elements of the EMEA parallel text corpora from the OPUS project were used as the basis for training of phrase tables and language models and for development, tuning and testing of the translation system. The BLEU, NIST, METEOR, RIBES and TER metrics have been used to evaluate the effects of various system and data preparations on translation results. Our experiments included systems that used POS tagging, factored phrase models, hierarchical models, syntactic taggers, and many different alignment methods. We also conducted a deep analysis of Polish data as preparatory work for automatic data correction such as truecasing and punctuation normalization.

Looking for the best Evaluation Method for Interlingua-based Spoken Language Translation in the medical Domain

This paper focuses on the quality of rule-based machine translations collected using our open-source limited-domain medical spoken language translator (SLT) tested at the Dallas Children's hospital. Our aim is to find the best-suited metrics for our Interlingua rule-based machine translation (RBMT) system. We applied both human metrics and a set of well-known automatic metrics (BLEU, WER and TER) to a corpus of translations produced by our system during a controlled experiment. We also compared the scores obtained for both types of evaluation with those obtained on translations produced by the well-known statistical machine translation (SMT) system Google Translate in order to have a point of comparison. Our aim is to find the best-suited metric for our type of Interlingua RBMT SLT system.
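
Of the automatic metrics listed above, WER is the simplest to state: the word-level edit distance between the system output and a reference translation, divided by the reference length. The sketch below illustrates that standard definition under simple assumptions (whitespace tokenization, a single reference, unit cost for every edit); the function name `word_error_rate` and the example sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


exact = word_error_rate("where does it hurt", "where does it hurt")
partial = word_error_rate("where does it hurt", "where it hurts")
```

In the second call, one word is dropped and one is substituted against a four-word reference, so the rate is 2/4. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.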