Optimising the Europarl corpus for translation studies with the EuroparlExtract toolkit

Customization of the Europarl Corpus for Translation Studies

2012

Currently, the area of translation studies lacks corpora by which translation scholars can validate their theoretical claims, for example, regarding the scope of the characteristics of the translation relation. In this paper, we describe a customized resource in the area of translation studies that mainly addresses research on the properties of the translation relation. Our experimental results show that the Type-Token-Ratio (TTR) is not a universally valid indicator of the simplification of translation.
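For readers unfamiliar with the measure, the type-token ratio is the number of distinct word forms (types) divided by the number of running words (tokens). The following is a minimal sketch, assuming simple regex tokenization and lowercasing; the function name `type_token_ratio` is hypothetical, and the abstract does not say how the authors tokenized or normalized their texts.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct word forms divided by total word tokens.

    Assumption: tokens are runs of word characters, compared case-insensitively.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        # Avoid division by zero on empty input.
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = "the cat saw the dog"
    # 4 distinct types over 5 tokens
    print(type_token_ratio(sample))
```

Raw TTR falls as texts get longer, so comparisons are usually made on samples of equal size.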

Using the Europarl corpus for cross-linguistic research

Belgian Journal of Linguistics

Europarl is a large multilingual corpus containing the minutes of the debates at the European Parliament. This article presents a method to extract different corpora from Europarl: monolingual and multilingual comparable corpora, as well as parallel corpora. Using state-of-the-art measures of homogeneity, we show that these corpora are very similar. In addition, we argue that they present many advantages for research in various fields of linguistics and translation studies, and we also discuss some of their limitations. We conclude by reviewing a number of previous studies that made use of these corpora, emphasizing in each case the possibilities offered by Europarl.

Review of Contribution of translation to the multilingual society in the EU

Target-international Journal of Translation Studies, 2014

The European Commission’s Directorate-General for Translation (DGT) wants to do Translation Studies, and we must welcome the intention. Part of its plan involves the production of a series of reports on key issues in the field. This makes sense because Europe massively depends on translation for its governance, and thus invests significant administrative resources in translation services. If translation is important anywhere, it should be in Europe. If it should be studied anywhere, it is in Europe.

Compiling and using a parallel corpus for research in translation

2009

There are so many variables underlying translation that examining anything longer than a few paragraphs of translated text at a time can become quite a daunting task. The advent of corpus linguistics, however, has made it possible to analyse enormous quantities of translated text in unprecedented ways. In line with these advances, parallel corpora can provide access to many aspects of translation that had previously not been possible to study in a systematic way. The first part of this paper discusses different types of decisions that have to be made when building a parallel corpus, with particular emphasis on compilation questions that are unique to parallel corpora as opposed to corpora in general. This is followed by an account of the choices made when creating COMPARA, a post-edited, bi-directional parallel corpus of English and Portuguese literary texts with 3 million words, freely available for research and education at http://www.linguateca.pt/COMPARA/. Finally, examples of how this parallel corpus can be (and has been) used in translation research are presented.

A Hybrid Translation Theory for EU Texts

Vertimo Studijos 5, 2013

EU texts are produced by way of multilingual negotiation in a supranational multicultural discourse community, where there is no linguistically neutral ground and where internationalisation of concepts and ideas is a sine qua non. As a result, they are idiosyncratic texts, reflecting specific textual features. Their translation into the current 23 official EU languages is equally idiosyncratic and challenging, to say the least, especially since it is shaped under the EU’s overwhelming cultural and linguistic diversity, the constraints of its policy of multilingualism, and the subsequent policy of linguistic equality which states that all languages are equal, or “equally authentic” (Wagner, Bech and Martinez 2002, 7), and that translations are not really translations but language versions. In other words, in the framework of EU translation, the terms source text (ST) and target text (TT) cease to exist, while the prima facie illusory notion of “equivalence” seems to resurface, though altered in nature, and dominate the translation practice. It thus goes without saying that in the case of EU texts and their translation a tailor-made theoretical framework is required where many classic concepts of Translation Studies (TS), such as ST, TT and equivalence, need to be re-evaluated and redefined, and at the same time functionalist approaches and the postmodernist concepts of intertextuality, hybridity and in-betweenness need to come to the fore. The proposed translation theory for EU texts flaunts the feature inherent in their production: it is, just like them, hybrid.

Using a Parallel Corpus in Translation Practice and Research

2006

There are so many variables underlying translation that examining anything longer than a few paragraphs of translated text at a time can become quite a daunting task. Using the technology of corpus linguistics, however, it is possible to analyse enormous quantities of translated text in unprecedented ways. A parallel language corpus, i.e., a computerized collection of texts in one language aligned with their translations into another language, can provide automatic access to countless features of translated texts that up to now have not been possible to study in a systematic way. COMPARA, a translation tool developed by Linguateca, is the largest public, edited online parallel corpus of English and Portuguese in the world. In its current version 7.04, it provides access to almost three million words of original and translated fiction published in Portuguese and English. The aim of this presentation is to offer a brief description of the corpus and to demonstrate how it can be used in translation practice and research.

Alignment-based profiling of Europarl data in an English-Swedish parallel corpus

This paper profiles the Europarl part of an English-Swedish parallel corpus and compares it with three other subcorpora of the same parallel corpus. We first describe our method for comparison, which is based on alignments at both the token level and the structural level. Although two of the other subcorpora contain fiction, it is found that the Europarl part is the one having the highest proportion of many types of restructurings, including additions, deletions and long-distance reorderings. We explain this by the fact that the majority of Europarl segments are parallel translations.

Zooming Into the Language of EU Documents

2023

Translation is a process that has undergone continuous evolution and paradigm change, from the Tower of Babel to Google Translate. It has advanced in step with the evolution of society and culture. Nowadays, European institutions produce many kinds of documents: legislation, political speeches, declarations, directives, administrative forms. This paper examines the progress of translation and its present state in European documents: how much more or less challenging it is than other types of translation, and the difficulties the translator encounters in translating from the source language into the target language.

Romanian Translational Corpora: Building Comparable Corpora for Translation Studies

Building comparable corpora for the investigation of translational hypotheses is an important task within the translation studies domain. This paper describes the compilation of a translational comparable corpus for the Romanian language. The resource comprises translated and non-translated news articles and it is designed to be used in the investigation of translational language and translational hypotheses.

Experience from Translation of EU Documents

mt-archive.info

There are three main actors in the ideal translation workflow: the translator, the terminologist and the reviser. We describe all three roles in terms of their input and output information in the translation workflow and have developed a language technology toolkit to help them to ...