Michael Ustaszewski | University of Innsbruck

Papers by Michael Ustaszewski

TransBank: Metadata as the Missing Link between NLP and Traditional Translation Studies

Proceedings of the First Workshop on Human-Informed Translation and Interpreting Technology (Hit-IT), 7 September 2017, Varna, Bulgaria

Despite the growing importance of data in translation, there is no data repository that meets the requirements of the translation industry and academia alike. Therefore, we plan to develop a freely available, multilingual and expandable bank of translations and their source texts aligned at the sentence level. Special emphasis will be placed on the labelling of metadata that precisely describe the relations between translated texts and their originals. This metadata-centric approach gives users the opportunity to compile and download custom corpora on demand. Such a general-purpose data repository may help to bridge the gap between translation theory and the language industry, including translation technology providers and NLP.

Optimising the Europarl corpus for translation studies with the EuroparlExtract toolkit

Perspectives - Studies in Translation Theory and Practice

The freely available European Parliament Proceedings Parallel Corpus, or Europarl, is one of the largest multilingual corpora available to date. Surprisingly, bibliometric analyses show that it has hardly been used in translation studies. Its low impact in the field may partly be attributed to the fact that the Europarl corpus is distributed in a format that largely disregards the needs of translation research. To make the wealth of linguistic data from Europarl readily available to the translation studies community, the toolkit 'EuroparlExtract' has been developed. With the toolkit, comparable and parallel corpora tailored to the requirements of translation research can be extracted from Europarl on demand. Both the toolkit and the extracted corpora are distributed under open licenses. Free availability is intended to avoid duplication of effort in corpus-based translation studies and to ensure the sustainability of data reuse. EuroparlExtract thus contributes to satisfying the growing demand for translation-oriented corpora.

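An der Schnittstelle von Translations- und Interkomprehensionsdidaktik: Ergebnisse einer Fallstudie zur slawischen Interkomprehension [At the interface of translation didactics and intercomprehension didactics: results of a case study on Slavic intercomprehension]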

Bausteine translatorischer Kompetenz oder Was macht Übersetzer und Dolmetscher zu Profis? [Building blocks of translation competence, or what makes translators and interpreters professionals?]

Deep-syntax TectoMT for English-Spanish MT

Deep-syntax approaches to machine translation have emerged as an alternative to phrase-based statistical systems, which seem to lack the capacity to address linguistic phenomena that are essential for translation. One such approach, TectoMT, is an open-source framework for transfer-based MT that works at the deep tectogrammatical level and combines linguistic knowledge with statistical techniques. This work describes the development of machine translation systems for English-Spanish in both directions, leveraging the modules of the English-Czech TectoMT system. We show that it is feasible to develop basic systems with relatively low effort in 9 months. Our evaluation shows that, although the systems cannot yet beat a phrase-based statistical system, the TectoMT architecture offers flexible customization options that considerably increase BLEU scores.

Towards a Methodology for Intercomprehension-Based Language Instruction in Translator Training

The ever-changing translation market requires translation degree programmes to prepare students to react flexibly to unpredictable changes in demand for language services. Maximum demand-oriented flexibility requires that translators be able to add new working languages in a timely manner as the need arises.
Against this background, this thesis proposes a methodology for language instruction in translator training based on intercomprehension, which refers to the innate ability to (partially) comprehend unfamiliar languages without having previously acquired them. Research into third or additional language acquisition has repeatedly shown that intercomprehension may serve as the point of departure for the time-saving acquisition of receptive skills across closely related languages. Applied to translator training, intercomprehension-based language instruction holds great potential for the design of modular elective language courses that enable would-be translators to efficiently exploit their existing linguistic knowledge and skills for the addition of passive working languages.
Centring on the assumption that translation competence involves a series of metalinguistic skills that are transferable to any language combination, the theoretical part of this thesis elaborates on the cognitive foundations of integrating intercomprehension-based language instruction with translator training and presents the methodology with reference to two courses of this type held within the MA programme in Translation Studies at the University of Innsbruck. In the empirical part, data from these two courses are provided in support of the feasibility of this didactic approach. Furthermore, an exploratory analysis of intercomprehension-based translation processes is presented.
