Ekaterina Lapshinova-Koltunski | Saarland University

Papers by Ekaterina Lapshinova-Koltunski

Cohesion and coherence in multilingual contexts

Lexical cohesion

Routledge eBooks, May 27, 2022

Cognitive Aspects of Compound Translation: Insights into the Relation between Implicitation and Semantic Complexity from a Translation Process Perspective

Social Science Research Network, 2023

Analysing Coreference in Transformer Outputs

arXiv (Cornell University), Nov 4, 2019

We analyse coreference phenomena in three neural machine translation systems trained with different data settings with or without access to explicit intra- and cross-sentential anaphoric information. We compare system performance on two different genres: news and TED talks. To do this, we manually annotate the (possibly incorrect) coreference chains in the MT outputs and evaluate the coreference chain translations. We define an error typology that aims to go further than pronoun translation adequacy and includes types such as incorrect word selection or missing words. The features of coreference chains in automatic translations are also compared to those of the source texts and human translations. The analysis shows stronger potential translationese effects in machine-translated outputs than in human translations.

Discovery of Discourse-Related Language Contrasts through Alignment Discrepancies in English-German Translation

In this paper, we analyse alignment discrepancies for discourse structures in English-German parallel data: sentence pairs in which discourse structures in target or source texts have no alignment in the corresponding parallel sentences. The discourse-related structures are defined in the form of linguistic patterns based on the information delivered by automatic part-of-speech and dependency annotation. In addition to alignment errors (existing structures left unaligned), these alignment discrepancies can be caused by language contrasts or by the phenomena of explicitation and implicitation in the translation process. We propose a new approach, including a new type of resource for corpus-based language contrast analysis, and apply it to study and classify the contrasts found in our English-German parallel corpus. As unaligned discourse structures may also result in the loss of discourse information in the MT training data, we hope to deliver information in support of discourse-aware machine translation (MT).

9 English-German contrasts in cohesion and implications for translation

Empirical Translation Studies, 2017

Exploring Linguistic Differences Between Novice and Professional Translators With Text Classification Methods

Corpus based linguistics and translatology Corpus based translation studies

7 Exploratory analysis of dimensions influencing variation in translation. The case of text register and translation method

Empirical Translation Studies, 2017

Across Languages and Genres: Creating a Universal Annotation Scheme for Textual Relations

Proceedings of The 9th Linguistic Annotation Workshop, 2015

The present paper describes an attempt to create an interoperable scheme using existing annotations of textual phenomena across languages and genres, including non-canonical ones. This kind of analysis requires annotated multilingual resources, which are costly. Therefore, we make use of annotations already available in the resources for English, German and Czech. As the annotations in these corpora are based on different conceptual and methodological backgrounds, we need an interoperable scheme that covers existing categories and at the same time allows a comparison of the resources. In this paper, we describe how this interoperable scheme was created and which problematic cases we had to consider. The resulting scheme is intended to be applied in the future to explore contrasts between the three languages under analysis, for which we expect the greatest differences in the degree of variation between non-canonical and canonical language.

Proceedings of The Second Workshop on Annotation and Exploitation of Parallel Corpora

@Book{AEPC:2011, editor = {Kiril Simov and Petya Osenova and Jörg Tiedemann and Radovan Garabik}, title = {Proceedings of The Second Workshop on Annotation and Exploitation of Parallel Corpora}, month = {September}, year = {2011}, address = {Hissar, Bulgaria}, url ...

German clause-embedding predicates: an extraction and classification approach

This thesis describes a semi-automatic approach to the analysis of subcategorisation properties of verbal, nominal and multiword predicates in German. We semi-automatically classify predicates according to their subcategorisation properties by extracting them from German corpora along with their complements. In this work, we concentrate exclusively on sentential complements, such as dass, ob and w-clauses, although our methods can also be applied to other complement types. Our aim is not only to extract and classify predicates but also to compare subcategorisation properties of morphologically related predicates, such as verbs and their nominalisations. It is usually assumed that subcategorisation properties of nominalisations are taken over from their underlying verbs. However, our tests show that there exist different types of relations between them. Thus, we review subcategorisation properties of morphologically related words and analyse their correspondences and differ...

Exploring Explicitation and Implicitation in Parallel Interpreting and Translation Corpora

Prague Bulletin of Mathematical Linguistics

The Routledge Handbook of Translation and Pragmatics, edited by Rebecca Tipton and Louisa Desilla

Contrastive Pragmatics, 2021

Detecting normalization and shining-through in novice and professional translations

Extending the Scope of Corpus-Based Translation Studies, 2022

Tracing variation in discourse connectives in translation and interpreting through neural semantic spaces

Proceedings of the 2nd Workshop on Computational Approaches to Discourse, 2021

In the present paper, we explore lexical contexts of discourse markers in translation and interpreting on the basis of word embeddings. Our special interest is in contextual variation of the same discourse markers in (written) translation vs. (simultaneous) interpreting. To explore this variation at the lexical level, we use a data-driven approach: we compare bilingual neural word embeddings trained on source-to-translation and source-to-interpreting aligned corpora. Our results show more variation of semantically related items in translation spaces vs. interpreting ones and a more consistent use of fewer connectives in interpreting. We also observe different trends with regard to the discourse relation types.

Kontrastive Analyse deutscher und englischer Kohäsionsmittel in verschiedenen Diskurstypen [Contrastive analysis of German and English cohesive devices in different discourse types]

In this article, we give an overview of our research on discourse relations in cross-language and cross-register comparison and on the corpus-linguistic research project "German-English contrasts in cohesion – Towards an empirically-based comparison (GECCo)". The work is intended to contribute to the analysis and explanation of textual cohesion in English and German. Cross-linguistic, corpus-based approaches to questions of English and German textual cohesion have so far been rare and deal mainly with individual cohesive devices rather than with textuality and discourse organisation in a broader context. This article focuses on the corpus-based investigation of endophoric ellipsis as an exemplary case study within the GECCo project, while the project itself covers the full range of cohesive devices. We show how the subtypes of ellipsis in English and German...

A Coreference-Annotated Corpus for Machine Translation Research

The annotator selection process also took place during the first months. We were able to recruit an experienced annotator for our project. This annotator had been involved in the annotation of coreference and other discourse-related structures in the project GECCo, which was advantageous for us. At the same time, since this person already had a Master's degree in Translation and Interpreting and was therefore more expensive as an annotator, we reduced her working hours. However, since we saved the time that we would have needed to train a less experienced annotator, and since this annotator works very fast, we achieved the planned aims at lower than budgeted cost despite the higher remuneration.

Polarity in Translation: Differences between Novice and Experts across Registers

Translation can obscure the subjectivity of the sources and flatten positive and negative aspects. Thus, we perform an explorative analysis of translation in terms of sentiment properties, focusing on the differences between student and professional translations of various registers. However, we do not compare translations with their sources, but analyse polarity items in two translation variants from the same text sources. We propose a multi-step analysis to investigate the distribution of polarity items and report on small experiments on a corpus of English to German translations to identify the lack of experience in translation by students. Our results show that pragmatic differences expressed in the usage of polarity words are highly dependent on the register a text belongs to. Following this, we identify registers, such as popular-scientific articles, where students translate sentiment using more and heavier polarity words.

Measuring Translationese across Levels of Expertise: Are Professionals more Surprising than Students?

The present paper deals with a computational analysis of translationese in professional and student English-to-German translations belonging to different registers. Building upon an information-theoretical approach, we test translation conformity to source and target language in terms of a neural language model’s perplexity over Part of Speech (PoS) sequences. Our primary focus is on register diversification vs. convergence, reflected in the use of constructions eliciting a higher vs. lower perplexity score. Our results show that, against our expectations, professional translations elicit higher perplexity scores from a target language model than students’ translations. An analysis of the distribution of PoS patterns across registers shows that this apparent paradox is the effect of higher stylistic diversification and register sensitivity in professional translations. Our results contribute to the understanding of human translationese and shed light on the variation in texts genera...
