Investigation of Transformer-based Latent Attention Models for Neural Machine Translation

What does Attention in Neural Machine Translation Pay Attention to?

2017

Attention in neural machine translation makes it possible to encode the relevant parts of the source sentence at each translation step. As a result, attention is also often regarded as an alignment model. However, no prior work specifically studies attention and analyses what attention models actually learn. Thus, the question remains how attention is similar to or different from traditional alignment. In this paper, we provide a detailed analysis of attention and compare it to traditional alignment. We answer the question of whether attention only models translational equivalence or whether it captures more information. We show that attention differs from alignment in some cases and captures useful information beyond alignments.
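As a rough illustration of the kind of comparison such an analysis involves, the sketch below converts soft attention weights into a hard alignment (the most-attended source position per target word) and checks it against a gold alignment. This is a generic illustration, not the analysis pipeline used in the paper; `attention_alignment_agreement` and the toy matrices are hypothetical.

```python
# A minimal sketch (not the paper's code): comparing soft attention weights
# against a gold word alignment by checking, for each target position, whether
# the most-attended source position appears in the reference alignment.
import numpy as np

def attention_alignment_agreement(attn, gold_links):
    """attn: (tgt_len, src_len) attention weights; gold_links: set of (tgt, src) pairs."""
    hits = 0
    for t in range(attn.shape[0]):
        best_src = int(np.argmax(attn[t]))      # hard alignment induced by attention
        hits += (t, best_src) in gold_links
    return hits / attn.shape[0]

# Toy example: 3 target words over 4 source words.
attn = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.1, 0.2, 0.6, 0.1],
                 [0.05, 0.05, 0.1, 0.8]])
gold = {(0, 0), (1, 2), (2, 3)}
print(attention_alignment_agreement(attn, gold))  # 1.0 for this toy case
```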

Parallel Attention Mechanisms in Neural Machine Translation

2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018

Recent papers in neural machine translation have proposed the strict use of attention mechanisms in place of previous standards such as recurrent and convolutional neural networks (RNNs and CNNs). We propose that by running the traditionally stacked encoding branches of attention-focused encoder-decoder architectures in parallel, even more sequential operations can be removed from the model, thereby decreasing training time. In particular, we modify the recently published attention-based Transformer architecture from Google by replacing sequential attention modules with parallel ones, reducing training time and substantially improving BLEU scores at the same time. Experiments on the English-to-German and English-to-French translation tasks show that our model establishes a new state of the art.
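The sketch below only illustrates the general contrast between stacking attention blocks sequentially and running independent branches in parallel before merging their outputs; it is not the paper's architecture, and `attention_block`, the averaging merge, and all shapes are illustrative assumptions.

```python
# Illustrative only: sequentially stacked attention blocks vs. independent
# parallel branches whose outputs are merged. Not the paper's exact model.
import numpy as np

def attention_block(x, w):
    # toy self-attention: bilinear scores, softmax, weighted sum
    scores = x @ w @ x.T                               # (seq, seq)
    probs = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return probs @ x

def stacked(x, weights):
    for w in weights:                                  # sequential: each block waits for the previous
        x = attention_block(x, w)
    return x

def parallel(x, weights):
    outs = [attention_block(x, w) for w in weights]    # branches are independent
    return np.mean(outs, axis=0)                       # merge, e.g. by averaging

x = np.random.randn(5, 8)                              # (seq_len, d_model)
weights = [np.random.randn(8, 8) * 0.1 for _ in range(3)]
print(stacked(x, weights).shape, parallel(x, weights).shape)
```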

Incorporating Structural Alignment Biases into an Attentional Neural Translation Model

Neural encoder-decoder models of machine translation have achieved impressive results, rivalling traditional translation models. However, their modelling formulation is overly simplistic and omits several key inductive biases built into traditional models. In this paper we extend the attentional neural translation model to include structural biases from word-based alignment models, including positional bias, Markov conditioning, fertility and agreement over translation directions. We show improvements over a baseline attentional model and a standard phrase-based model over several language pairs, evaluating on difficult languages in a low-resource setting.
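Of the biases listed, positional bias is the easiest to sketch: attention logits can be nudged toward the diagonal so that a target position prefers source positions at a proportional offset. The snippet below is a minimal illustration under that assumption, not the paper's parameterisation.

```python
# A sketch of one listed bias, positional bias: attention logits are penalised
# for deviating from the (scaled) diagonal. Illustrative parameterisation only.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def positional_bias_attention(scores, strength=1.0):
    """scores: (tgt_len, src_len) raw attention logits."""
    tgt_len, src_len = scores.shape
    j = np.arange(tgt_len)[:, None] / max(tgt_len - 1, 1)   # relative target position
    i = np.arange(src_len)[None, :] / max(src_len - 1, 1)   # relative source position
    bias = -strength * (i - j) ** 2                          # penalise off-diagonal attention
    return softmax(scores + bias)

scores = np.zeros((4, 6))                                    # uniform logits -> bias alone decides
print(np.round(positional_bias_attention(scores, strength=5.0), 2))
```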

Universal Vector Neural Machine Translation With Effective Attention

2020

Neural Machine Translation (NMT) leverages one or more trained neural networks to translate phrases. Sutskever et al. introduced a sequence-to-sequence encoder-decoder model that became the standard for NMT systems. Attention mechanisms were later introduced to address issues with the translation of long sentences and to improve overall accuracy. In this paper, we propose a single model for Neural Machine Translation based on encoder-decoder models. Most translation models are trained as one model for one translation direction. We introduce a neutral/universal model representation that can be used to predict more than one language depending on the source and a provided target. Secondly, we introduce an attention model by adding an overall learning vector to the multiplicative model. With these two changes, the novel universal model reduces the number of models needed for multi-language translation applications.
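As a sketch of the attention component, the snippet below implements standard multiplicative (bilinear) attention and shows one illustrative way an extra learned vector could enter the scoring function; the exact role of the paper's "overall learning vector" is not specified here, so the `u` term is an assumption.

```python
# Multiplicative (bilinear) attention with an optional extra learned vector u
# in the scoring function. The u term is one illustrative reading, not the
# authors' exact formulation.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multiplicative_attention(decoder_state, encoder_states, W, u=None):
    """decoder_state: (d,); encoder_states: (src_len, d); W: (d, d); u: optional (d,)."""
    scores = encoder_states @ (W @ decoder_state)      # standard bilinear scores
    if u is not None:
        scores = scores + encoder_states @ u           # extra learned vector term (assumed)
    weights = softmax(scores)
    return weights @ encoder_states, weights           # context vector, attention weights

d = 8
enc = np.random.randn(5, d)
dec = np.random.randn(d)
W = np.random.randn(d, d) * 0.1
u = np.random.randn(d) * 0.1
context, weights = multiplicative_attention(dec, enc, W, u)
print(context.shape, weights.round(2))
```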

Deep Neural Transformer Model for Mono and Multi Lingual Machine Translation

2021

In recent years, Transformers have emerged as the most relevant deep architecture, especially for machine translation. These models, which are based on attention mechanisms, have outperformed previous neural machine translation architectures in several tasks. This paper proposes a new architecture based on the Transformer model for monolingual and multilingual translation systems. The tests were carried out on the IWSLT 2015 and 2016 datasets. The Transformer's attention mechanism increases accuracy to more than 92%, a gain that we quantify as more than 4 BLEU points (a performance metric used in machine translation systems).

Learning When to Attend for Neural Machine Translation

ArXiv, 2017

In the past few years, attention mechanisms have become an indispensable component of end-to-end neural machine translation models. However, previous attention models always refer to some source words when predicting a target word, which contradicts the fact that some target words have no corresponding source words. Motivated by this observation, we propose a novel attention model that can determine when the decoder should attend to source words and when it should not. Experimental results on NIST Chinese-English translation tasks show that the new model achieves an improvement of 0.8 BLEU over a state-of-the-art baseline.
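A minimal sketch of the general idea, assuming a sigmoid gate computed from the decoder state and the attention context that scales (or effectively switches off) the source context at a given step; the paper's actual gating mechanism may differ.

```python
# A learned gate decides, per decoding step, how much the attention context
# should influence the prediction. Illustrative; details differ from the paper.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_context(decoder_state, context, w_gate):
    """w_gate: (2d,) learned parameters scoring whether to attend at this step."""
    gate = sigmoid(w_gate @ np.concatenate([decoder_state, context]))
    return gate * context          # gate near 0 -> ignore source, e.g. for target-only words

d = 8
dec = np.random.randn(d)
ctx = np.random.randn(d)
w_gate = np.random.randn(2 * d) * 0.1
print(gated_context(dec, ctx, w_gate).shape)
```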

Fine-grained attention mechanism for neural machine translation

Neurocomputing, 2018

Neural machine translation (NMT) has become a new paradigm in machine translation, and the attention mechanism has become the dominant approach, with state-of-the-art records in many language pairs. While there are variants of the attention mechanism, all of them use only temporal attention, where one scalar value is assigned to the context vector corresponding to a source word. In this paper, we propose a fine-grained (or 2D) attention mechanism where each dimension of a context vector receives a separate attention score. In experiments on En-De and En-Fi translation, the fine-grained attention method improves translation quality in terms of BLEU score. In addition, our alignment analysis reveals how the fine-grained attention mechanism exploits the internal structure of context vectors.
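A minimal sketch of the 2D attention described above: each dimension of every source-side vector gets its own score, scores are normalised over source positions per dimension, and the context is an element-wise weighted sum. The scoring function over `[enc; dec]` is an illustrative choice, not necessarily the paper's.

```python
# Fine-grained (2D) attention sketch: one weight per (source position, dimension)
# pair instead of one scalar per source position. Illustrative scoring function.
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fine_grained_attention(decoder_state, encoder_states, W):
    """encoder_states: (src_len, d); W: (d, 2d) maps [enc; dec] to per-dimension scores."""
    src_len, d = encoder_states.shape
    dec = np.tile(decoder_state, (src_len, 1))                     # (src_len, d)
    scores = np.concatenate([encoder_states, dec], axis=1) @ W.T   # (src_len, d): one score per dim
    weights = softmax(scores, axis=0)                              # normalise over source positions, per dim
    return (weights * encoder_states).sum(axis=0)                  # element-wise weighted context, shape (d,)

d = 8
enc = np.random.randn(5, d)
dec = np.random.randn(d)
W = np.random.randn(d, 2 * d) * 0.1
print(fine_grained_attention(dec, enc, W).shape)                   # (8,)
```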

Attention Transformer Model for Translation of Similar Languages

2020

This paper illustrates our approach to the shared task on similar language translation at the Fifth Conference on Machine Translation (WMT-20). Our motivation comes from the latest state of the art in neural machine translation, in which Transformers and Recurrent Attention models are used effectively. A typical sequence-to-sequence architecture consists of an encoder and a decoder Recurrent Neural Network (RNN). The encoder recursively processes a source sequence and reduces it into a fixed-length vector (context), and the decoder generates a target sequence, token by token, conditioned on the same context. In contrast, the advantage of Transformers is reduced training time through a higher degree of parallelism, at the cost of explicit sequential order. Recurrent Attention allows the decoder to focus effectively on the order of the source sequence at different decoding steps. In our approach, we have combined the recurrence-based layered encoder-decoder...
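The fixed-length-context setup the abstract describes can be sketched with toy RNN cells: the encoder folds the whole source into one vector, and every decoding step conditions on that same vector. This is a generic illustration of the bottleneck, not the system submitted to WMT-20.

```python
# Toy fixed-length-context encoder-decoder: the whole source is reduced to one
# context vector, and the decoder reuses that same vector at every step.
import numpy as np

def rnn_step(state, x, W, U):
    return np.tanh(W @ state + U @ x)

d = 8
W, U = np.random.randn(d, d) * 0.1, np.random.randn(d, d) * 0.1
source = [np.random.randn(d) for _ in range(6)]

# Encoder: recursively process the source down to one fixed-length context.
context = np.zeros(d)
for x in source:
    context = rnn_step(context, x, W, U)

# Decoder: generate token by token, always conditioned on the same context.
dec_state = np.zeros(d)
for _ in range(4):
    dec_state = rnn_step(dec_state, context, W, U)
print(context.shape, dec_state.shape)
```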

An in-depth Study of Neural Machine Translation Performance

2019

With the rise and rapidly increasing popularity of deep learning, neural machine translation (NMT) has become one of the major research areas. Sequence-to-sequence models are widely used in NMT tasks, and one of the state-of-the-art models, the Transformer, also has an encoder-decoder architecture with an additional attention mechanism. Despite a substantial amount of research on improving NMT models' translation quality and speed, to the best of our knowledge, none of it gives a detailed performance analysis of each step in a model. In this paper we analyze the Transformer model's performance and translation quality in different settings. We conclude that beam search is the bottleneck of NMT inference and analyze beam search's effect on performance and quality in detail. We observe that the beam size is one of the largest contributors to the Transformer's execution time. Additionally, we observe that the beam size only affects the BLEU score at the word level, and not at to...
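The beam-size observation is easy to see from the shape of the algorithm: at every decoding step, each of the k live hypotheses needs its own scoring pass over the vocabulary. The sketch below uses a toy random scorer in place of a real model forward pass.

```python
# Why beam size dominates inference cost: per-step work scales with the number
# of live hypotheses, each requiring a scoring pass over the vocabulary.
import numpy as np

def decode_step(hypothesis, vocab_size):
    # stand-in for a full model forward pass; returns log-probs over the vocabulary
    return np.log(np.random.dirichlet(np.ones(vocab_size)))

def beam_search(beam_size, steps=10, vocab_size=32000):
    beams = [((), 0.0)]                                  # (token sequence, cumulative log-prob)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:                         # one scoring call per live hypothesis
            log_probs = decode_step(seq, vocab_size)
            top = np.argpartition(-log_probs, beam_size)[:beam_size]
            candidates += [(seq + (int(t),), score + log_probs[t]) for t in top]
        beams = sorted(candidates, key=lambda c: -c[1])[:beam_size]
    return beams[0]

print(len(beam_search(beam_size=4)[0]))                  # 10 tokens decoded
```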

Optimizing Transformer for Low-Resource Neural Machine Translation

ArXiv, 2020

Language pairs with limited amounts of parallel data, also known as low-resource languages, remain a challenge for neural machine translation. While the Transformer model has achieved significant improvements for many language pairs and has become the de facto mainstream architecture, its capability under low-resource conditions has not been fully investigated yet. Our experiments on different subsets of the IWSLT14 training data show that the effectiveness of the Transformer under low-resource conditions is highly dependent on the hyper-parameter settings. Using a Transformer optimized for low-resource conditions improves translation quality by up to 7.3 BLEU points compared to using the Transformer default settings.
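As a concrete but hypothetical illustration of what "optimizing for low-resource conditions" can mean in practice, the snippet below contrasts the standard Transformer-base defaults with a smaller, more heavily regularised variant; the low-resource values here are assumptions, not the configuration reported in the paper.

```python
# Standard Transformer-base defaults vs. an illustrative low-resource variant.
# The low-resource block is an assumption (smaller capacity, stronger
# regularisation), NOT the paper's reported settings.
default = {
    "encoder_layers": 6, "decoder_layers": 6,
    "d_model": 512, "ffn_dim": 2048, "heads": 8,
    "dropout": 0.1, "label_smoothing": 0.1,
}
low_resource = {
    **default,
    "encoder_layers": 5, "decoder_layers": 5,   # assumed: shallower model
    "ffn_dim": 1024, "heads": 4,                # assumed: less capacity to overfit
    "dropout": 0.3, "label_smoothing": 0.2,     # assumed: heavier regularisation
}
print({k: (default[k], low_resource[k]) for k in default if default[k] != low_resource[k]})
```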