AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages
Related papers
A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning
arXiv (Cornell University), 2022
Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries, which are expensive and impractical for low-resource languages. To remove these dependencies, researchers have explored training multilingual models on English-only resources and transferring them to low-resource languages. However, the effectiveness of this approach is limited by the gap between the embedding clusters of different languages. To address this issue, we propose Embedding-Push, Attention-Pull, and Robust targets to transfer English embeddings to virtual multilingual embeddings without semantic loss, thereby improving cross-lingual transferability. Experimental results on mBERT and XLM-R demonstrate that our method significantly outperforms previous work on zero-shot cross-lingual text classification and obtains better multilingual alignment.
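The zero-shot baseline such methods build on is a multilingual encoder fine-tuned on English labeled data only and applied directly to target-language text. The sketch below illustrates that baseline with XLM-R; the training example, label scheme, and hyperparameters are illustrative assumptions, not the paper's setup.

```python
# Zero-shot cross-lingual classification baseline (illustrative sketch):
# fine-tune XLM-R on English examples only, then evaluate on another language.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

def encode(premises, hypotheses, labels):
    batch = tokenizer(premises, hypotheses, truncation=True,
                      padding=True, return_tensors="pt")
    batch["labels"] = torch.tensor(labels)
    return batch

# English training batch (placeholder example; 0 = entailment).
train_batch = encode(["A man is playing a guitar."],
                     ["A person is making music."],
                     [0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**train_batch).loss
loss.backward()
optimizer.step()

# Zero-shot evaluation on a target-language pair (no target-language training).
model.eval()
with torch.no_grad():
    test_batch = tokenizer("Ein Mann spielt Gitarre.",
                           "Eine Person macht Musik.",
                           return_tensors="pt")
    pred = model(**test_batch).logits.argmax(dim=-1).item()
print("predicted label id:", pred)
```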
Few-shot Learning with Multilingual Language Models
arXiv (Cornell University), 2021
Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent multiple languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual generative language models on a corpus covering a diverse set of languages and study their few- and zero-shot learning capabilities on a wide range of tasks. Our largest model, with 7.5 billion parameters, sets a new state of the art in few-shot learning in more than 20 representative languages, outperforming GPT-3 of comparable size in multilingual commonsense reasoning (with +7.4% absolute accuracy improvement in 0-shot settings and +9.4% in 4-shot settings) and natural language inference (+5.4% in each of the 0-shot and 4-shot settings). On the FLORES-101 machine translation benchmark, our model outperforms GPT-3 counterparts on 171 out of 182 directions with 32 training examples, while surpassing the official supervised baseline in 45 directions. We conduct an in-depth analysis of different multilingual prompting approaches, showing in particular that strong in-context few-shot learning performance across languages can be achieved via cross-lingual transfer through both templates and demonstration examples.
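Cross-lingual transfer through templates and demonstrations boils down to building a prompt from English demonstrations and a target-language test example, then scoring each candidate label completion with the language model. A minimal sketch follows; the NLI-style template wording and examples are assumed for illustration, not taken from the paper.

```python
# Illustrative cross-lingual in-context prompt construction: English
# demonstrations paired with a target-language (Spanish) test example.
# Each candidate label yields one prompt; the LM scores the completions
# and the highest-probability label is predicted.
TEMPLATE = "{premise}, right? {label}, {hypothesis}"

english_demos = [
    ("A man is playing a guitar", "Yes", "a person is making music"),
    ("A dog sleeps on the couch", "No", "the dog is running outside"),
]

def build_prompt(demos, test_premise, test_hypothesis, candidate_label):
    lines = [TEMPLATE.format(premise=p, label=l, hypothesis=h)
             for p, l, h in demos]
    lines.append(TEMPLATE.format(premise=test_premise,
                                 label=candidate_label,
                                 hypothesis=test_hypothesis))
    return "\n".join(lines)

for label in ("Yes", "Maybe", "No"):
    print(build_prompt(english_demos,
                       "Un hombre toca la guitarra",
                       "una persona hace música",
                       label))
    print("---")
```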
Zero-Shot Transfer Learning using Affix and Correlated Cross-Lingual Embeddings
Authorea (Authorea), 2023
Learning morphologically supplemented embedding spaces using cross-lingual models has become an active area of research and has facilitated breakthroughs in applications such as machine translation, named entity recognition, document classification, and natural language inference. However, such approaches have not yet become customary for Southern African low-resource languages. In this paper, we present, evaluate, and benchmark a cohort of cross-lingual embeddings for English and Southern African languages on two classification tasks: News Headlines Classification (NHC) and Named Entity Recognition (NER). Our methodology considers four agglutinative languages from the eleven official South African languages: Isixhosa, Sepedi, Sesotho, and Setswana. Canonical correlation analysis and VecMap are the two cross-lingual alignment strategies adopted for this study. The monolingual embeddings used in this work are GloVe (source) and FastText (source and target) embeddings. Our results indicate that, with enough comparable corpora, we can develop strong joint representations between English and the considered Southern African languages. More specifically, the best zero-shot transfer results on the available Setswana NHC dataset were achieved using canonically correlated embeddings with a multi-layer perceptron as the training model (54.5% accuracy). Furthermore, our best NER performance was achieved using canonically correlated cross-lingual embeddings with conditional random fields as the training model (96.4% F1 score). Collectively, this study's results are competitive with the benchmarks of the explored NHC and NER datasets on both zero-shot NHC and NER tasks, with the advantage of using very minimal resources.
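The canonical-correlation alignment strategy mentioned above fits linear projections of two monolingual embedding spaces into one shared space using a bilingual seed lexicon. The sketch below shows that idea with scikit-learn's CCA; the random vectors stand in for real GloVe/FastText embeddings and the dimensions are assumed values.

```python
# Sketch of canonically correlated cross-lingual embeddings: project English
# and Setswana word vectors into a shared space with CCA, supervised by a
# small bilingual seed lexicon. Random vectors stand in for real embeddings.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
dim_en, dim_tn, n_pairs = 300, 300, 1000

# Rows are aligned translation pairs from the seed dictionary.
X_en = rng.normal(size=(n_pairs, dim_en))   # English embeddings
Y_tn = rng.normal(size=(n_pairs, dim_tn))   # Setswana embeddings

cca = CCA(n_components=100, max_iter=500)
cca.fit(X_en, Y_tn)

# Any vocabulary item from either language can now be mapped into the shared
# space and fed to a downstream classifier trained on English data only.
X_shared, Y_shared = cca.transform(X_en, Y_tn)
print(X_shared.shape, Y_shared.shape)  # (1000, 100) (1000, 100)
```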
Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks
2021
In zero-shot cross-lingual transfer, a supervised NLP task trained on a corpus in one language is directly applicable to another language without any additional training. A source of cross-lingual transfer can be as straightforward as lexical overlap between languages (e.g., use of the same scripts, shared subwords), which naturally forces text embeddings to occupy a similar representation space. Recently introduced cross-lingual language model (XLM) pretraining identifies neural parameter sharing in Transformer-style networks as the most important factor for the transfer. In this paper, we aim to validate the hypothetically strong cross-lingual transfer properties induced by XLM pretraining. In particular, we use XLM-RoBERTa (XLM-R) in experiments that extend semantic textual similarity (STS), SQuAD and KorQuAD for machine reading comprehension, sentiment analysis, and alignment of sentence embeddings under various cross-lingual settings. Our results indicate that the presence of...
mGPT: Few-Shot Learners Go Multilingual
Cornell University - arXiv, 2022
Recent studies report that autoregressive language models can successfully solve many NLP tasks via zero- and few-shot learning paradigms, which opens up new possibilities for using pre-trained language models. This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages from 25 language families using Wikipedia and the Colossal Clean Crawled Corpus. We reproduce the GPT-3 architecture using GPT-2 sources and the sparse attention mechanism; the Deepspeed and Megatron frameworks allow us to parallelize the training and inference steps effectively. The resulting models show performance on par with the recently released XGLM models by Facebook, while covering more languages and enhancing NLP possibilities for low-resource languages of CIS countries and small nations of Russia. We detail the motivation for the architecture design choices, thoroughly describe the data preparation pipeline, and train five small versions of the model to choose the optimal multilingual tokenization strategy. We measure model perplexity in all covered languages and evaluate the models on a wide spectrum of multilingual tasks, including classification, generation, sequence labeling, and knowledge probing. The models were evaluated with zero-shot and few-shot methods. Furthermore, we compared the classification tasks with the state-of-the-art multilingual model XGLM. The source code and the mGPT XL model are publicly released.
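Per-language perplexity comparisons like those described above reduce to computing the mean token cross-entropy of a causal language model on held-out text and exponentiating it. A minimal sketch is shown below; the checkpoint name is a stand-in assumption, and any autoregressive multilingual checkpoint on the Hugging Face hub would be evaluated the same way.

```python
# Illustrative perplexity measurement for an autoregressive LM on one text.
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "gpt2"  # stand-in; swap in a multilingual GPT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns mean token cross-entropy.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

print(perplexity("The quick brown fox jumps over the lazy dog."))
```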
arXiv (Cornell University), 2023
We address the task of machine translation (MT) from an extremely low-resource language (ELRL) to English by leveraging cross-lingual transfer from a closely related high-resource language (HRL). The development of an MT system for an ELRL is challenging because these languages typically lack parallel corpora and monolingual corpora, and their representations are absent from large multilingual language models. Many ELRLs share lexical similarities with some HRLs, which presents a novel modeling opportunity. However, existing subword-based neural MT models do not explicitly harness this lexical similarity, as they only implicitly align the HRL and ELRL latent embedding spaces. To overcome this limitation, we propose a novel approach, CHARSPAN, based on character-span noise augmentation of the HRL training data. This serves as a regularization technique, making the model more robust to lexical divergences between the HRL and ELRL and thus facilitating effective cross-lingual transfer. Our method significantly outperforms strong baselines in zero-shot settings on closely related HRL-ELRL pairs from three diverse language families, emerging as the state-of-the-art model for ELRLs.
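Character-span noise augmentation of this kind can be sketched as a simple corruption function applied to HRL source sentences so the model stops relying on exact subword matches. The sketch below is a minimal illustration in that spirit; the span length, noise rate, and corruption operations are assumed values, not the paper's configuration.

```python
# Minimal sketch of character-span noise augmentation: randomly delete or
# replace short character spans of a high-resource-language source sentence.
import random

def char_span_noise(sentence: str, noise_prob: float = 0.1,
                    max_span: int = 3, seed: int = 0) -> str:
    rng = random.Random(seed)
    chars = list(sentence)
    out, i = [], 0
    while i < len(chars):
        if chars[i] != " " and rng.random() < noise_prob:
            span = rng.randint(1, max_span)
            if rng.choice(("delete", "replace")) == "replace":
                out.extend(rng.choice("abcdefghijklmnopqrstuvwxyz")
                           for _ in range(span))
            i += span  # skip the corrupted span
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)

print(char_span_noise("el gato duerme en la casa"))
```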
Cross-lingual Word Embeddings beyond Zero-shot Machine Translation
2020
We explore the transferability of a multilingual neural machine translation model to unseen languages when the transfer is grounded solely in cross-lingual word embeddings. Our experimental results show that translation knowledge transfers only weakly to other languages and that the degree of transferability depends on the languages' relatedness. We also discuss the limiting aspects of the multilingual architectures that cause weak translation transfer and suggest how to mitigate those limitations.
Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training
ArXiv, 2021
In recent years, pre-trained multilingual language models such as multilingual BERT and XLM-R have exhibited good performance on zero-shot cross-lingual transfer learning. However, since their multilingual contextual embedding spaces for different languages are not perfectly aligned, the difference between representations of different languages can cause zero-shot cross-lingual transfer to fail in some cases. In this work, we draw connections between those failure cases and adversarial examples. We then propose using robust training methods to train a model that can tolerate some noise in the input embeddings. We study two widely used robust training methods: adversarial training and randomized smoothing. The experimental results demonstrate that robust training improves zero-shot cross-lingual transfer for text classification, and the performance improvements become significant as the distance between the source language and the target language increases.
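The randomized-smoothing-style idea of tolerating noise in input embeddings can be sketched by injecting Gaussian noise into the embedding lookups during fine-tuning. The sketch below shows that for XLM-R; the noise scale and example inputs are assumptions, and the paper's actual adversarial training and smoothing procedures are more involved.

```python
# Sketch of noise-on-embeddings robust training: perturb the input
# embeddings with Gaussian noise before the encoder so the classifier
# tolerates small cross-lingual representation gaps.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3)

batch = tokenizer(["A man plays guitar."], ["Someone makes music."],
                  return_tensors="pt")
labels = torch.tensor([0])

# Look up embeddings explicitly so noise can be injected before encoding.
embeddings = model.get_input_embeddings()(batch["input_ids"])
noisy = embeddings + 0.01 * torch.randn_like(embeddings)

outputs = model(inputs_embeds=noisy,
                attention_mask=batch["attention_mask"],
                labels=labels)
outputs.loss.backward()  # train as usual on the perturbed embeddings
print(float(outputs.loss))
```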
On the ability of monolingual models to learn language-agnostic representations
ArXiv, 2021
Pretrained multilingual models have become the de facto approach for zero-shot cross-lingual transfer. Previous work has shown that these models can achieve cross-lingual representations when pretrained on two or more languages with shared parameters. In this work, we provide evidence that a model can achieve language-agnostic representations even when pretrained on a single language. That is, we find that monolingual models pretrained and finetuned on different languages achieve competitive performance compared to models that use the same target language throughout. Surprisingly, the models show similar performance on the same task regardless of the pretraining language. For example, models pretrained on distant languages such as German and Portuguese perform similarly on English tasks.
Low-Resource Machine Translation Using Cross-Lingual Language Model Pretraining
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, 2021
This paper describes UTokyo's submission to the AmericasNLP 2021 Shared Task on machine translation systems for indigenous languages of the Americas. We present a low-resource machine translation system that improves translation accuracy using cross-lingual language model pretraining. Our system uses the mBART implementation in FAIRSEQ to pretrain on a large set of monolingual data from a diverse set of high-resource languages before finetuning on 10 low-resource indigenous American languages: Aymara, Bribri, Asháninka, Guaraní, Wixarika, Náhuatl, Hñähñu, Quechua, Shipibo-Konibo, and Rarámuri. On average, our system achieved BLEU scores that were 1.64 points higher and chrF scores that were 0.0749 points higher than the baseline.
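BLEU and chrF gains like those reported above are typically computed with sacrebleu over the system outputs and references. A minimal sketch follows; the hypothesis and reference strings are placeholders, and note that sacrebleu reports chrF on a 0-100 scale rather than the 0-1 scale used in the figure above.

```python
# Sketch of corpus-level BLEU and chrF scoring with sacrebleu.
import sacrebleu

hypotheses = ["the house is small", "the cat sleeps"]
references = [["the house is tiny", "the cat is sleeping"]]  # one reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}, chrF = {chrf.score:.2f}")
```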