Blog

Neurons in LLMs: Dead, N-gram, Positional

This is a post for the paper Neurons in Large Language Models: Dead, N-gram, Positional.

With scale, LMs become more exciting but, at the same time, harder to analyze. We show that even with simple methods and a single GPU, you can do a lot! We analyze OPT models up to 66b and find that
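As a taste of how far a single GPU and a forward hook can get you, here is a minimal sketch (an illustration of the general idea, not the paper's exact protocol) that counts how often each FFN neuron in one OPT layer fires on a tiny text sample; neurons that never fire on a large corpus are the "dead" ones. The checkpoint, layer, and texts below are arbitrary placeholders.

```python
# Minimal sketch (not the paper's exact protocol): count how often each FFN neuron
# in one OPT layer fires on a tiny text sample. Neurons that never fire over a
# large enough corpus are the "dead" ones. Assumes a HuggingFace OPT checkpoint
# with the standard fc1 -> ReLU -> fc2 feed-forward layout.
import torch
from transformers import AutoTokenizer, OPTForCausalLM

model_name = "facebook/opt-125m"          # small model, just for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = OPTForCausalLM.from_pretrained(model_name).eval()

layer_id = 3                              # arbitrary layer to inspect
fired = torch.zeros(model.config.ffn_dim, dtype=torch.long)

def count_firing(module, inputs, output):
    # fc1 output is the pre-ReLU activation: it is > 0 exactly when the
    # post-ReLU value is nonzero, i.e. when the neuron fires on a token.
    global fired
    fired += (output > 0).reshape(-1, output.shape[-1]).sum(dim=0)

hook = model.model.decoder.layers[layer_id].fc1.register_forward_hook(count_firing)

texts = ["The quick brown fox jumps over the lazy dog.",
         "With scale, LMs become more exciting but harder to analyze."]
with torch.no_grad():
    for text in texts:
        model(**tokenizer(text, return_tensors="pt"))
hook.remove()

dead = (fired == 0).sum().item()
print(f"{dead} of {fired.numel()} neurons never fired on this (tiny) sample")
```

On real data you would accumulate these counts over a large corpus and over every layer before calling a neuron dead.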

NMT Training Process through the Lens of SMT

This is a post for the EMNLP 2021 paper Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT.

In SMT, different competences are captured by distinct model components. In NMT, the whole translation task is modelled with a single neural network. How and when does NMT learn all of these competences? We show that

Neural Machine Translation Inside Out

This is a blog version of my talk at the ACL 2021 workshop Representation Learning for NLP (an updated version of my earlier talk at the NAACL 2021 workshop Deep Learning Inside Out (DeeLIO)).

In the last decade, machine translation shifted from the traditional statistical approaches with distinct components and hand-crafted features to the end-to-end neural ones. We try to understand how NMT works and show that:

Source and Target Contributions to NMT Predictions

This is a post for the ACL 2021 paper Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation.

In NMT, the generation of a target token is based on two types of context: the source and the prefix of the target sentence. We show how to evaluate the relative contributions of source and target to NMT predictions and find that:
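The paper does this with a relevance-propagation method whose token-level contributions sum to one; as a much looser stand-in (assuming a HuggingFace MarianMT en-de checkpoint, not the paper's setup), the sketch below shows the same intuition by occlusion: the next-token distribution reacts both to removing the source and to removing the target prefix.

```python
# A loose, purely illustrative occlusion check, NOT the paper's method (the paper
# adapts layer-wise relevance propagation so that source and prefix contributions
# sum to one). Checkpoint and sentences are arbitrary placeholders.
import torch
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"
tok = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name).eval()

def next_token_dist(src_text, prefix_ids):
    # Distribution over the next target token given a source and a target prefix.
    enc = tok(src_text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc, decoder_input_ids=prefix_ids).logits
    return logits[0, -1].softmax(-1)

src_text = "She is a doctor."
src = tok(src_text, return_tensors="pt")
# Let the model produce a couple of target tokens to use as the prefix
# (generation starts from the decoder_start token, which Marian sets to <pad>).
prefix = model.generate(**src, max_new_tokens=4)[:, :3]

p_full      = next_token_dist(src_text, prefix)            # both contexts present
p_no_source = next_token_dist("", prefix)                  # source occluded
p_no_prefix = next_token_dist(src_text, prefix[:, :1])     # prefix occluded

tv = lambda p, q: 0.5 * (p - q).abs().sum().item()         # total variation distance
print("prediction shift when the source is removed:", tv(p_full, p_no_source))
print("prediction shift when the prefix is removed:", tv(p_full, p_no_prefix))
```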

Information-Theoretic Probing with MDL

This is a post for the EMNLP 2020 paper Information-Theoretic Probing with Minimum Description Length.

Probing classifiers often fail to adequately reflect differences in representations and can show different results depending on hyperparameters. As an alternative to standard probes, we describe information-theoretic probing with minimum description length (MDL), which reflects not only the final quality of a probe but also the effort needed to achieve it.
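To make the MDL idea concrete, here is a small sketch of the online (prequential) coding estimate with a toy logistic-regression probe; the feature matrix X, labels y, and the fraction schedule are illustrative placeholders (in practice X would be frozen model representations and y linguistic labels).

```python
# A small sketch of online (prequential) coding for MDL probing. X, y, and the
# fraction schedule are illustrative placeholders: in practice X would be frozen
# model representations and y linguistic labels; the paper uses a finer schedule.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def online_codelength(X, y, fractions=(0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.0)):
    num_classes = len(np.unique(y))
    sizes = [int(f * len(y)) for f in fractions]
    # The first block is transmitted with a uniform code: log2(K) bits per label.
    codelength = sizes[0] * np.log2(num_classes)
    for i in range(len(sizes) - 1):
        probe = LogisticRegression(max_iter=1000).fit(X[:sizes[i]], y[:sizes[i]])
        block_X, block_y = X[sizes[i]:sizes[i + 1]], y[sizes[i]:sizes[i + 1]]
        # Cross-entropy of this probe on the next block, converted to bits,
        # is the cost of transmitting that block's labels.
        proba = probe.predict_proba(block_X)
        codelength += log_loss(block_y, proba, labels=probe.classes_, normalize=False) / np.log(2)
    return codelength

# Toy example with random features; real probing would load LM representations.
rng = np.random.RandomState(0)
X = rng.randn(2000, 32)
y = (X[:, 0] + 0.1 * rng.randn(2000) > 0).astype(int)
print(f"online codelength: {online_codelength(X, y):.1f} bits")
```

Lower codelength means the labels are more easily extracted from the representations.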

When a Good Translation is Wrong in Context

This is a post for the ACL 2019 paper When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion.

From this post, you will learn:

The Story of Heads

This is a post for the ACL 2019 paper Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned.

From this post, you will learn: