MuToR: [NeurIPS '25] Multi-Token Prediction Needs Registers
⭐ NeurIPS 2025 POSTER ⭐
Anastasios Gerontopoulos1,3, Spyros Gidaris2, Nikos Komodakis1,3,4
1Archimedes/Athena RC 2valeo.ai 3University of Crete 4IACM-Forth
This repository contains the official implementation of the paper: Multi-Token Prediction Needs Registers.
TL;DR: We propose MuToR, a simple and effective approach that leverages register tokens to predict future targets and enrich supervision for autoregressive transformers. Our method introduces only a negligible number of additional parameters and requires no architectural changes—ensuring compatibility with off-the-shelf pretrained language models.
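To illustrate the idea, here is a minimal sketch (not the official MuToR implementation) of interleaving register tokens into an input sequence so that each register position can be supervised with a future target. The stride and the special token id are hypothetical choices made for this example:

```python
# Illustrative sketch only: interleave "register" token ids into a token
# sequence. In MuToR-style training, these register positions would be
# supervised to predict future tokens; here we only show the interleaving.
REGISTER_ID = 50257  # hypothetical extra vocabulary slot for the register token

def interleave_registers(token_ids, stride=4, register_id=REGISTER_ID):
    """Insert a register token after every `stride` real tokens."""
    out = []
    for i, t in enumerate(token_ids, start=1):
        out.append(t)
        if i % stride == 0:
            out.append(register_id)
    return out

# Example: with stride=2, a register follows every second real token.
print(interleave_registers([1, 2, 3, 4, 5], stride=2))
```

Because the registers are extra input positions with their own (small) embedding, the base transformer architecture is unchanged, which is what makes the approach compatible with off-the-shelf pretrained models.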
News
- [2025/9/18] MuToR was accepted at NeurIPS 2025!
- [2025/9/1] We release the code for finetuning pretrained LMs on downstream tasks.
- [2025/5/15] Paper appears on arXiv.
Getting started
We provide the checkpoints from the paper via Hugging Face. They can be loaded in our evaluation script to reproduce our results.
Finetune Pretrained LLMs with MuToR
We provide our code to finetune pretrained LLMs on downstream tasks. Check out our relevant guidelines for more details on setting up the environment and running experiments.
MuToR for autoregressive image generation
We also plan to open-source our code for MuToR-2D — stay tuned!
Citation
If you use our code, or if you find this work useful in your research, please consider citing:
@article{gerontopoulos2025multi,
  title={Multi-Token Prediction Needs Registers},
  author={Gerontopoulos, Anastasios and Gidaris, Spyros and Komodakis, Nikos},
  journal={arXiv preprint arXiv:2505.10518},
  year={2025}
}
