[Devstral 24B] FP8 is currently not working correctly

System Info

Latest transformers "main".

Who can help?

@SunMarc @MekkCyber

Information

Tasks

Reproduction

If you run this code snippet: https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512#transformers with dequantize=False, the generated output falls into infinite repetition.

If, however, you run the model from PR #42744, everything works fine, which suggests that something is going wrong with the activation scales (maybe they produce inf values somewhere?).
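
A quick way to test the inf hypothesis would be to scan the checkpoint's scale tensors for non-finite or non-positive values. The snippet below is only a minimal sketch, not something taken from transformers: `find_suspect_scales` is a hypothetical helper, the `"scale"` substring filter and the tensor names in the toy data are assumptions about how the checkpoint names its scales, and the values are made up. With a real checkpoint, each entry could be filled from something like `tensor.flatten().tolist()`.

```python
import math
from typing import Dict, Iterable, List, Tuple


def find_suspect_scales(
    named_values: Dict[str, Iterable[float]],
    marker: str = "scale",  # assumption: scale tensors have "scale" in their name
) -> List[Tuple[str, str]]:
    """Return (name, reason) for each scale entry holding inf/NaN or values <= 0."""
    suspects = []
    for name, values in named_values.items():
        if marker not in name:
            continue  # not a scale tensor, skip regular weights
        vals = [float(v) for v in values]
        if any(not math.isfinite(v) for v in vals):
            suspects.append((name, "non-finite"))
        elif any(v <= 0.0 for v in vals):
            suspects.append((name, "non-positive"))
    return suspects


# Toy data with hypothetical tensor names and made-up values.
toy_checkpoint = {
    "layers.0.mlp.down_proj.weight": [0.1, -0.2],
    "layers.0.mlp.down_proj.activation_scale": [0.5],
    "layers.1.mlp.down_proj.activation_scale": [float("inf")],
    "layers.1.mlp.down_proj.weight_scale": [0.0],
}

for name, reason in find_suspect_scales(toy_checkpoint):
    print(f"{name}: {reason}")
```

An empty result on the real checkpoint would not rule out the activation scales, since inf values could also appear at runtime when the scales are applied, but a non-empty one would narrow things down quickly.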

The same activation scales work with vLLM: https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512#vllm-recommended, so there is probably something we can fix inside transformers.

Expected behavior

FP8 inference should work correctly.