HuggingFaceTB/smollm3-configs · Datasets at Hugging Face

SmolLM3 Training Configs

[IMPORTANT NOTE]: for the latest configs go to this repo: https://github.com/huggingface/smollm/tree/main/text/pretraining/smollm3

Here you can find the training configs for SmolLM3-3B-Base, trained with nanotron, including the exact training details and data mixtures.

The model was pretrained on 11.2T tokens in 3 stages with a 4k context length:

[figure: the three pretraining stages and their data mixtures]
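The configs in this repo are nanotron YAML files. As a rough, illustrative sketch of how a staged data mixture is expressed in nanotron, the `data_stages` list switches to a new mixture at a given training step. The stage names, dataset paths, weights, and step boundaries below are placeholders, not the real SmolLM3 recipe; see the configs in this repo (or the linked GitHub directory) for the actual values:

```yaml
# Hypothetical sketch of a staged data mixture in a nanotron config.
# All names, paths, weights, and step counts are placeholders.
data_stages:
  - name: stage1
    start_training_step: 1
    data:
      dataset:
        dataset_folder:
          - datasets/web    # placeholder path
          - datasets/code   # placeholder path
        dataset_weights:
          - 0.85            # placeholder weight
          - 0.15
  - name: stage2
    start_training_step: 100000   # placeholder stage boundary
    data:
      dataset:
        dataset_folder:
          - datasets/web
          - datasets/code
          - datasets/math   # placeholder path
        dataset_weights:
          - 0.70
          - 0.20
          - 0.10
```

A config like this is typically passed to nanotron's training entrypoint via its `--config-file` argument; check the nanotron README for the exact launch command.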

We then trained for an additional 2 stages to extend the context length to 64k:

[figure: the two long-context extension stages]