Chunked cross-entropy loss for SFT (up to –50% VRAM) by qgallouedec · Pull Request #5575 · huggingface/trl
and others added 4 commits
qgallouedec changed the title from "Chunked Cross-Entropy" to "Chunked Cross-Entropy: Up to 50% reduced VRAM"
qgallouedec changed the title from "Chunked Cross-Entropy: Up to 50% reduced VRAM" to "Chunked cross-entropy loss for SFT (up to –50% VRAM)"
This was referenced
Apr 22, 2026
AmineDiro added a commit that referenced this pull request
- Note transformers #45433 (sonic-moe CuteDSL kernel integration)
- Highlight TRL-side contribution to #45621 (wrapper-side masked_fill pair in grouped_mm_experts_forward)
- Credit @qgallouedec on TRL #5575 (chunked CE) and explain why it is load-bearing for the long-context recipe
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced
May 6, 2026