Nemotron 3 Super & Ultra Models leaking metadata and chatting in longform content
I’ve been running both Nemotron 3 Super and Ultra in a production longform-fiction pipeline (multi-book series, ~50K+ words per generation cycle), and I’m hitting two recurring failure modes that I want to document here, and to check whether others are seeing the same thing.
1. Metadata / scaffolding leakage into prose
The models intermittently emit internal scaffolding directly into the generated narrative. Examples of what’s bleeding through:
- Section or structural markers (act/chapter/beat labels) appearing inline in the prose rather than being consumed as instructions
- Planning artifacts — things like restated outline points, “in this section we will…” framing, or echoed system/instruction fragments
- Field-like tokens or key: value remnants showing up mid-paragraph
This isn’t a prompt-clarity issue on my end — the same prompt structure runs clean on other model families. It reads like the boundary between the control layer and the output layer isn’t being respected consistently.
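For anyone who wants to check their own output for the three leak types listed above, here is a minimal Python sketch of a line-level scan. The regex patterns, the `SCAFFOLD_PATTERNS` table, and the `find_scaffolding` function are all assumptions made up for illustration, not a tested filter and not tied to any Nemotron tooling. In particular, the key: value pattern only catches snake_case keys, so that ordinary prose such as "He said: nothing" is not flagged.

```python
import re

# Hypothetical patterns, one per leak type described in the post.
# They are illustrative assumptions and will need tuning per prompt format.
SCAFFOLD_PATTERNS = {
    # A line that starts with a label such as "Beat 3" or "Chapter 12".
    "structural_marker": re.compile(r"^\s*(?:act|chapter|beat)\s+\d+\b", re.IGNORECASE),
    # Outline-style framing such as "In this section we will ...".
    "planning_artifact": re.compile(
        r"\bin this (?:section|chapter|scene),? we will\b", re.IGNORECASE
    ),
    # snake_case "key: value" remnants anywhere in the line.
    "key_value": re.compile(r"\b[a-z]+(?:_[a-z]+)+:\s*\S"),
}


def find_scaffolding(text):
    """Return (line_number, kind, line) for every line that matches a pattern."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in SCAFFOLD_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, kind, line.strip()))
    return hits


if __name__ == "__main__":
    sample = (
        "The rain had stopped.\n"
        "Beat 3: confrontation\n"
        "In this section we will raise the stakes.\n"
        "She turned. pov_character: Mara\n"
        "He said: nothing."
    )
    for line_no, kind, line in find_scaffolding(sample):
        print(f"line {line_no}: {kind}: {line}")
```

A scan like this only reports suspect lines. Whether to delete, regenerate, or hand-review them is a separate decision, since a legitimate chapter heading would also match the first pattern.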
2. Chat / assistant register intruding into narrative prose
The bigger problem for longform work: both models drop out of narrative voice and into conversational assistant mode. Symptoms:
- Direct address to the reader/user (“Here’s the next part of the story…”, “Let me continue…”)
- Meta-commentary about the writing itself (“This scene establishes…”, “To build tension here…”)
- Wrap-up / summary turns at the end of a section, as if closing a chat reply rather than ending a chapter
- Hedging and helper phrasing that has no place in third-person fiction
The net effect is that long generations need heavy post-processing to strip out conversational connective tissue that a fiction model shouldn’t be producing in the first place.
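To make the post-processing step concrete, here is a minimal Python sketch that drops whole paragraphs opening with assistant-style phrasing. The `CHAT_OPENERS` pattern and the `strip_chat_register` function are hypothetical, built only from the example phrases quoted above. It is deliberately conservative: it checks only the start of each paragraph, so chat phrasing buried mid-paragraph or in a closing summary would slip through.

```python
import re

# Hypothetical opener list based on the symptoms quoted in the post.
# Both straight and curly apostrophes are accepted in "Here's".
CHAT_OPENERS = re.compile(
    r"^\s*(?:"
    r"here[’']s (?:the|a) next"
    r"|let me continue"
    r"|this (?:scene|chapter|section) establishes"
    r"|to build tension"
    r")",
    re.IGNORECASE,
)


def strip_chat_register(text):
    """Split on blank lines and drop paragraphs that open with chat phrasing.

    Returns (cleaned_text, dropped_paragraphs) so removals can be reviewed.
    """
    kept, dropped = [], []
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        if CHAT_OPENERS.match(para):
            dropped.append(para)
        else:
            kept.append(para)
    return "\n\n".join(kept), dropped


if __name__ == "__main__":
    draft = (
        "Here’s the next part of the story.\n\n"
        "Mara crossed the bridge at dusk.\n\n"
        "Let me continue with the chase.\n\n"
        "The river was black."
    )
    clean, dropped = strip_chat_register(draft)
    print(clean)
    print(f"dropped {len(dropped)} paragraph(s)")
```

Returning the dropped paragraphs alongside the cleaned text keeps the filter auditable, which matters at ~50K words per cycle where a false positive could silently remove real narrative.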
Questions for the community / NVIDIA team:
- Is this a known characteristic of the instruction-tuning on the 3 Super/Ultra line? It feels like assistant-style RLHF bleeding into completion-style tasks.
- Are there recommended generation params (sampling, system prompt framing, stop sequences) that suppress the chat register without flattening prose quality?
- Is there a base/less-aligned variant better suited to creative completion work?
Happy to share reproducible examples and the prompt structure if useful.