Nemotron 3 Super & Ultra Models leaking metadata and chatting in longform content

I’ve been running both Nemotron 3 Super and Ultra in a production longform-fiction pipeline (multi-book series, 50K+ words per generation cycle), and I’m hitting two recurring failure modes. I want to document them here and find out whether others are seeing the same thing.

1. Metadata / scaffolding leakage into prose

The models intermittently emit internal scaffolding directly into the generated narrative. Examples of what’s bleeding through:

- Section or structural markers (act/chapter/beat labels) appearing inline in the prose rather than being consumed as instructions
- Planning artifacts: restated outline points, “in this section we will…” framing, or echoed system/instruction fragments
- Field-like tokens or key: value remnants showing up mid-paragraph

This isn’t a prompt-clarity issue on my end — the same prompt structure runs clean on other model families. It reads like the boundary between the control layer and the output layer isn’t being respected consistently.
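For anyone who wants to check their own outputs for the same thing, here is a minimal sketch of a leak detector covering the three categories listed above. The patterns, the `LEAK_PATTERNS` table, and the `find_leaks` helper are all illustrative assumptions, not the checks used in the pipeline described here; the key/value pattern in particular assumes snake_case field names and will need adapting to your own prompt format.

```python
import re

# Hypothetical patterns, one per leak category described above.
# They are deliberately narrow to limit false positives on ordinary prose.
LEAK_PATTERNS = {
    # "Beat 2:", "Act II:", "[CHAPTER 3]" appearing in the text
    "structural_marker": re.compile(
        r"\b(?:act|chapter|beat)\s+(?:\d+|[ivxlc]+)\s*[:\]]", re.IGNORECASE
    ),
    # "In this section we will ..." style planning framing
    "planning_artifact": re.compile(
        r"\bin this (?:section|chapter|scene),? we will\b", re.IGNORECASE
    ),
    # snake_case "key: value" remnants (assumed field naming style)
    "key_value": re.compile(r"\b[a-z]+(?:_[a-z]+)+:\s*\S"),
}


def find_leaks(text):
    """Return (line_number, category) pairs for every suspected leak."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in LEAK_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


if __name__ == "__main__":
    sample = (
        "The rain had not stopped.\n"
        "Beat 2: Mara finds the letter.\n"
        "She opened it. pov_character: Mara\n"
        "In this section we will raise the stakes."
    )
    for lineno, label in find_leaks(sample):
        print(lineno, label)
```

A check like this only reports where leakage is suspected. It does not distinguish a leaked "Chapter 3:" label from a legitimate chapter heading, so the hits are best treated as candidates for review rather than automatic deletions.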

2. Chat / assistant register intruding into narrative prose

The bigger problem for longform work: both models drop out of narrative voice and into conversational assistant mode. Symptoms:

- Direct address to the reader/user (“Here’s the next part of the story…”, “Let me continue…”)
- Meta-commentary about the writing itself (“This scene establishes…”, “To build tension here…”)
- Wrap-up / summary turns at the end of a section, as if closing a chat reply rather than ending a chapter
- Hedging and helper phrasing that has no place in third-person fiction

The net effect is that long generations need heavy post-processing to strip out conversational connective tissue that a fiction model shouldn’t be producing in the first place.
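As a rough illustration of that post-processing step, the sketch below separates paragraphs that open with assistant-style phrasing from the rest of the text. The opener list is built only from the example phrases quoted above, and `CHAT_OPENERS` and `split_chat_register` are hypothetical names; a real filter would need a much broader phrase list and would still miss wrap-up turns that do not start with a recognizable opener.

```python
import re

# Assumed opener list, taken from the symptom examples quoted above.
# Both straight and curly apostrophes are accepted in "here's".
CHAT_OPENERS = re.compile(
    r"^\s*(?:"
    r"here(?:'s|’s| is) the next part"
    r"|let me continue"
    r"|this scene establishes"
    r"|to build tension"
    r")",
    re.IGNORECASE,
)


def split_chat_register(text):
    """Split text into (clean_text, flagged_paragraphs).

    Paragraphs are separated by blank lines. A paragraph is flagged when it
    starts with one of the assistant-style openers; everything else is kept.
    """
    kept, flagged = [], []
    for para in re.split(r"\n\s*\n", text):
        if CHAT_OPENERS.match(para):
            flagged.append(para)
        else:
            kept.append(para)
    return "\n\n".join(kept), flagged


if __name__ == "__main__":
    raw = (
        "Here's the next part of the story:\n\n"
        "Mara closed the door behind her.\n\n"
        "This scene establishes her isolation."
    )
    clean, flagged = split_chat_register(raw)
    print(clean)
    print(len(flagged), "paragraph(s) flagged")
```

Returning the flagged paragraphs instead of silently dropping them keeps the filter auditable: a narrator could legitimately say "Let me continue" in first-person fiction, so a human or a second pass can confirm each removal.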

Questions for the community / NVIDIA team:

- Is this a known characteristic of the instruction-tuning on the 3 Super/Ultra line? It feels like assistant-style RLHF bleeding into completion-style tasks.
- Are there recommended generation params (sampling, system prompt framing, stop sequences) that suppress the chat register without flattening prose quality?
- Is there a base/less-aligned variant better suited to creative completion work?

Happy to share reproducible examples and the prompt structure if useful.