Nemotron 3 Super & Ultra Models leaking metadata and chatting in longform content

I’ve been running both Nemotron 3 Super and Ultra in a production longform-fiction pipeline (multi-book series, 50K+ words per generation cycle), and I’m hitting two recurring failure modes. I want to document them here and find out whether others are seeing the same thing.

1. Metadata / scaffolding leakage into prose

The models intermittently emit internal scaffolding directly into the generated narrative. Examples of what’s bleeding through:

This isn’t a prompt-clarity issue on my end — the same prompt structure runs clean on other model families. It reads like the boundary between the control layer and the output layer isn’t being respected consistently.

2. Chat / assistant register intruding into narrative prose

The bigger problem for longform work: both models drop out of narrative voice and into conversational assistant mode. Symptoms:

The net effect is that long generations need heavy post-processing to strip out conversational connective tissue that a fiction model shouldn’t be producing in the first place.
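For anyone who wants to measure how much cleanup is involved, here is a minimal sketch of the kind of post-processing pass I mean. The patterns are hypothetical placeholders, not the actual strings from my outputs, and the function names (`flag_lines`, `strip_flagged`) are made up for illustration. A real filter would need patterns tuned to the specific pipeline.

```python
import re

# Hypothetical patterns for illustration only; these are NOT the exact
# strings observed in the pipeline. Replace them with your own.
ASSISTANT_PATTERNS = [
    re.compile(r"^(sure|certainly|of course)\b", re.IGNORECASE),
    re.compile(r"^here(?:'s| is) (?:the|your)\b", re.IGNORECASE),
    re.compile(r"\blet me know if\b", re.IGNORECASE),
    re.compile(r"\bwould you like me to\b", re.IGNORECASE),
]
SCAFFOLD_PATTERNS = [
    re.compile(r"^\[[A-Z_ ]+:.*\]$"),  # bracketed tag line, e.g. [SCENE_ID: 12]
    re.compile(r"^</?[a-z_]+>$"),      # bare XML-style tag line, e.g. <outline>
]


def flag_lines(text):
    """Return (line_number, kind, line) for every line matching a pattern."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if any(p.search(stripped) for p in SCAFFOLD_PATTERNS):
            flagged.append((number, "scaffold", line))
        elif any(p.search(stripped) for p in ASSISTANT_PATTERNS):
            flagged.append((number, "assistant", line))
    return flagged


def strip_flagged(text):
    """Drop every flagged line and return the remaining text."""
    bad = {number for number, _, _ in flag_lines(text)}
    kept = [
        line
        for number, line in enumerate(text.splitlines(), start=1)
        if number not in bad
    ]
    return "\n".join(kept)
```

A line-level regex pass like this only catches leakage that sits on its own line, and it can misfire on prose that legitimately starts with one of these phrases. Reviewing the output of `flag_lines` before calling `strip_flagged` is probably safer than stripping blindly.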

Questions for the community / NVIDIA team:

Happy to share reproducible examples and the prompt structure if useful.