Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking
Abstract: Vision-language models (VLMs) are powerful but remain opaque black boxes. We introduce the first framework for transparent circuit tracing in VLMs to systematically analyze multimodal reasoning. By utilizing transcoders, attribution graphs, and attention-based methods, we uncover how VLMs hierarchically integrate visual and semantic concepts. We reveal that distinct visual feature circuits can handle mathematical reasoning and support cross-modal associations. Validated through feature steering and circuit patching, our framework proves these circuits are causal and controllable, laying the groundwork for more explainable and reliable VLMs.
Submission history
From: Jingcheng Yang
[v1] Mon, 23 Feb 2026 20:26:45 UTC (4,805 KB)