
3.5 Pro coming soon

Gemini 3.1 Pro

Best for complex tasks and bringing creative concepts to life

A smarter model to help you learn, plan, and build like never before.


Reasoning with unprecedented depth and nuance

Smart, concise, direct responses – with genuine insight over cliche and flattery.

Advanced multimodal understanding

Text, images, video, audio – even code. Gemini 3.1 Pro offers advanced reasoning with unprecedented depth and nuance.

Highly capable in vibe coding and agentic coding

Gemini 3.1 brings exceptional instruction following – with meaningfully improved tool use and agentic coding.

Improved agentic capabilities

Better tool use. Simultaneous, multi-step tasks. With Gemini 3.1’s agentic capabilities, you can build more helpful and intelligent personal AI assistants.



Transform real-time data into a stunning, interactive visualization

Gemini 3.1 Pro uses advanced reasoning to configure live telemetry streams to build dynamic applications like this aerospace dashboard.

Prototype sensory-rich interfaces with complex, interactive 3D simulations

Gemini 3.1 Pro codes an immersive starling murmuration, complete with hand-tracking manipulation and dynamic generative audio.

Generate complex, believable environments

From terrain generation to traffic flow, Gemini 3.1 Pro uses advanced reasoning to code and assemble the many layers of a simulated city.

Build lightweight, scalable animations directly into your codebase

Gemini 3.1 Pro understands design intent, converting static SVGs into animated, code-based graphics for faster, cleaner web development.

Translate complex literary themes into functional code and sleek, contemporary interfaces

Gemini 3.1 Pro reasons through the atmospheric tone of a novel to build a modern, personalized portfolio.

| Benchmark | Description | Notes | Gemini 3.1 Pro Thinking (High) | Gemini 3 Pro Thinking (High) | Sonnet 4.6 Thinking (Max) | Opus 4.6 Thinking (Max) | GPT-5.2 Thinking (xhigh) | GPT-5.3-Codex Thinking (xhigh) |
|---|---|---|---|---|---|---|---|---|
| Humanity's Last Exam | Academic reasoning (full set, text + MM) | No tools | 44.4% | 37.5% | 33.2% | 40.0% | 34.5% | - |
| | | Search (blocklist) + Code | 51.4% | 45.8% | 49.0% | 53.1% | 45.5% | - |
| ARC-AGI-2 | Abstract reasoning puzzles | ARC Prize Verified | 77.1% | 31.1% | 58.3% | 68.8% | 52.9% | - |
| GPQA Diamond | Scientific knowledge | No tools | 94.3% | 91.9% | 89.9% | 91.3% | 92.4% | - |
| Terminal-Bench 2.0 | Agentic terminal coding | Terminus-2 harness | 68.5% | 56.9% | 59.1% | 65.4% | 54.0% | 64.7% |
| | | Other best self-reported harness | - | - | - | - | 62.2% (Codex) | 77.3% (Codex) |
| SWE-Bench Verified | Agentic coding | Single attempt | 80.6% | 76.2% | 79.6% | 80.8% | 80.0% | - |
| SWE-Bench Pro (Public) | Diverse agentic coding tasks | Single attempt | 54.2% | 43.3% | - | - | 55.6% | 56.8% |
| LiveCodeBench Pro | Competitive coding problems from Codeforces, ICPC, and IOI | Elo | 2887 | 2439 | - | - | 2393 | - |
| SciCode | Scientific research coding | - | 59% | 56% | 47% | 52% | 52% | - |
| APEX-Agents | Long horizon professional tasks | - | 33.5% | 18.4% | - | 29.8% | 23.0% | - |
| GDPval-AA Elo | Expert tasks | - | 1317 | 1195 | 1633 | 1606 | 1462 | - |
| τ2-bench | Agentic and tool use | Retail | 90.8% | 85.3% | 91.7% | 91.9% | 82.0% | - |
| | | Telecom | 99.3% | 98.0% | 97.9% | 99.3% | 98.7% | - |
| MCP Atlas | Multi-step workflows using MCP | - | 69.2% | 54.1% | 61.3% | 59.5% | 60.6% | - |
| BrowseComp | Agentic search | Search + Python + Browse | 85.9% | 59.2% | 74.7% | 84.0% | 65.8% | - |
| MMMU-Pro | Multimodal understanding and reasoning | No tools | 80.5% | 81.0% | 74.5% | 73.9% | 79.5% | - |
| MMMLU | Multilingual Q&A | - | 92.6% | 91.8% | 89.3% | 91.1% | 89.6% | - |
| MRCR v2 (8-needle) | Long context performance | 128k (average) | 84.9% | 77.0% | 84.9% | 84.0% | 83.8% | - |
| | | 1M (pointwise) | 26.3% | 26.3% | Not supported | Not supported | Not supported | - |
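
One way to read the table above is as the change from Gemini 3 Pro to Gemini 3.1 Pro on each benchmark. The following is a minimal sketch of that arithmetic, not part of any official tooling: the scores are copied from a handful of rows in the table, and the names `SCORES` and `point_deltas` are hypothetical, chosen only for this illustration.

```python
# Scores (percent) copied from a subset of the benchmark table,
# stored as (Gemini 3.1 Pro, Gemini 3 Pro). SCORES and point_deltas
# are illustrative names, not part of any Gemini API.
SCORES = {
    "Humanity's Last Exam (no tools)": (44.4, 37.5),
    "ARC-AGI-2": (77.1, 31.1),
    "GPQA Diamond": (94.3, 91.9),
    "Terminal-Bench 2.0 (Terminus-2)": (68.5, 56.9),
    "SWE-Bench Verified": (80.6, 76.2),
    "BrowseComp": (85.9, 59.2),
    "MMMU-Pro": (80.5, 81.0),
}


def point_deltas(scores):
    """Return {benchmark: new - old} in percentage points, rounded to 0.1."""
    return {name: round(new - old, 1) for name, (new, old) in scores.items()}


if __name__ == "__main__":
    # Print the largest gains first; negative values are regressions.
    ranked = sorted(point_deltas(SCORES).items(), key=lambda kv: -kv[1])
    for name, delta in ranked:
        print(f"{name:35s} {delta:+.1f} pts")
```

These are percentage-point differences, not relative improvements, and they only compare the two Gemini columns under the settings listed in the Notes column.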

Name: 3.1 Pro

Status: Preview

Input

Output

Input tokens: 1M

Output tokens: 64k

Knowledge cutoff: January 2025

Tool use

Best for

Availability
