Veo
Introducing Veo 3, our video generation model with expanded creative controls – including native audio and extended videos.
Redesigned for greater realism
Greater realism and fidelity, made possible by Veo 3’s real-world physics and native audio.
Follows prompts like never before
Improved prompt adherence, meaning more accurate responses to your instructions.
Improved creative control
Offers new levels of control, consistency, and creativity – now across audio.
Prompt: A medium shot opens on a seasoned, grey-bearded man in sunglasses and a paisley shirt, his gaze fixed off-camera with a contemplative expression. His gold chain glints subtly. Beside him, a younger man in a tank top, also looking forward, suggests a shared moment of observation or reflection. The camera slowly pushes in, subtly emphasizing their quiet focus. In the background, a vibrant mural splashes across a wall, hinting at an urban setting. Faint city murmurs and distant chatter drift in, accompanied by a mellow, soulful hip-hop beat that adds a contemplative yet grounded atmosphere. "The city always got a story," the older man murmurs, a slight nod of his head. "Just gotta listen."
Veo 3 lets you add sound effects, ambient noise, and even dialogue to your creations – generating all audio natively. It also delivers best-in-class quality, excelling in physics, realism, and prompt adherence.
Greater control, consistency, and creativity than ever before.
Prompt: Close up shot of woman with sunglasses on top of her head, gold hood earrings, is walking in the interior, she is lost and asks where everyone is and what's going on.
Flow
Built with creatives, for creatives. Flow enables you to create seamless cinematic clips, scenes, and stories using our most capable generative AI models.
Text-to-video
T2V Overall preference
Participants viewed 1,003 prompts and the corresponding videos on MovieGenBench, a benchmark dataset released by Meta. Veo 3.1 performs best on overall preference.
Text-to-video
T2V Text alignment
Participants viewed 1,003 prompts and the corresponding videos on MovieGenBench, a benchmark dataset released by Meta. Veo 3.1 performs best at following prompts accurately.
Text-to-video
T2V Visual quality
Participants viewed 1,003 prompts and the corresponding videos on MovieGenBench, a benchmark dataset released by Meta. Participants rated the visual quality of Veo’s outputs more highly than that of other models.
Note: We were unable to compare image-to-video generation with Sora 2 Pro because it currently does not support realistic human images.
Image-to-video
I2V Overall preference
When participants viewed 355 image and text pairs from the VBench I2V benchmark, Veo 3’s outputs were preferred overall compared to other models.
Image-to-video
I2V Text alignment
When participants viewed 355 image and text pairs from the VBench I2V benchmark, Veo 3.1’s outputs were preferred over other models’ for capturing the intent of the prompt.
Image-to-video
I2V Visual quality
When participants viewed 355 image and text pairs from the VBench I2V benchmark, Veo 3.1’s outputs were preferred over other models’ for visual quality.
Text-to-video and audio
T2VA Audio visual overall preference
Participants viewed 527 prompts from MovieGenBench and showed an overall preference for Veo’s outputs with audio over those of other models.
Text-to-video and audio
T2VA Audio-video alignment
Participants viewed 527 prompts from MovieGenBench and chose Veo 3.1’s outputs over other models’ for audio that is better synchronized with the video content.
Text-to-video
T2V Visually realistic physics
Participants chose Veo 3.1’s outputs over other models’ for having visually realistic physics on the physics subset of MovieGenBench prompts.
[1] Human raters conducted direct side-by-side comparisons across 364 diverse examples (each including a prompt and 1–3 reference images, with a single generated video evaluated per prompt and reference-image set). All comparisons were done at 1280x720 resolution. Veo videos are 8 seconds long. All other videos are 10 seconds long and shown at full length to raters.
To ensure a fair visual comparison, all tests were conducted without sound. Audio was only enabled for the Overall Preference metric, and only when competing models had native sound support for the capability. We have indicated when audio was an active part of the comparison on the labels in the chart.
Ingredients to video
Overall preference and visual quality
Veo’s “Ingredients to Video” capability has achieved state-of-the-art results for Overall Preference and Visual Quality in head-to-head comparisons by human raters against other leading video generation models on internal benchmarks. [1]
[1] Human raters conducted direct side-by-side comparisons across 80 diverse examples (each including an initial text prompt and an extension prompt, with one generated video evaluated per example). All comparisons were done at 720x1280 resolution. Veo videos are 8 seconds long. All other videos are 6 seconds long and shown at full length to raters.
Ingredients to video
Scene extension
Veo’s “Scene Extension” capability has achieved state-of-the-art results for Overall Preference, Prompt Alignment, and Visual Quality in head-to-head comparisons by human raters against other leading video generation models on internal benchmarks. [1]
[1] Human raters conducted direct side-by-side comparisons across 106 diverse examples (each including a prompt and start and end images, with one generated video evaluated per example). All comparisons were done at 720x1280 resolution. Veo videos are 8 seconds long. All other videos are 10 seconds long and shown at full length to raters.
Ingredients to video
First and last frame
Veo’s “First and Last Frame” capability has achieved state-of-the-art results for Overall Preference, Prompt Alignment, and Visual Quality in head-to-head comparisons by human raters against other leading video generation models on internal benchmarks. [1]
[1] Human raters conducted direct side-by-side comparisons across 124 diverse examples (each including a video and a prompt specifying which object to insert, with one generated video evaluated per example).
All comparisons were done at 1280x720 (or 720x1280) resolution. Veo videos are 6 seconds long. All competing model videos are 5 seconds long and shown at full length to raters. All videos had no sound.
Ingredients to video
Object insertion
Veo’s “Object Insertion” capability has achieved state-of-the-art results for Overall Preference and Visual Quality in head-to-head comparisons by human raters against other leading video generation models on internal benchmarks. [1]
Promise
Promise Studios uses Veo 3.1 within its MUSE Platform to enhance generative storyboarding and previsualization for director-driven storytelling at production quality.
Volley
Volley powers its new AI-powered RPG, Wit's End, with Veo 3.1 to deliver static cinematics and dynamically generated assets narrating player progress.
OpusClip
OpusClip leverages Veo 3.1 within its Agent Opus to boost motion graphics and create realistic promotional videos for SMBs.
Gemini
Supercharge your creativity and productivity
Flow
An AI filmmaking tool built with and for creatives
Google AI Studio
The fastest path from prompt to production
Gemini API
Get started building with cutting-edge AI models
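As an illustration of the developer path, here is a minimal sketch of requesting a Veo generation through the google-genai Python SDK. Veo jobs are long-running, so the pattern is: start an operation, poll until it is done, then download the result. The model ID (`veo-3.1-generate-preview`), the `build_request` helper, and the config fields are assumptions for illustration; check the current Gemini API documentation before relying on them.

```python
# Sketch: text-to-video with Veo via the google-genai SDK (pip install google-genai).
# Model ID and request fields are assumptions -- verify against current Gemini API docs.
import os
import time


def build_request(prompt: str, aspect_ratio: str = "16:9") -> dict:
    """Assemble the generation request as a plain dict (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must be non-empty")
    return {
        "model": "veo-3.1-generate-preview",  # assumed model ID
        "prompt": prompt,
        "config": {"aspect_ratio": aspect_ratio},
    }


def main() -> None:
    from google import genai

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    req = build_request("A slow push-in on a mural-covered city wall at dusk.")
    # Start the long-running generation operation.
    operation = client.models.generate_videos(model=req["model"], prompt=req["prompt"])
    # Poll until the video is ready.
    while not operation.done:
        time.sleep(10)
        operation = client.operations.get(operation)
    # Download and save the first generated clip.
    video = operation.response.generated_videos[0]
    client.files.download(file=video.video)
    video.video.save("veo_clip.mp4")


if __name__ == "__main__":
    main()
```

The polling loop is the important part of the design: unlike a chat completion, a video generation request returns an operation handle rather than the finished asset, so clients must re-fetch the operation until it reports completion.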
Vertex AI Studio
Test, tune, and deploy enterprise-ready generative AI