A Very Big Video Reasoning Suite
We are betting that video reasoning will be the next fundamental intelligence paradigm after language reasoning, one in which spatiotemporal, embodied experience of the world can be captured more naturally.
Knowledge training set
Prompt
The image contains 2 clocks, each with only an hour hand. Exactly one clock has its hour hand pointing to 12 o'clock. First find the single clock pointing to 12 o'clock, then draw a red circle around it. Do not change anything else. Show the complete solution step by step.
First Frame
Last Frame
Abstraction out-of-domain testset
Prompt
Complete this checkerboard pattern by filling in the missing cells on the right side. The left half shows a checkerboard pattern where adjacent cells alternate between filled and empty. Mirror the left half across the vertical center line to complete the symmetric checkerboard. Keep the camera view fixed in the top-down perspective and maintain all existing cells unchanged. Stop the video when the pattern is complete.
First Frame
Last Frame
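The mirroring step in the prompt above can be sketched in a few lines. This is only an illustration under stated assumptions: the grid is a list of rows of 0/1 values, and `mirror_left_half` is a hypothetical helper, not part of the suite.

```python
def mirror_left_half(left_half):
    """Reflect each row of the left half across its right edge.

    Assumption: cells are 1 (filled) or 0 (empty); the suite's own
    representation is not given in the source.
    """
    return [row + row[::-1] for row in left_half]


# Hypothetical 2x3 left half of a checkerboard.
left = [
    [1, 0, 1],
    [0, 1, 0],
]
full = mirror_left_half(left)
for row in full:
    print("".join("#" if cell else "." for cell in row))
```

Because the right half is a reflection rather than a continuation, the two columns that meet at the center line hold the same value, so the result is symmetric but does not alternate across the seam.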
Spatiality training set
Prompt
The scene shows a 10x10 grid with a green start point, a red end point, and colored cells (orange, yellow, and blue). A purple circular agent is positioned at the green start point. The agent can move to adjacent cells (up, down, left, right). Starting from the green start point, the agent must visit the colored cells in order (orange, then yellow, then blue), taking the shortest path between each consecutive pair of colored cells. The agent is allowed to pass through the red end point when visiting the colored cells if needed. After visiting all colored cells in sequence, the agent must reach the red end point, also following the shortest path.
First Frame
Last Frame
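The route described in the prompt above can be sketched as a chain of shortest legs. The prompt mentions no obstacles, so this sketch assumes an open grid, where the shortest 4-connected distance between two cells is their Manhattan distance. The coordinates and the helpers `leg` and `route` are hypothetical and are not taken from the suite.

```python
def leg(a, b):
    """One shortest 4-connected path from a to b (rows first, then columns).

    Returns the cells visited after a, ending at b.
    """
    r, c = a
    cells = []
    while r != b[0]:
        r += 1 if b[0] > r else -1
        cells.append((r, c))
    while c != b[1]:
        c += 1 if b[1] > c else -1
        cells.append((r, c))
    return cells


def route(start, waypoints, end):
    """Visit the waypoints in order, then finish at end."""
    stops = [start, *waypoints, end]
    path = [start]
    for a, b in zip(stops, stops[1:]):
        path.extend(leg(a, b))
    return path


# Hypothetical (row, col) positions on a 10x10 grid.
start, orange, yellow, blue, end = (0, 0), (2, 3), (5, 1), (7, 7), (9, 9)
path = route(start, [orange, yellow, blue], end)
print(len(path) - 1)  # total number of moves
```

Any other monotone ordering of row and column moves within a leg is equally short, so a generated video could follow a different path and still be correct as long as each leg has the same length.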
Transformation out-of-domain testset
Prompt
Colored animal faces are on the left side of the canvas, and dark outlines of animals are on the right side. Move each colored animal face to its matching outline via the shortest path.
First Frame
Last Frame
Perception in-domain testset
Prompt
The scene shows two objects, one on the left and one on the right, with a green attention box around the right object. The objects remain stationary and unchanged throughout. Move the green attention box from the right object to the left object.
First Frame
Last Frame
Sample comparisons by task domain
Circle Largest Value: Knowledge, out-of-domain testset
Grid Highest Cost: Abstraction, in-domain testset
Directed Graph Navigation: Spatiality, in-domain testset
2D Object Rotation: Transformation, out-of-domain testset
Spot Unique Color: Perception, in-domain testset
Each sample shows the ground truth (first and final frames) alongside outputs from VBVR-Wan2.2, CogVideoX 1.5, Kling 2.6, LTX-2, Runway Gen-4, Sora 2, Veo 3, Wan 2.2 I2V, and Hunyuan I2V.
Human
#1
VBVR-Wan2.2
#2
Sora 2
#3
Veo 3.1
#4
Runway Gen-4 Turbo
#5
Wan2.2-I2V-A14B
#6
Kling 2.6
#7
LTX-2
#8
CogVideoX1.5-5B-I2V
#9
HunyuanVideo-I2V
#9