A Very Big Video Reasoning Suite

We bet on a future in which video reasoning is the next fundamental intelligence paradigm after language reasoning, one in which spatiotemporal, embodied experience of the world can be captured more naturally.

Knowledge training set

Prompt

The image contains 2 clocks, each with only an hour hand. Exactly one clock has its hour hand pointing to 12 o'clock. First find the single clock pointing to 12 o'clock, then draw a red circle around it. Do not change anything else. Show the complete solution step by step.

First Frame

Last Frame

Abstraction out-of-domain testset

Prompt

Complete this checkerboard pattern by filling in the missing cells on the right side. The left half shows a checkerboard pattern where adjacent cells alternate between filled and empty. Mirror the left half across the vertical center line to complete the symmetric checkerboard. Keep the camera view fixed in the top-down perspective and maintain all existing cells unchanged. Stop the video when the pattern is complete.

First Frame

Last Frame
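
The mirroring rule in the checkerboard prompt above can be sketched in a few lines. This is only an illustration under stated assumptions, not the suite's generator: the function name `complete_by_mirroring`, the list-of-rows encoding (1 for filled, 0 for empty, `None` for missing), and the even grid width are all hypothetical.

```python
def complete_by_mirroring(grid):
    """Fill the right half of each row with the mirror image of its left half.

    Assumes (hypothetically) an even-width grid given as a list of rows,
    where the left half holds 0/1 cells and the right half is missing.
    """
    half = len(grid[0]) // 2
    completed = []
    for row in grid:
        left = row[:half]               # existing cells are kept unchanged
        completed.append(left + left[::-1])  # reflect across the vertical center line
    return completed


partial = [
    [1, 0, None, None],
    [0, 1, None, None],
]
print(complete_by_mirroring(partial))
```

Note that reflecting across the center line makes the two cells on either side of the seam equal, so the result is symmetric, as the prompt asks, rather than strictly alternating across the whole row.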

Spatiality training set

Prompt

The scene shows a 10x10 grid with a green start point, a red end point, and colored cells (orange, yellow, and blue). A purple circular agent is positioned at the green start point. The agent can move to adjacent cells (up, down, left, right). Starting from the green start point, the agent must visit the colored cells in order (orange, then yellow, then blue), taking the shortest path between each consecutive pair of colored cells. The agent is allowed to pass through the red end point when visiting the colored cells if needed. After visiting all colored cells in sequence, the agent must reach the red end point, also following the shortest path.

First Frame

Last Frame
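
The route described in the grid prompt above (start, then orange, yellow, blue, then the end point, each leg a shortest path) could be computed roughly as follows. This is a minimal sketch, not the suite's actual solver: the names `bfs_path` and `route`, the `(row, col)` coordinates, and the assumption of an obstacle-free grid are hypothetical, chosen because the prompt mentions no blocked cells.

```python
from collections import deque


def bfs_path(start, goal, size=10):
    """Shortest 4-connected path between two cells of an obstacle-free grid."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    # Walk the predecessor links back from the goal to rebuild the path.
    path, cell = [], goal
    while cell is not None:
        path.append(cell)
        cell = prev[cell]
    return path[::-1]


def route(start, waypoints, end, size=10):
    """Visit the waypoints in order, then the end, one shortest leg at a time."""
    stops = [start, *waypoints, end]
    full = [start]
    for a, b in zip(stops, stops[1:]):
        full.extend(bfs_path(a, b, size)[1:])  # skip the cell shared with the previous leg
    return full


# Hypothetical layout: green start, orange, yellow, blue, red end.
path = route((0, 0), [(0, 2), (2, 2), (2, 0)], (4, 0))
print(len(path) - 1, "moves")
```

Several shortest paths usually exist for each leg, so a generated video can be correct without matching this particular route cell for cell; only the leg lengths and the visiting order are fixed.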

Transformation out-of-domain testset

Prompt

Colored animal faces are on the left side of the canvas, and dark outlines of animals are on the right side. Move each colored animal face to its matching outline via the shortest path.

First Frame

Last Frame

Perception in-domain testset

Prompt

The scene shows two objects, one on the left and one on the right, with a green attention box around the right object. The objects remain stationary and unchanged throughout. Move the green attention box from the right object to the left object.

First Frame

Last Frame

Circle Largest Value - Samples

Task Domains

Circle Largest Value: Knowledge out-of-domain testset

Grid Highest Cost: Abstraction in-domain testset

Directed Graph Navigation: Spatiality in-domain testset

2D Object Rotation: Transformation out-of-domain testset

Spot Unique Color: Perception in-domain testset

Comparison panels: Ground Truth (First, Final), VBVR-Wan2.2, CogVideoX 1.5, Kling 2.6, LTX-2, Runway Gen-4, Sora 2, Veo 3, Wan 2.2 I2V, Hunyuan I2V

Human

Human

#1

VBVR

VBVR-Wan2.2

#2

Sora 2

Sora 2

#3

Veo 3.1

Veo 3.1

#4

Runway

Runway Gen-4 Turbo

#5

Wan2.2

Wan2.2-I2V-A14B

#6

Kling

Kling 2.6

#7

LTX-2

LTX-2

#8

CogVideoX

CogVideoX1.5-5B-I2V

#9

HunyuanVideo

HunyuanVideo-I2V

#9