TAPVid-3D: A Benchmark for Tracking Any Point in 3D
What is the dataset?
TAPVid-3D is a dataset and benchmark for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). The dataset consists of 4,000+ real-world videos and 2.1 million metric 3D point trajectories, spanning a variety of object types, motion patterns, and indoor and outdoor environments.
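Concretely, each annotation can be thought of as a per-frame metric 3D position plus a visibility flag for every tracked point. The sketch below illustrates that layout; the array names and shapes are our own illustrative assumptions, not the benchmark's actual file format.

```python
import numpy as np

T, N = 120, 256  # frames in a clip, tracked points (illustrative sizes)

# Metric 3D position (x, y, z) in meters for every point at every frame.
tracks_xyz = np.zeros((T, N, 3))

# Whether each point is visible (i.e. not occluded) at each frame.
visibility = np.ones((T, N), dtype=bool)

# A single point's 3D trajectory over the whole clip:
traj = tracks_xyz[:, 0]  # shape (T, 3)
```

In this view, a "2.1 million trajectory" dataset is simply many such `(T, 3)` tracks collected across thousands of clips.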
While point tracking in two dimensions (TAP-2D) has many benchmarks measuring performance on real-world videos, such as TAPVid-DAVIS [2], no comparable benchmark existed for three-dimensional point tracking on real-world videos. To fill this gap, we built a new benchmark for 3D point tracking that leverages existing footage.
To measure performance on the TAP-3D task, we formulated a Jaccard-based metric that accounts for ambiguous depth scales across models, occlusions, and multi-track spatio-temporal smoothness.
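To make the shape of such a metric concrete, here is a simplified single-threshold 3D Jaccard sketch. It is not the benchmark's official implementation: the function name, the fixed distance threshold, and the global median rescaling (a stand-in for handling depth-scale ambiguity) are all assumptions for illustration.

```python
import numpy as np

def jaccard_3d(pred_xyz, pred_vis, gt_xyz, gt_vis, thresh=0.1):
    """Simplified 3D Jaccard at a single distance threshold (meters).

    pred_xyz, gt_xyz: (T, N, 3) metric 3D tracks.
    pred_vis, gt_vis: (T, N) boolean visibility flags.

    A prediction is a true positive when the ground-truth point is visible,
    the tracker predicts it visible, and the 3D error is below `thresh`.
    """
    # Crude global rescaling to handle depth-scale ambiguity across models
    # (an assumption; the benchmark's alignment step may differ).
    scale = np.median(np.linalg.norm(gt_xyz, axis=-1)) / (
        np.median(np.linalg.norm(pred_xyz, axis=-1)) + 1e-8)
    pred_xyz = pred_xyz * scale

    close = np.linalg.norm(pred_xyz - gt_xyz, axis=-1) < thresh
    tp = close & pred_vis & gt_vis            # correct, visible predictions
    fp = pred_vis & (~gt_vis | ~close)        # predicted visible but wrong
    fn = gt_vis & ~tp                         # visible points that were missed
    return tp.sum() / max((tp | fn | fp).sum(), 1)
```

A perfect tracker scores 1.0; predicting every point occluded scores 0.0, since every visible ground-truth point becomes a false negative.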
In the paper, we assess the current state of the TAP-3D task by constructing competitive baselines using existing tracking models, such as SpatialTracker. You can read more and find out how to download and generate the data using the GitHub link above. We hope you'll find the benchmark useful!
Video Summary
Dataset Samples
Statistics Overview
| #clips | #trajs per clip | #frames per clip | #videos | #scenes | resolution | fps |
|---|---|---|---|---|---|---|
| 4569 | 50 - 1024 | 25 - 300 | 2828 | 255 | multiple | 10 / 30 |
Licensing
The annotations and code to generate TAPVid-3D are released under a slightly modified Apache 2.0 license, as described in the LICENSE file on GitHub. In particular, to use the code and annotations for a particular data subset (Waymo Open, Aria Digital Twin, and Panoptic Studio), you must agree to and adhere to the license and terms of use of the corresponding data and annotations.