Computer Vision Group, Freiburg

Scene Flow Datasets: FlyingThings3D, Driving, Monkaa

This dataset collection has been used to train convolutional networks in our CVPR 2016 paper A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. Here, we make all generated data freely available.

Terms of use

This dataset is provided for research purposes only and without any warranty. Any commercial use is prohibited. If you use the dataset or parts of it in your research, please cite the aforementioned paper:

@InProceedings{MIFDB16,
  author    = "N. Mayer and E. Ilg and P. H{\"a}usser and P. Fischer and D. Cremers and A. Dosovitskiy and T. Brox",
  title     = "A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation",
  booktitle = "IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)",
  year      = "2016",
  note      = "arXiv:1512.02134",
  url       = "http://lmb.informatik.uni-freiburg.de/Publications/2016/MIFDB16"
}

Overview

The collection contains more than 39000 stereo frames in 960x540 pixel resolution, rendered from various synthetic sequences. For details on the characteristics and differences of the three subsets, we refer the reader to our paper. The following kinds of data are currently available:

All of the following modalities are provided for both the left and the right view of each stereo frame:

RGB stereo renderings: Rendered images are available in cleanpass and finalpass versions (the latter with more realistic, but also more difficult, effects such as motion blur and depth of field). Both versions can be downloaded as lossless PNG or high-quality lossy WebP images.
Segmentations: Object-level and material-level segmentation images.
Optical flow maps: The optical flow describes how pixels move between images (here, between time steps in a sequence). It is the projected screen-space component of full scene flow and is used in many computer vision applications.
Disparity maps: Disparity describes how pixels move between the two views of a stereo frame. It is a formulation of depth which is independent of the camera intrinsics (although it depends on the configuration of the stereo rig) and can be seen as a special case of optical flow. A sketch for converting disparity to depth follows this list.
Disparity change maps: Disparity alone is only valid for a single stereo frame. In image sequences, pixel disparities change over time. The disparity change data fills the gaps in scene flow that remain when only optical flow and static disparity are used.
Motion boundaries: Motion boundaries divide an image into regions with significantly different motion. They can be used to better judge the performance of an algorithm at discontinuities.
Camera data: Full intrinsic and extrinsic camera data is available for each view of every stereo frame in our dataset collection.
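
As an illustration of the disparity modality: with the camera parameters documented under "Data formats and organization" below (fx = 1050.0 pixels for the 35mm scenes, stereo baseline 1.0 Blender units), depth follows from the standard stereo relation depth = fx * baseline / disparity. A minimal NumPy sketch (the helper name disparity_to_depth is ours, not part of the dataset tools):

    import numpy as np

    # Standard stereo relation: depth = fx * baseline / disparity.
    # fx = 1050.0 px and baseline = 1.0 Blender units are the values
    # documented under "Data formats and organization" (35mm scenes),
    # so the resulting depth is in Blender units.
    def disparity_to_depth(disparity, fx=1050.0, baseline=1.0):
        disparity = np.asarray(disparity, dtype=np.float32)
        depth = np.full_like(disparity, np.inf)  # zero disparity = infinitely far
        valid = disparity > 0
        depth[valid] = fx * baseline / disparity[valid]
        return depth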

Downloads

Example pack

Want to get your feet wet? Wish to check out our data without downloading gigabytes of archives first? Then get our sample pack!

Full datasets

Downloading via torrent

Where applicable, please download our datasets via BitTorrent.

Raw data

  RGB images (cleanpass)
    FlyingThings3D: PNG .torrent (37GB) / WebP .torrent (7.4GB)
    Driving: PNG .torrent (6.3GB) / WebP .torrent (1.5GB)
    Monkaa: PNG .tar (9.1GB) / WebP .tar (1.8GB)

  RGB images (finalpass)
    FlyingThings3D: PNG .torrent (43GB) / WebP .torrent (5.7GB)
    Driving: PNG .torrent (6.1GB) / WebP .torrent (926MB)
    Monkaa: PNG .tar (17GB) / WebP [disabled]

  Camera data
    FlyingThings3D: .tar (15MB)
    Driving: .tar (1.8MB)
    Monkaa: .tar (3.7MB)

  Object segmentation
    FlyingThings3D: .tar.bz2 (409MB, unzipped 104GB*)
    Driving: .tar.bz2 (78MB, unzipped 18GB*)
    Monkaa: .tar.bz2 (83MB, unzipped 34GB*)

  Material segmentation
    FlyingThings3D: .tar.bz2 (510MB, unzipped 104GB*)
    Driving: .tar.bz2 (170MB, unzipped 18GB*)
    Monkaa: .tar.bz2 (115MB, unzipped 34GB*)

Derived data

  Disparity
    FlyingThings3D: .torrent (87GB, unzipped 104GB)
    Driving: .torrent (9GB, unzipped 18GB)
    Monkaa: .tar.bz2 (28GB, unzipped 34GB)

  Disparity change
    FlyingThings3D: .torrent (116GB, unzipped 208GB)
    Driving: .torrent (22GB, unzipped 35GB)
    Monkaa: .tar.bz2 (35GB, unzipped 68GB)

  Optical flow
    FlyingThings3D: .torrent (311GB, unzipped 621GB*)
    Driving: .torrent (50GB, unzipped 102GB)
    Monkaa: .tar.bz2 (89GB, unzipped 201GB*)

  Motion boundaries
    FlyingThings3D: .tar.bz2 (615MB, unzipped 52GB*)
    Driving: .tar.bz2 (206MB, unzipped 8.6GB*)
    Monkaa: .tar.bz2 (106MB, unzipped 17GB*)

Sizes marked with an asterisk (*) indicate that a compressed archive expands to a very much larger size (more than 100GB larger, or expansion factor > 10).

DispNet/FlowNet2.0 dataset subsets

For network training and testing in our DispNet, FlowNet2.0, and related papers, we omitted some extremely hard samples from the FlyingThings3D dataset. Here you can download these subsets for the modalities we used:

FlyingThings3D subset

Note on sequence lengths: Unlike the original datasets (above), these subset downloads are not split into individual scenes. To get the lengths of the individual sequences, we provide annotation files for the train and val splits. These files contain the length (= number of samples) of one scene per line.

Raw data
  RGB images (cleanpass): .torrent (35GB, unzipped 35GB)
  Object segmentation: .tar.bz2 (570MB, unzipped 674MB)

Derived data
  Disparity: .torrent (5GB, unzipped 102GB*)
  Disparity change: .torrent (2.4GB, unzipped 182GB*)
  Optical flow: .torrent (75GB, unzipped 364GB*)
  Motion boundaries: .tar.bz2 (979MB, unzipped 1167MB)
  Motion boundary weights: .torrent (12GB, unzipped 137GB*)
  Disparity occlusions: .tar.bz2 (420MB, unzipped 525MB)
  Disparity occlusion weights: .torrent (9GB, unzipped 102GB*)
  Flow occlusions: .tar.bz2 (691MB, unzipped 889MB)
  Flow occlusion weights: .torrent (15GB, unzipped 182GB*)
  Depth boundaries: .tar.bz2 (654MB, unzipped 755MB)
  Depth boundary weights: .torrent (11GB, unzipped 102GB)

Sizes marked with an asterisk (*) indicate that a compressed archive expands to a very much larger size (more than 100GB larger, or expansion factor > 10).

Data formats and organization

  1. Download our handy Python IO routines for reading and writing .float3/.flo/.ppm/.pgm/.png/.jpg/.pfm files.
  2. Use bunzip2 to decompress .tar.bz2 files, and use "tar xf <file.tar>" to unpack .tar archives. Caution: some archives expand to massively larger sizes (a Python extraction sketch follows this list).
  3. The RGB image packs are available in both cleanpass and finalpass settings. The cleanpass setting includes lighting and shading effects, but no additional effects. In contrast, finalpass images also contain motion blur and defocus blur.
    All RGB images are provided as both lossless PNG and lossy WebP (the WebP versions were used in our experiments). WebP images are compressed with a quality setting of 95%, using the publicly available source code (version 0.5.0). WebP files are 80-90% smaller than their PNG counterparts, with virtually indistinguishable results.
  4. The virtual imaging sensor has a size of 32.0mm x 18.0mm.
    Most scenes use a virtual focal length of 35.0mm. For those scenes, the virtual camera intrinsics matrix is given by

      fx=1050.0  0.0        cx=479.5
      0.0        fy=1050.0  cy=269.5
      0.0        0.0        1.0

    where (fx,fy) are the focal lengths and (cx,cy) denotes the principal point.
    Some scenes in the Driving subset use a virtual focal length of 15.0mm (the directory structure makes this clear). For those scenes, the intrinsics matrix is given by

      fx=450.0  0.0       cx=479.5
      0.0       fy=450.0  cy=269.5
      0.0       0.0       1.0

    Please note that, due to Blender's coordinate system convention (see below), the focal length values (fx,fy) really should be negative numbers. We list the positive values here because in practice this caveat only matters when working with the raw 3D data. (A sketch for building these matrices in NumPy follows this list.)
  5. All data comes in a stereo setting, i.e. there are "left" and "right" subfolders for everything. The one exception to this rule is the camera data, which is stored in a single (small) text file per scene.
  6. Camera extrinsics data is stored as follows: each camera_data.txt file contains the following entry for every frame of its scene (a parsing sketch follows this list):

      ...
      Frame <frame_id>\n
        (frame_id is the frame index. All images and data files for this frame carry this name, as a four-digit number with leading zeroes for padding.)
      L T00 T01 T02 T03 T10 ... T33\n
        (camera-to-world 4x4 matrix for the left view of the stereo pair, in row-major order, i.e. (T00 T01 T02 T03) encodes the uppermost row from left to right)
      R T00 T01 T02 T03 T10 ... T33\n
        (ditto for the right view of the stereo pair)
      \n
        (an empty line)
      Frame <frame_id>\n
        (the next frame's index)
      ...
        (and so on)

    The camera-to-world matrices T encode a transformation from camera space to world space, i.e. multiplying a camera-space position column vector p_cam with T yields a world-space position column vector p_world = T*p_cam.
    The coordinate system is that of Blender: positive X points to the right, positive Y points upwards, and positive Z points "backwards", from the scene into the camera (right-hand rule with thumb=x, index finger=y, middle finger=z).
    The right stereo view's camera is translated by 1.0 Blender units relative to the left view's camera (this is the "baseline"), with no rotation.
  7. The image origin (x=0,y=0) is located in the upper left corner, i.e. a flow vector of (x=10,y=10) points towards the lower right.
  8. Non-RGB data is provided in either PFM (single channel or three channels) or PGM format, depending on value range and dimensionality. While PFM is a defined standard (think "PGM/PPM for non-integer entries"), it is not widely supported. For C++, we recommend the excellent CImg library. For Python+NumPy, see this code snippet (an independent reading sketch also follows this list).

  9. For data which depends on the direction of time (optical flow, disparity change, motion boundaries), we provide both forward and backward versions.
  10. Please note that the frame ranges differ between scenes and datasets:
    1. The FlyingThings3D dataset is split into "TEST" and "TRAIN" parts. These two parts differ only in the assets used for rendering: all textures and all 3D model categories are entirely disjoint. However, both parts exhibit the same structure and characteristics. The "TRAIN" part is 5 times larger than the "TEST" part.
      Each of these parts is itself split into three subsets A, B, and C. The same rendering asset pools were used for each subset, but the object and camera motion paths were generated with different parameter settings. As a result, motion characteristics are not uniform across subsets.
    2. We did not use the entire FlyingThings3D dataset for DispNet, FlowNet2.0 etc.: samples with extremely difficult data were omitted. See here for a list of images which we did not use.
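
As a small illustration of item 2, archives can also be extracted directly from Python; the archive and output names below are placeholders, not actual download names:

    import tarfile

    # Python's tarfile handles the bzip2 layer itself, so a separate bunzip2
    # step is not needed. Check the "unzipped" sizes listed above before
    # extracting: some archives expand to massively larger sizes.
    with tarfile.open("some_archive.tar.bz2", mode="r:bz2") as archive:
        archive.extractall(path="extracted")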
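
Regarding item 4, a minimal NumPy sketch for building the documented intrinsics matrices (the helper name intrinsics_matrix is ours):

    import numpy as np

    # All frames are 960x540 with principal point (479.5, 269.5);
    # fx = fy = 1050.0 for 35.0mm scenes, 450.0 for the 15.0mm Driving
    # scenes. As noted in item 4, (fx,fy) would strictly be negative in
    # Blender's convention; that only matters for raw 3D work.
    def intrinsics_matrix(focal_length_mm=35.0):
        f = 1050.0 if focal_length_mm == 35.0 else 450.0
        return np.array([[  f, 0.0, 479.5],
                         [0.0,   f, 269.5],
                         [0.0, 0.0,   1.0]])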
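
Regarding item 6, a minimal sketch of a parser for the documented camera_data.txt layout, together with the camera-to-world transform from above (the function name and usage lines are ours; the file path is a placeholder):

    import numpy as np

    # Parse one camera_data.txt into {frame_id: {"L": T, "R": T}}, where
    # each T is a 4x4 camera-to-world matrix given in row-major order.
    def read_camera_data(path):
        frames, current = {}, None
        with open(path) as f:
            for line in f:
                tokens = line.split()
                if not tokens:
                    continue                            # blank line between frames
                if tokens[0] == "Frame":
                    current = int(tokens[1])
                    frames[current] = {}
                elif tokens[0] in ("L", "R"):
                    values = [float(v) for v in tokens[1:]]
                    frames[current][tokens[0]] = np.array(values).reshape(4, 4)
        return frames

    # p_world = T * p_cam with homogeneous column vectors (Blender convention:
    # +X right, +Y up, +Z backwards, so the camera looks along -Z).
    cameras = read_camera_data("camera_data.txt")
    T_left = cameras[min(cameras)]["L"]                 # first frame, left view
    p_cam = np.array([0.0, 0.0, -1.0, 1.0])             # 1 unit in front of the camera
    p_world = T_left @ p_cam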
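
Regarding item 8, the linked Python+NumPy snippet is the reference; the following independent sketch reads the standard PFM layout (assuming single-line header fields):

    import numpy as np

    # PFM header: line 1 is "PF" (3 channels) or "Pf" (1 channel), line 2 is
    # "width height", line 3 is a scale factor whose sign encodes endianness
    # (negative = little-endian). Rows are stored bottom-to-top, so we flip
    # vertically to put the origin in the upper left corner (see item 7).
    def read_pfm(path):
        with open(path, "rb") as f:
            channels = 3 if f.readline().decode().strip() == "PF" else 1
            width, height = map(int, f.readline().decode().split())
            endian = "<" if float(f.readline().decode()) < 0 else ">"
            data = np.fromfile(f, endian + "f4", width * height * channels)
        shape = (height, width, channels) if channels == 3 else (height, width)
        return np.flipud(data.reshape(shape))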
