Computer Vision Group, Freiburg

Scene Flow Datasets: FlyingThings3D, Driving, Monkaa

This dataset collection has been used to train convolutional networks in our CVPR 2016 paper A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. Here, we make all generated data freely available.

Terms of use

This dataset is provided for research purposes only and without any warranty. Any commercial use is prohibited. If you use the dataset or parts of it in your research, please cite the aforementioned paper:

@InProceedings{MIFDB16,
  author    = "N. Mayer and E. Ilg and P. H{\"a}usser and P. Fischer and D. Cremers and A. Dosovitskiy and T. Brox",
  title     = "A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation",
  booktitle = "IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)",
  year      = "2016",
  note      = "arXiv:1512.02134",
  url       = "http://lmb.informatik.uni-freiburg.de/Publications/2016/MIFDB16"
}

Overview

The collection contains more than 39000 stereo frames in 960x540 pixel resolution, rendered from various synthetic sequences. For details on the characteristics and differences of the three subsets, we refer the reader to our paper. The following kinds of data are currently available:

All of the following modalities are provided for both the left and the right view of each stereo frame:

RGB stereo renderings: Rendered images are available in cleanpass and finalpass versions (the latter with more realistic, but also more difficult, effects such as motion blur and depth of field). Both versions can be downloaded as lossless PNG or high-quality lossy WebP images.
Segmentations: Object-level and material-level segmentation images.
Optical flow maps: The optical flow describes how pixels move between images (here, between time steps in a sequence). It is the projected screen-space component of full scene flow and is used in many computer vision applications.
Disparity maps: Disparity describes how pixels move between the two views of a stereo frame. It is a formulation of depth which is independent of the camera intrinsics (although it depends on the configuration of the stereo rig) and can be seen as a special case of optical flow. A sketch for converting disparity to depth follows this list.
Disparity change maps: Disparity alone is only valid for a single stereo frame. In image sequences, pixel disparities change over time. The disparity change data fills the gaps in scene flow that remain when only optical flow and static disparity are used.
Motion boundaries: Motion boundaries divide an image into regions with significantly different motion. They can be used to better judge the performance of an algorithm at discontinuities.
Camera data: Full intrinsic and extrinsic camera data is available for each view of every stereo frame in our dataset collection.
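
As an illustration of the disparity modality: with the camera parameters documented under "Data formats and organization" below (fx = 1050.0 pixels for the 35mm scenes, stereo baseline 1.0 Blender units), depth follows from the standard stereo relation depth = fx * baseline / disparity. A minimal NumPy sketch (the helper name disparity_to_depth is ours, not part of the dataset tools):

    import numpy as np

    # Standard stereo relation: depth = fx * baseline / disparity.
    # fx = 1050.0 px and baseline = 1.0 Blender units are the values
    # documented under "Data formats and organization" (35mm scenes),
    # so the resulting depth is in Blender units.
    def disparity_to_depth(disparity, fx=1050.0, baseline=1.0):
        disparity = np.asarray(disparity, dtype=np.float32)
        depth = np.full_like(disparity, np.inf)  # zero disparity = infinitely far
        valid = disparity > 0
        depth[valid] = fx * baseline / disparity[valid]
        return depth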

Downloads

Example pack

Want to get your feet wet? Wish to check out our data without downloading gigabytes of archives first? Then get our sample pack!

Full datasets

Downloading via torrent

Where applicable, please download our datasets via BitTorrent.

Raw data

  RGB images (cleanpass)
    FlyingThings3D: PNG .torrent (37GB) / WebP .torrent (7.4GB)
    Driving: PNG .torrent (6.3GB) / WebP .torrent (1.5GB)
    Monkaa: PNG .tar (9.1GB) / WebP .tar (1.8GB)

  RGB images (finalpass)
    FlyingThings3D: PNG .torrent (43GB) / WebP .torrent (5.7GB)
    Driving: PNG .torrent (6.1GB) / WebP .torrent (926MB)
    Monkaa: PNG .tar (17GB) / WebP [disabled]

  Camera data
    FlyingThings3D: .tar (15MB)
    Driving: .tar (1.8MB)
    Monkaa: .tar (3.7MB)

  Object segmentation
    FlyingThings3D: .tar.bz2 (409MB, unzipped 104GB*)
    Driving: .tar.bz2 (78MB, unzipped 18GB*)
    Monkaa: .tar.bz2 (83MB, unzipped 34GB*)

  Material segmentation
    FlyingThings3D: .tar.bz2 (510MB, unzipped 104GB*)
    Driving: .tar.bz2 (170MB, unzipped 18GB*)
    Monkaa: .tar.bz2 (115MB, unzipped 34GB*)

Derived data

  Disparity
    FlyingThings3D: .torrent (87GB, unzipped 104GB)
    Driving: .torrent (9GB, unzipped 18GB)
    Monkaa: .tar.bz2 (28GB, unzipped 34GB)

  Disparity change
    FlyingThings3D: .torrent (116GB, unzipped 208GB)
    Driving: .torrent (22GB, unzipped 35GB)
    Monkaa: .tar.bz2 (35GB, unzipped 68GB)

  Optical flow
    FlyingThings3D: .torrent (311GB, unzipped 621GB*)
    Driving: .torrent (50GB, unzipped 102GB)
    Monkaa: .tar.bz2 (89GB, unzipped 201GB*)

  Motion boundaries
    FlyingThings3D: .tar.bz2 (615MB, unzipped 52GB*)
    Driving: .tar.bz2 (206MB, unzipped 8.6GB*)
    Monkaa: .tar.bz2 (106MB, unzipped 17GB*)

Sizes marked with an asterisk (*) indicate that a compressed archive expands to a very much larger size (more than 100GB larger, or expansion factor > 10).

DispNet/FlowNet2.0 dataset subsets

For network training and testing in our DispNet, FlowNet2.0, and related papers, we omitted some extremely hard samples from the FlyingThings3D dataset. Here you can download these subsets for the modalities we used:

FlyingThings3D subset

Note on sequence lengths: Unlike the original datasets (above), these subset downloads are not split into individual scenes. To get the lengths of the individual sequences, we provide annotation files for the train and val splits. These files contain the length (= number of samples) of one scene per line.

Raw data
  RGB images (cleanpass): .torrent (35GB, unzipped 35GB)
  Object segmentation: .tar.bz2 (570MB, unzipped 674MB)

Derived data
  Disparity: .torrent (5GB, unzipped 102GB*)
  Disparity change: .torrent (2.4GB, unzipped 182GB*)
  Optical flow: .torrent (75GB, unzipped 364GB*)
  Motion boundaries: .tar.bz2 (979MB, unzipped 1167MB)
  Motion boundary weights: .torrent (12GB, unzipped 137GB*)
  Disparity occlusions: .tar.bz2 (420MB, unzipped 525MB)
  Disparity occlusion weights: .torrent (9GB, unzipped 102GB*)
  Flow occlusions: .tar.bz2 (691MB, unzipped 889MB)
  Flow occlusion weights: .torrent (15GB, unzipped 182GB*)
  Depth boundaries: .tar.bz2 (654MB, unzipped 755MB)
  Depth boundary weights: .torrent (11GB, unzipped 102GB)

Sizes marked with an asterisk (*) indicate that a compressed archive expands to a very much larger size (more than 100GB larger, or expansion factor > 10).

Data formats and organization

  1. Download our handy Python IO routines for reading and writing .float3/.flo/.ppm/.pgm/.png/.jpg/.pfm files.
  2. Use bunzip2 to decompress .tar.bz2 files, and use "tar xf <file.tar>" to unpack .tar archives. Caution: some archives expand to massively larger sizes (a Python extraction sketch follows this list).
  3. The RGB image packs are available in both cleanpass and finalpass settings. The cleanpass setting includes lighting and shading effects, but no additional effects. In contrast, finalpass images also contain motion blur and defocus blur.
    All RGB images are provided as both lossless PNG and lossy WebP (the WebP versions were used in our experiments). WebP images are compressed with a quality setting of 95%, using the publicly available source code (version 0.5.0). WebP files are 80-90% smaller than their PNG counterparts, with virtually indistinguishable results.
  4. The virtual imaging sensor has a size of 32.0mm x 18.0mm.
    Most scenes use a virtual focal length of 35.0mm. For those scenes, the virtual camera intrinsics matrix is given by

      fx=1050.0  0.0        cx=479.5
      0.0        fy=1050.0  cy=269.5
      0.0        0.0        1.0

    where (fx,fy) are the focal lengths and (cx,cy) denotes the principal point.
    Some scenes in the Driving subset use a virtual focal length of 15.0mm (the directory structure makes this clear). For those scenes, the intrinsics matrix is given by

      fx=450.0  0.0       cx=479.5
      0.0       fy=450.0  cy=269.5
      0.0       0.0       1.0

    Please note that, due to Blender's coordinate system convention (see below), the focal length values (fx,fy) really should be negative numbers. We list the positive values here because in practice this caveat only matters when working with the raw 3D data. (A sketch for building these matrices in NumPy follows this list.)
  5. All data comes in a stereo setting, i.e. there are "left" and "right" subfolders for everything. The one exception to this rule is the camera data, which is stored in a single (small) text file per scene.
  6. Camera extrinsics data is stored as follows: each camera_data.txt file contains the following entry for every frame of its scene (a parsing sketch follows this list):

      ...
      Frame <frame_id>\n
        (frame_id is the frame index. All images and data files for this frame carry this name, as a four-digit number with leading zeroes for padding.)
      L T00 T01 T02 T03 T10 ... T33\n
        (camera-to-world 4x4 matrix for the left view of the stereo pair, in row-major order, i.e. (T00 T01 T02 T03) encodes the uppermost row from left to right)
      R T00 T01 T02 T03 T10 ... T33\n
        (ditto for the right view of the stereo pair)
      \n
        (an empty line)
      Frame <frame_id>\n
        (the next frame's index)
      ...
        (and so on)

    The camera-to-world matrices T encode a transformation from camera space to world space, i.e. multiplying a camera-space position column vector p_cam with T yields a world-space position column vector p_world = T*p_cam.
    The coordinate system is that of Blender: positive X points to the right, positive Y points upwards, and positive Z points "backwards", from the scene into the camera (right-hand rule with thumb=x, index finger=y, middle finger=z).
    The right stereo view's camera is translated by 1.0 Blender units relative to the left view's camera (this is the "baseline"), with no rotation.
  7. The image origin (x=0,y=0) is located in the upper left corner, i.e. a flow vector of (x=10,y=10) points towards the lower right.
  8. Non-RGB data is provided in either PFM (single channel or three channels) or PGM format, depending on value range and dimensionality. While PFM is a defined standard (think "PGM/PPM for non-integer entries"), it is not widely supported. For C++, we recommend the excellent CImg library. For Python+NumPy, see this code snippet (an independent reading sketch also follows this list).

  9. For data which depends on the direction of time (optical flow, disparity change, motion boundaries), we provide both forward and backward versions.
  10. Please note that the frame ranges differ between scenes and datasets:
    1. The FlyingThings3D dataset is split into "TEST" and "TRAIN" parts. These two parts differ only in the assets used for rendering: all textures and all 3D model categories are entirely disjoint. However, both parts exhibit the same structure and characteristics. The "TRAIN" part is 5 times larger than the "TEST" part.
      Each of these parts is itself split into three subsets A, B, and C. The same rendering asset pools were used for each subset, but the object and camera motion paths were generated with different parameter settings. As a result, motion characteristics are not uniform across subsets.
    2. We did not use the entire FlyingThings3D dataset for DispNet, FlowNet2.0 etc.: samples with extremely difficult data were omitted. See here for a list of images which we did not use.
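
As a small illustration of item 2, archives can also be extracted directly from Python; the archive and output names below are placeholders, not actual download names:

    import tarfile

    # Python's tarfile handles the bzip2 layer itself, so a separate bunzip2
    # step is not needed. Check the "unzipped" sizes listed above before
    # extracting: some archives expand to massively larger sizes.
    with tarfile.open("some_archive.tar.bz2", mode="r:bz2") as archive:
        archive.extractall(path="extracted")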
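
Regarding item 4, a minimal NumPy sketch for building the documented intrinsics matrices (the helper name intrinsics_matrix is ours):

    import numpy as np

    # All frames are 960x540 with principal point (479.5, 269.5);
    # fx = fy = 1050.0 for 35.0mm scenes, 450.0 for the 15.0mm Driving
    # scenes. As noted in item 4, (fx,fy) would strictly be negative in
    # Blender's convention; that only matters for raw 3D work.
    def intrinsics_matrix(focal_length_mm=35.0):
        f = 1050.0 if focal_length_mm == 35.0 else 450.0
        return np.array([[  f, 0.0, 479.5],
                         [0.0,   f, 269.5],
                         [0.0, 0.0,   1.0]])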
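
Regarding item 6, a minimal sketch of a parser for the documented camera_data.txt layout, together with the camera-to-world transform from above (the function name and usage lines are ours; the file path is a placeholder):

    import numpy as np

    # Parse one camera_data.txt into {frame_id: {"L": T, "R": T}}, where
    # each T is a 4x4 camera-to-world matrix given in row-major order.
    def read_camera_data(path):
        frames, current = {}, None
        with open(path) as f:
            for line in f:
                tokens = line.split()
                if not tokens:
                    continue                            # blank line between frames
                if tokens[0] == "Frame":
                    current = int(tokens[1])
                    frames[current] = {}
                elif tokens[0] in ("L", "R"):
                    values = [float(v) for v in tokens[1:]]
                    frames[current][tokens[0]] = np.array(values).reshape(4, 4)
        return frames

    # p_world = T * p_cam with homogeneous column vectors (Blender convention:
    # +X right, +Y up, +Z backwards, so the camera looks along -Z).
    cameras = read_camera_data("camera_data.txt")
    T_left = cameras[min(cameras)]["L"]                 # first frame, left view
    p_cam = np.array([0.0, 0.0, -1.0, 1.0])             # 1 unit in front of the camera
    p_world = T_left @ p_cam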
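
Regarding item 8, the linked Python+NumPy snippet is the reference; the following independent sketch reads the standard PFM layout (assuming single-line header fields):

    import numpy as np

    # PFM header: line 1 is "PF" (3 channels) or "Pf" (1 channel), line 2 is
    # "width height", line 3 is a scale factor whose sign encodes endianness
    # (negative = little-endian). Rows are stored bottom-to-top, so we flip
    # vertically to put the origin in the upper left corner (see item 7).
    def read_pfm(path):
        with open(path, "rb") as f:
            channels = 3 if f.readline().decode().strip() == "PF" else 1
            width, height = map(int, f.readline().decode().split())
            endian = "<" if float(f.readline().decode()) < 0 else ">"
            data = np.fromfile(f, endian + "f4", width * height * channels)
        shape = (height, width, channels) if channels == 3 else (height, width)
        return np.flipud(data.reshape(shape))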
