Datasets — Torchvision 0.22 documentation
Torchvision provides many built-in datasets in the torchvision.datasets
module, as well as utility classes for building your own datasets.
Built-in datasets¶
All datasets are subclasses of torch.utils.data.Dataset, i.e., they have __getitem__ and __len__ methods implemented. Hence, they can all be passed to a torch.utils.data.DataLoader which can load multiple samples in parallel using torch.multiprocessing workers. For example:

```python
import torch
import torchvision

imagenet_data = torchvision.datasets.ImageNet('path/to/imagenet_root/')
data_loader = torch.utils.data.DataLoader(imagenet_data,
                                          batch_size=4,
                                          shuffle=True,
                                          num_workers=args.nThreads)  # args.nThreads: desired number of worker processes
```
All the datasets have almost similar APIs. They all have two common arguments: transform and target_transform, to transform the input and the target respectively. You can also create your own datasets using the provided base classes.
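For example, a minimal sketch of passing both arguments (the MNIST choice and the odd/even relabeling below are purely illustrative, not part of the dataset's API):

```python
import torchvision
from torchvision import transforms

# transform is applied to the input image, target_transform to the label.
dataset = torchvision.datasets.MNIST(
    root='path/to/mnist_root/',
    train=True,
    download=True,
    transform=transforms.ToTensor(),       # PIL image -> float tensor in [0, 1]
    target_transform=lambda y: y % 2,      # hypothetical relabeling: digit -> odd/even
)

image, target = dataset[0]
```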
Warning
When a dataset object is created with download=True, the files are first downloaded and extracted in the root directory. This download logic is not multi-process safe, so it may lead to conflicts / race conditions if it is run within a distributed setting. In distributed mode, we recommend creating a dummy dataset object to trigger the download logic before setting up distributed mode.
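A minimal sketch of that recommendation (the CIFAR10 choice, the paths, and the NCCL backend are assumptions for illustration):

```python
import torch.distributed as dist
import torchvision

# Run once before spawning / initializing the distributed workers
# (e.g. in the launcher script), so only one process writes to disk.
torchvision.datasets.CIFAR10(root='path/to/data', download=True)

# Inside each worker, after the files already exist on disk:
dist.init_process_group(backend='nccl')
dataset = torchvision.datasets.CIFAR10(root='path/to/data', train=True, download=False)
```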
Image classification¶
Class | Description |
---|---|
Caltech101(root[, target_type, transform, ...]) | Caltech 101 Dataset. |
Caltech256(root[, transform, ...]) | Caltech 256 Dataset. |
CelebA(root[, split, target_type, ...]) | Large-scale CelebFaces Attributes (CelebA) Dataset. |
CIFAR10(root[, train, transform, ...]) | CIFAR10 Dataset. |
CIFAR100(root[, train, transform, ...]) | CIFAR100 Dataset. |
Country211(root, split, ...) | The Country211 Data Set from OpenAI. |
DTD(root, split, partition, ...) | Describable Textures Dataset (DTD). |
EMNIST(root, split, **kwargs) | EMNIST Dataset. |
EuroSAT(root, transform, ...) | RGB version of the EuroSAT Dataset. |
FakeData([size, image_size, num_classes, ...]) | A fake dataset that returns randomly generated images and returns them as PIL images. |
FashionMNIST(root[, train, transform, ...]) | Fashion-MNIST Dataset. |
FER2013(root[, split, transform, ...]) | FER2013 Dataset. |
FGVCAircraft(root, split, ...) | FGVC Aircraft Dataset. |
Flickr8k(root, ann_file, ...) | Flickr8k Entities Dataset. |
Flickr30k(root, ann_file, transform, ...) | Flickr30k Entities Dataset. |
Flowers102(root, split, ...) | Oxford 102 Flower Dataset. |
Food101(root, split, ...) | The Food-101 Data Set. |
GTSRB(root[, split, transform, ...]) | German Traffic Sign Recognition Benchmark (GTSRB) Dataset. |
INaturalist(root[, version, target_type, ...]) | iNaturalist Dataset. |
ImageNet(root[, split]) | ImageNet 2012 Classification Dataset. |
Imagenette(root, split, size) | Imagenette image classification dataset. |
KMNIST(root[, train, transform, ...]) | Kuzushiji-MNIST Dataset. |
LFWPeople(root, split, image_set, transform, ...) | LFW Dataset. |
LSUN(root[, classes, transform, ...]) | LSUN dataset. |
MNIST(root[, train, transform, ...]) | MNIST Dataset. |
Omniglot(root[, background, transform, ...]) | Omniglot Dataset. |
OxfordIIITPet(root[, split, target_types, ...]) | Oxford-IIIT Pet Dataset. |
Places365(root, split, ...) | Places365 classification dataset. |
PCAM(root[, split, transform, ...]) | PCAM Dataset. |
QMNIST(root[, what, compat, train]) | QMNIST Dataset. |
RenderedSST2(root, split, ...) | The Rendered SST2 Dataset. |
SEMEION(root[, transform, target_transform, ...]) | SEMEION Dataset. |
SBU(root, transform, ...) | SBU Captioned Photo Dataset. |
StanfordCars(root, split, ...) | Stanford Cars Dataset. |
STL10(root[, split, folds, transform, ...]) | STL10 Dataset. |
SUN397(root, transform, ...) | The SUN397 Data Set. |
SVHN(root[, split, transform, ...]) | SVHN Dataset. |
USPS(root[, train, transform, ...]) | USPS Dataset. |
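As an illustration of the common classification API (the CIFAR10 choice, paths, and batch size below are arbitrary), each sample is an (image, class index) pair:

```python
import torch
import torchvision
from torchvision import transforms

# ToTensor converts the PIL image to a float tensor in [0, 1].
train_set = torchvision.datasets.CIFAR10(
    root='path/to/cifar_root/', train=True, download=True,
    transform=transforms.ToTensor(),
)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

images, labels = next(iter(loader))   # images: [64, 3, 32, 32], labels: [64]
```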
Image detection or segmentation¶
Class | Description |
---|---|
CocoDetection(root, annFile[, transform, ...]) | MS Coco Detection Dataset. |
CelebA(root[, split, target_type, ...]) | Large-scale CelebFaces Attributes (CelebA) Dataset. |
Cityscapes(root[, split, mode, target_type, ...]) | Cityscapes Dataset. |
Kitti(root[, train, transform, ...]) | KITTI Dataset. |
OxfordIIITPet(root[, split, target_types, ...]) | Oxford-IIIT Pet Dataset. |
SBDataset(root[, image_set, mode, download, ...]) | Semantic Boundaries Dataset |
VOCSegmentation(root[, year, image_set, ...]) | Pascal VOC Segmentation Dataset. |
VOCDetection(root[, year, image_set, ...]) | Pascal VOC Detection Dataset. |
WIDERFace(root[, split, transform, ...]) | WIDERFace Dataset. |
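For the detection datasets the target is structured rather than a single label; CocoDetection, for instance, returns a list of COCO annotation dicts per image. A minimal sketch, assuming the COCO images and annotation file are already on disk (the paths are placeholders, and pycocotools must be installed):

```python
import torchvision

dataset = torchvision.datasets.CocoDetection(
    root='path/to/coco/train2017',
    annFile='path/to/coco/annotations/instances_train2017.json',
)

image, target = dataset[0]   # PIL image and a list of annotation dicts (bbox, category_id, ...)
```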
Optical Flow¶
Class | Description |
---|---|
FlyingChairs(root[, split, transforms]) | FlyingChairs Dataset for optical flow. |
FlyingThings3D(root, split, ...) | FlyingThings3D dataset for optical flow. |
HD1K(root, split, ...) | HD1K dataset for optical flow. |
KittiFlow(root, split, ...) | KITTI dataset for optical flow (2015). |
Sintel(root, split, ...) | Sintel Dataset for optical flow. |
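These datasets return pairs of frames together with the ground-truth flow between them. A minimal sketch using Sintel (the root path is a placeholder, and the archives are assumed to be already downloaded and extracted under it; for the test split the flow is None):

```python
import torchvision

# Each sample pairs two consecutive frames with the ground-truth flow between them.
dataset = torchvision.datasets.Sintel(root='path/to/sintel_root/', split='train')

img1, img2, flow = dataset[0]   # flow: numpy array of shape (2, H, W)
```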
Stereo Matching¶
Class | Description |
---|---|
CarlaStereo(root[, transforms]) | Carla simulator data linked in the CREStereo github repo. |
Kitti2012Stereo(root[, split, transforms]) | KITTI dataset from the 2012 stereo evaluation benchmark. |
Kitti2015Stereo(root[, split, transforms]) | KITTI dataset from the 2015 stereo evaluation benchmark. |
CREStereo(root[, transforms]) | Synthetic dataset used in training the CREStereo architecture. |
FallingThingsStereo(root[, variant, transforms]) | FallingThings dataset. |
SceneFlowStereo(root[, variant, pass_name, ...]) | Dataset interface for Scene Flow datasets. |
SintelStereo(root[, pass_name, transforms]) | Sintel Stereo Dataset. |
InStereo2k(root[, split, transforms]) | InStereo2k dataset. |
ETH3DStereo(root[, split, transforms]) | ETH3D Low-Res Two-View dataset. |
Middlebury2014Stereo(root[, split, ...]) | Publicly available scenes from the Middlebury dataset 2014 version https://vision.middlebury.edu/stereo/data/scenes2014/. |
Image pairs¶
Image captioning¶
Video classification¶
Video prediction¶
Base classes for custom datasets¶
Class | Description |
---|---|
DatasetFolder(root, loader[, extensions, ...]) | A generic data loader. |
ImageFolder(root[, transform, ...]) | A generic data loader where the images are arranged in one sub-directory per class by default. |
VisionDataset([root, transforms, transform, ...]) | Base class for making datasets which are compatible with torchvision. |
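As a sketch of the ImageFolder convention (the directory and class names below are placeholders): images are placed in one sub-directory per class, and the sub-directory names become the class labels.

```python
import torchvision

# Expected layout (placeholder paths):
# path/to/dataset_root/cat/001.png
# path/to/dataset_root/cat/002.png
# path/to/dataset_root/dog/001.png
dataset = torchvision.datasets.ImageFolder(root='path/to/dataset_root/')

print(dataset.classes)          # ['cat', 'dog'], inferred from the folder names
image, class_index = dataset[0]  # PIL image and the integer index of its class
```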