Add New Datasets — MMSegmentation 1.2.2 documentation (original) (raw)
Add new custom dataset¶
Here we show how to develop a new custom dataset.
- Create a new file
mmseg/datasets/example.py
from mmseg.registry import DATASETS
from .basesegdataset import BaseSegDataset
@DATASETS.register_module()
class ExampleDataset(BaseSegDataset):
METAINFO = dict(
classes=('xxx', 'xxx', ...),
palette=[[x, x, x], [x, x, x], ...])
def init(self, arg1, arg2):
pass - Import the module in
mmseg/datasets/__init__.py
from .example import ExampleDataset - Use it by creating a new new dataset config file
configs/_base_/datasets/example_dataset.py
dataset_type = 'ExampleDataset'
data_root = 'data/example/'
... - Add dataset meta information in
mmseg/utils/class_names.py
def example_classes():
return [
'xxx', 'xxx',
...
]
def example_palette():
return [
[x, x, x], [x, x, x],
...
]
dataset_aliases ={
'example': ['example', ...],
...
}
Note: If the new dataset does not satisfy the mmseg requirements, a data preprocessing script needs to be prepared in tools/dataset_converters/
Customize datasets by reorganizing data¶
The simplest way is to convert your dataset to organize your data into folders.
An example of file structure is as followed.
├── data │ ├── my_dataset │ │ ├── img_dir │ │ │ ├── train │ │ │ │ ├── xxx{img_suffix} │ │ │ │ ├── yyy{img_suffix} │ │ │ │ ├── zzz{img_suffix} │ │ │ ├── val │ │ ├── ann_dir │ │ │ ├── train │ │ │ │ ├── xxx{seg_map_suffix} │ │ │ │ ├── yyy{seg_map_suffix} │ │ │ │ ├── zzz{seg_map_suffix} │ │ │ ├── val
A training pair will consist of the files with same suffix in img_dir/ann_dir.
Some datasets don’t release the test set or don’t release the ground truth of the test set, and we cannot evaluate models locally without the ground truth of the test set, so we set the validation set as the default test set in config files.
About how to build your own datasets or implement a new dataset class please refer to the datasets guide for more detailed information.
Note: The annotations are images of shape (H, W), the value pixel should fall in range [0, num_classes - 1]
. You may use 'P'
mode of pillow to create your annotation image with color.
Customize datasets by mixing dataset¶
MMSegmentation also supports to mix dataset for training. Currently it supports to concat, repeat and multi-image mix datasets.
Repeat dataset¶
We use RepeatDataset
as wrapper to repeat the dataset. For example, suppose the original dataset is Dataset_A
, to repeat it, the config looks like the following
dataset_A_train = dict( type='RepeatDataset', times=N, dataset=dict( # This is the original config of Dataset_A type='Dataset_A', ... pipeline=train_pipeline ) )
Concatenate dataset¶
In case the dataset you want to concatenate is different, you can concatenate the dataset configs like the following.
dataset_A_train = dict() dataset_B_train = dict() concatenate_dataset = dict( type='ConcatDataset', datasets=[dataset_A_train, dataset_B_train])
A more complex example that repeats Dataset_A
and Dataset_B
by N and M times, respectively, and then concatenates the repeated datasets is as the following.
dataset_A_train = dict( type='RepeatDataset', times=N, dataset=dict( type='Dataset_A', ... pipeline=train_pipeline ) ) dataset_A_val = dict( ... pipeline=test_pipeline ) dataset_A_test = dict( ... pipeline=test_pipeline ) dataset_B_train = dict( type='RepeatDataset', times=M, dataset=dict( type='Dataset_B', ... pipeline=train_pipeline ) ) train_dataloader = dict( dataset=dict( type='ConcatDataset', datasets=[dataset_A_train, dataset_B_train]))
val_dataloader = dict(dataset=dataset_A_val) test_dataloader = dict(dataset=dataset_A_test)
You can refer base dataset tutorial from mmengine for more details
Multi-image Mix Dataset¶
We use MultiImageMixDataset
as a wrapper to mix images from multiple datasets.MultiImageMixDataset
can be used by multiple images mixed data augmentation like mosaic and mixup.
An example of using MultiImageMixDataset
with Mosaic
data augmentation:
train_pipeline = [ dict(type='RandomMosaic', prob=1), dict(type='Resize', img_scale=(1024, 512), keep_ratio=True), dict(type='RandomFlip', prob=0.5), dict(type='PackSegInputs') ]
train_dataset = dict( type='MultiImageMixDataset', dataset=dict( type=dataset_type, reduce_zero_label=False, img_dir=data_root + "images/train", ann_dir=data_root + "annotations/train", pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations'), ] ), pipeline=train_pipeline )