
COCO-Seg Dataset

The COCO-Seg dataset, an extension of the COCO (Common Objects in Context) dataset, is specially designed to aid research in object instance segmentation. It uses the same images as COCO but introduces more detailed segmentation annotations. This dataset is a crucial resource for researchers and developers working on instance segmentation tasks, especially for training Ultralytics YOLO models.

COCO-Seg Pretrained Models

| Model | size (pixels) | mAP box 50-95 | mAP mask 50-95 | Speed CPU ONNX (ms) | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (B) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| YOLO11n-seg | 640 | 38.9 | 32.0 | 65.9 ± 1.1 | 1.8 ± 0.0 | 2.9 | 10.4 |
| YOLO11s-seg | 640 | 46.6 | 37.8 | 117.6 ± 4.9 | 2.9 ± 0.0 | 10.1 | 35.5 |
| YOLO11m-seg | 640 | 51.5 | 41.5 | 281.6 ± 1.2 | 6.3 ± 0.1 | 22.4 | 123.3 |
| YOLO11l-seg | 640 | 53.4 | 42.9 | 344.2 ± 3.2 | 7.8 ± 0.2 | 27.6 | 142.2 |
| YOLO11x-seg | 640 | 54.7 | 43.8 | 664.5 ± 3.2 | 15.8 ± 0.7 | 62.1 | 319.0 |

Key Features

  * The same images as the original COCO dataset, annotated for object instance segmentation.
  * Detailed instance segmentation masks for each annotated object.
  * 80 object categories spanning people, vehicles, animals, and everyday items.
  * Standardized evaluation metrics (mask mAP), with a public evaluation server for the test set.

Dataset Structure

The COCO-Seg dataset is partitioned into three subsets:

  1. Train2017: This subset contains 118K images for training instance segmentation models.
  2. Val2017: This subset includes 5K images used for validation purposes during model training.
  3. Test2017: This subset encompasses 20K images used for testing and benchmarking the trained models. Ground truth annotations for this subset are not publicly available, and the results are submitted to the COCO evaluation server for performance evaluation.
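The image counts above (taken from the comments in coco.yaml) give a quick sense of how the data is proportioned across splits; a back-of-the-envelope check:

```python
# Split sizes from the coco.yaml comments: 118287 train, 5000 val, 20288 test-dev.
splits = {"train2017": 118287, "val2017": 5000, "test-dev2017": 20288}

total = sum(splits.values())  # 143575 images overall
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}
print(shares)  # train2017 accounts for roughly 82% of the images
```

Train2017 dominates at about 82% of the images, which is typical for large-scale detection and segmentation benchmarks.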

Applications

COCO-Seg is widely used for training and evaluating deep learning models in instance segmentation, such as the YOLO models. The large number of annotated images, the diversity of object categories, and the standardized evaluation metrics make it an indispensable resource for computer vision researchers and practitioners.
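To make the standardized metrics concrete: COCO's mask mAP(50-95) averages precision over mask IoU thresholds from 0.50 to 0.95 in steps of 0.05. A simplified, pure-Python sketch of mask IoU, the quantity those thresholds are applied to (real evaluation uses run-length-encoded masks, not nested lists):

```python
def mask_iou(a, b):
    """IoU between two binary masks given as equally-sized nested lists."""
    inter = sum(pa & pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    union = sum(pa | pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return inter / union if union else 0.0

# Two overlapping 4x4 toy masks: intersection 2 pixels, union 6 pixels
pred = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
gt   = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]

iou = mask_iou(pred, gt)  # 2 / 6, about 0.33

# mAP(50-95) checks matches at IoU thresholds 0.50, 0.55, ..., 0.95;
# this prediction would not count as a true positive at any of them.
thresholds = [0.50 + 0.05 * i for i in range(10)]
matched = [t for t in thresholds if iou >= t]
```

Averaging over the ten thresholds is what makes the 50-95 metric stricter than the single-threshold mAP@0.5.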

Dataset YAML

A YAML (YAML Ain't Markup Language) file is used to define the dataset configuration. It contains the dataset's paths, class names, and other relevant settings. In the case of the COCO-Seg dataset, the coco.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco.yaml.

ultralytics/cfg/datasets/coco.yaml

```yaml
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# COCO 2017 dataset https://cocodataset.org by Microsoft
# Documentation: https://docs.ultralytics.com/datasets/detect/coco/
# Example usage: yolo train data=coco.yaml
# parent
# ├── ultralytics
# └── datasets
#     └── coco ← downloads here (20.1 GB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco # dataset root dir
train: train2017.txt # train images (relative to 'path') 118287 images
val: val2017.txt # val images (relative to 'path') 5000 images
test: test-dev2017.txt # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794

# Classes
names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  11: stop sign
  12: parking meter
  13: bench
  14: bird
  15: cat
  16: dog
  17: horse
  18: sheep
  19: cow
  20: elephant
  21: bear
  22: zebra
  23: giraffe
  24: backpack
  25: umbrella
  26: handbag
  27: tie
  28: suitcase
  29: frisbee
  30: skis
  31: snowboard
  32: sports ball
  33: kite
  34: baseball bat
  35: baseball glove
  36: skateboard
  37: surfboard
  38: tennis racket
  39: bottle
  40: wine glass
  41: cup
  42: fork
  43: knife
  44: spoon
  45: bowl
  46: banana
  47: apple
  48: sandwich
  49: orange
  50: broccoli
  51: carrot
  52: hot dog
  53: pizza
  54: donut
  55: cake
  56: chair
  57: couch
  58: potted plant
  59: bed
  60: dining table
  61: toilet
  62: tv
  63: laptop
  64: mouse
  65: remote
  66: keyboard
  67: cell phone
  68: microwave
  69: oven
  70: toaster
  71: sink
  72: refrigerator
  73: book
  74: clock
  75: vase
  76: scissors
  77: teddy bear
  78: hair drier
  79: toothbrush

# Download script/URL (optional)
download: |
  from pathlib import Path

  from ultralytics.utils.downloads import download

  # Download labels
  segments = True  # segment or box labels
  dir = Path(yaml["path"])  # dataset root dir
  url = "https://github.com/ultralytics/assets/releases/download/v0.0.0/"
  urls = [url + ("coco2017labels-segments.zip" if segments else "coco2017labels.zip")]  # labels
  download(urls, dir=dir.parent)

  # Download data
  urls = [
      "http://images.cocodataset.org/zips/train2017.zip",  # 19G, 118k images
      "http://images.cocodataset.org/zips/val2017.zip",  # 1G, 5k images
      "http://images.cocodataset.org/zips/test2017.zip",  # 7G, 41k images (optional)
  ]
  download(urls, dir=dir / "images", threads=3)
```
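Once parsed, a dataset YAML like the one above resolves to a plain mapping. A minimal sketch of the fields a training run relies on; the key names follow the Ultralytics dataset format, but the `validate` helper is illustrative, not part of the library API:

```python
# A parsed dataset YAML is just a mapping; this mirrors the fields above
# (names is truncated here for brevity -- the real file lists all 80 classes).
coco_cfg = {
    "path": "../datasets/coco",   # dataset root dir
    "train": "train2017.txt",     # train images, relative to 'path'
    "val": "val2017.txt",         # val images, relative to 'path'
    "test": "test-dev2017.txt",   # optional test split
    "names": {0: "person", 1: "bicycle", 79: "toothbrush"},
}

def validate(cfg):
    """Illustrative check for the keys a training run needs before it starts."""
    missing = {"path", "train", "val", "names"} - cfg.keys()
    if missing:
        raise ValueError(f"dataset YAML missing keys: {sorted(missing)}")
    return True

validate(coco_cfg)  # passes; drop a required key and it raises ValueError
```

Checking the configuration up front like this surfaces path or class-map mistakes before a long download or training run begins.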

Usage

To train a YOLO11n-seg model on the COCO-Seg dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

Train Example

Python

```python
from ultralytics import YOLO

# Load a model
model = YOLO("yolo11n-seg.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="coco.yaml", epochs=100, imgsz=640)
```

CLI

```bash
# Start training from a pretrained *.pt model
yolo segment train data=coco.yaml model=yolo11n-seg.pt epochs=100 imgsz=640
```

Sample Images and Annotations

COCO-Seg, like its predecessor COCO, contains a diverse set of images with various object categories and complex scenes. However, COCO-Seg introduces more detailed instance segmentation masks for each object in the images. Here are some examples of images from the dataset, along with their corresponding instance segmentation masks:

Dataset sample image

The example showcases the variety and complexity of the images in the COCO-Seg dataset and the benefits of using mosaic augmentation during the training process.
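Mosaic augmentation stitches four training images into one, exposing the model to more objects, scales, and contexts per batch. A toy sketch of the geometric idea, with images as nested lists; the real augmentation also rescales the images and remaps their labels:

```python
def mosaic4(imgs):
    """Stitch four equally-sized images (nested lists) into one 2x2 mosaic."""
    (tl, tr), (bl, br) = (imgs[0], imgs[1]), (imgs[2], imgs[3])
    top = [ra + rb for ra, rb in zip(tl, tr)]        # left/right halves, row by row
    bottom = [ra + rb for ra, rb in zip(bl, br)]
    return top + bottom

# Four 2x2 single-value "images" labeled 0..3
tiles = [[[k] * 2 for _ in range(2)] for k in range(4)]
m = mosaic4(tiles)
# m is a 4x4 grid: [[0,0,1,1],[0,0,1,1],[2,2,3,3],[2,2,3,3]]
```

Because each mosaic combines objects from four different scenes, the model sees denser and more varied layouts than the raw images alone would provide.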

Citations and Acknowledgments

If you use the COCO-Seg dataset in your research or development work, please cite the original COCO paper and acknowledge the extension to COCO-Seg:

BibTeX

```bibtex
@misc{lin2015microsoft,
    title={Microsoft COCO: Common Objects in Context},
    author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár},
    year={2015},
    eprint={1405.0312},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

We extend our thanks to the COCO Consortium for creating and maintaining this invaluable resource for the computer vision community. For more information about the COCO dataset and its creators, visit the COCO dataset website.

FAQ

What is the COCO-Seg dataset and how does it differ from the original COCO dataset?

The COCO-Seg dataset is an extension of the original COCO (Common Objects in Context) dataset, specifically designed for instance segmentation tasks. While it uses the same images as the COCO dataset, COCO-Seg includes more detailed segmentation annotations, making it a powerful resource for researchers and developers focusing on object instance segmentation.

How can I train a YOLO11 model using the COCO-Seg dataset?

To train a YOLO11n-seg model on the COCO-Seg dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a detailed list of available arguments, refer to the model Training page.

Train Example

Python

```python
from ultralytics import YOLO

# Load a model
model = YOLO("yolo11n-seg.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="coco.yaml", epochs=100, imgsz=640)
```

CLI

```bash
# Start training from a pretrained *.pt model
yolo segment train data=coco.yaml model=yolo11n-seg.pt epochs=100 imgsz=640
```

What are the key features of the COCO-Seg dataset?

The COCO-Seg dataset includes several key features:

  * The same images as the original COCO dataset, annotated for instance segmentation.
  * Detailed segmentation masks for each object instance.
  * 80 object categories with standardized mask mAP evaluation metrics.
  * Large-scale splits: 118K training, 5K validation, and 20K test images.

What pretrained models are available for COCO-Seg, and what are their performance metrics?

The COCO-Seg dataset supports multiple pretrained YOLO11 segmentation models with varying performance metrics. Here's a summary of the available models and their key metrics:

| Model | size (pixels) | mAP box 50-95 | mAP mask 50-95 | Speed CPU ONNX (ms) | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (B) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| YOLO11n-seg | 640 | 38.9 | 32.0 | 65.9 ± 1.1 | 1.8 ± 0.0 | 2.9 | 10.4 |
| YOLO11s-seg | 640 | 46.6 | 37.8 | 117.6 ± 4.9 | 2.9 ± 0.0 | 10.1 | 35.5 |
| YOLO11m-seg | 640 | 51.5 | 41.5 | 281.6 ± 1.2 | 6.3 ± 0.1 | 22.4 | 123.3 |
| YOLO11l-seg | 640 | 53.4 | 42.9 | 344.2 ± 3.2 | 7.8 ± 0.2 | 27.6 | 142.2 |
| YOLO11x-seg | 640 | 54.7 | 43.8 | 664.5 ± 3.2 | 15.8 ± 0.7 | 62.1 | 319.0 |

These models range from the lightweight YOLO11n-seg to the more powerful YOLO11x-seg, offering different trade-offs between speed and accuracy to suit various application requirements. For more information on model selection, visit the Ultralytics models page.
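One way to read the table is as a latency/accuracy frontier. A small sketch that picks the most accurate model fitting a GPU latency budget, using the mask mAP and T4 TensorRT figures from the table (the `pick` helper is illustrative, not an Ultralytics API):

```python
# (mAP mask 50-95, T4 TensorRT10 latency in ms) per model, from the table above.
models = {
    "yolo11n-seg": (32.0, 1.8),
    "yolo11s-seg": (37.8, 2.9),
    "yolo11m-seg": (41.5, 6.3),
    "yolo11l-seg": (42.9, 7.8),
    "yolo11x-seg": (43.8, 15.8),
}

def pick(budget_ms):
    """Most accurate model whose GPU latency fits the budget (None if none fit)."""
    fits = {name: (mAP, ms) for name, (mAP, ms) in models.items() if ms <= budget_ms}
    return max(fits, key=lambda name: fits[name][0]) if fits else None

print(pick(8.0))  # yolo11l-seg: best mask mAP under an 8 ms budget
print(pick(1.0))  # None: even the nano model needs 1.8 ms
```

For real deployments, latency should be re-measured on the target hardware, since the table's figures are specific to the benchmark setup.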

How is the COCO-Seg dataset structured and what subsets does it contain?

The COCO-Seg dataset is partitioned into three subsets for specific training and evaluation needs:

  1. Train2017: Contains 118K images used primarily for training instance segmentation models.
  2. Val2017: Comprises 5K images utilized for validation during the training process.
  3. Test2017: Encompasses 20K images reserved for testing and benchmarking trained models. Note that ground truth annotations for this subset are not publicly available, and performance results are submitted to the COCO evaluation server for assessment.

For smaller experimentation needs, you might also consider using the COCO8-seg dataset, which is a compact version containing just 8 images from the COCO train 2017 set.