COCO Dataset

The COCO (Common Objects in Context) dataset is a large-scale object detection, segmentation, and captioning dataset. It is designed to encourage research on a wide variety of object categories and is commonly used for benchmarking computer vision models. It is an essential dataset for researchers and developers working on object detection, segmentation, and pose estimation tasks.

COCO Pretrained Models

| Model | size (pixels) | mAP val 50-95 | Speed CPU ONNX (ms) | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (B) |
| --- | --- | --- | --- | --- | --- | --- |
| YOLO11n | 640 | 39.5 | 56.1 ± 0.8 | 1.5 ± 0.0 | 2.6 | 6.5 |
| YOLO11s | 640 | 47.0 | 90.0 ± 1.2 | 2.5 ± 0.0 | 9.4 | 21.5 |
| YOLO11m | 640 | 51.5 | 183.2 ± 2.0 | 4.7 ± 0.1 | 20.1 | 68.0 |
| YOLO11l | 640 | 53.4 | 238.6 ± 1.4 | 6.2 ± 0.1 | 25.3 | 86.9 |
| YOLO11x | 640 | 54.7 | 462.8 ± 6.7 | 11.3 ± 0.2 | 56.9 | 194.9 |
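
The accuracy and speed trade-off in this table can be explored with a small helper. The following is only an illustrative sketch: `ModelSpec` and `pick_model` are hypothetical names that are not part of the Ultralytics package, and the numbers are copied from the table above. It returns the most accurate model whose T4 TensorRT latency fits a given budget.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    """One row of the pretrained-model table (hypothetical helper type)."""

    name: str
    map50_95: float  # mAP val 50-95
    t4_ms: float     # mean T4 TensorRT10 latency in milliseconds
    params_m: float  # parameters in millions


# Values copied from the table above.
MODELS = [
    ModelSpec("YOLO11n", 39.5, 1.5, 2.6),
    ModelSpec("YOLO11s", 47.0, 2.5, 9.4),
    ModelSpec("YOLO11m", 51.5, 4.7, 20.1),
    ModelSpec("YOLO11l", 53.4, 6.2, 25.3),
    ModelSpec("YOLO11x", 54.7, 11.3, 56.9),
]


def pick_model(max_t4_ms: float) -> Optional[ModelSpec]:
    """Return the highest-mAP model within the latency budget, or None."""
    candidates = [m for m in MODELS if m.t4_ms <= max_t4_ms]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.map50_95)


choice = pick_model(5.0)
print(choice.name if choice else "no model fits the budget")
```

Latency on other hardware will differ from the T4 figures, so the budget should be measured on the target device before relying on a selection like this.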

Key Features

Dataset Structure

The COCO dataset is split into three subsets:

  1. Train2017: This subset contains 118K images for training object detection, segmentation, and captioning models.
  2. Val2017: This subset has 5K images used for validation purposes during model training.
  3. Test2017: This subset contains about 41K images for testing and benchmarking trained models; the dataset configuration uses its 20K-image test-dev split. Ground truth annotations for this subset are not publicly available, and results are submitted to the COCO evaluation server for performance evaluation.

Applications

The COCO dataset is widely used for training and evaluating deep learning models in object detection (such as Ultralytics YOLO, Faster R-CNN, and SSD), instance segmentation (such as Mask R-CNN), and keypoint detection (such as OpenPose). The dataset's diverse set of object categories, large number of annotated images, and standardized evaluation metrics make it an essential resource for computer vision researchers and practitioners.

Dataset YAML

A YAML (YAML Ain't Markup Language) file defines the dataset configuration, including the dataset's paths and class names. For the COCO dataset, the coco.yaml file is maintained at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco.yaml.

ultralytics/cfg/datasets/coco.yaml

```yaml
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# COCO 2017 dataset https://cocodataset.org by Microsoft
# Documentation: https://docs.ultralytics.com/datasets/detect/coco/
# Example usage: yolo train data=coco.yaml
# parent
# ├── ultralytics
# └── datasets
#     └── coco ← downloads here (20.1 GB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco # dataset root dir
train: train2017.txt # train images (relative to 'path') 118287 images
val: val2017.txt # val images (relative to 'path') 5000 images
test: test-dev2017.txt # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794

# Classes
names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  11: stop sign
  12: parking meter
  13: bench
  14: bird
  15: cat
  16: dog
  17: horse
  18: sheep
  19: cow
  20: elephant
  21: bear
  22: zebra
  23: giraffe
  24: backpack
  25: umbrella
  26: handbag
  27: tie
  28: suitcase
  29: frisbee
  30: skis
  31: snowboard
  32: sports ball
  33: kite
  34: baseball bat
  35: baseball glove
  36: skateboard
  37: surfboard
  38: tennis racket
  39: bottle
  40: wine glass
  41: cup
  42: fork
  43: knife
  44: spoon
  45: bowl
  46: banana
  47: apple
  48: sandwich
  49: orange
  50: broccoli
  51: carrot
  52: hot dog
  53: pizza
  54: donut
  55: cake
  56: chair
  57: couch
  58: potted plant
  59: bed
  60: dining table
  61: toilet
  62: tv
  63: laptop
  64: mouse
  65: remote
  66: keyboard
  67: cell phone
  68: microwave
  69: oven
  70: toaster
  71: sink
  72: refrigerator
  73: book
  74: clock
  75: vase
  76: scissors
  77: teddy bear
  78: hair drier
  79: toothbrush

# Download script/URL (optional)
download: |
  from pathlib import Path

  from ultralytics.utils.downloads import download

  # Download labels
  segments = True  # segment or box labels
  dir = Path(yaml["path"])  # dataset root dir
  url = "https://github.com/ultralytics/assets/releases/download/v0.0.0/"
  urls = [url + ("coco2017labels-segments.zip" if segments else "coco2017labels.zip")]  # labels
  download(urls, dir=dir.parent)

  # Download data
  urls = [
      "http://images.cocodataset.org/zips/train2017.zip",  # 19G, 118k images
      "http://images.cocodataset.org/zips/val2017.zip",  # 1G, 5k images
      "http://images.cocodataset.org/zips/test2017.zip",  # 7G, 41k images (optional)
  ]
  download(urls, dir=dir / "images", threads=3)
```
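
The comments in coco.yaml state that the train, val, and test entries are relative to `path`. As a rough illustration of that relationship, the sketch below joins each split entry onto the dataset root. It is not how Ultralytics resolves paths internally: `config` is a hand-copied dictionary of the path-related keys, and `resolve_splits` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Hand-copied subset of coco.yaml (assumption: only the path-related keys).
config = {
    "path": "../datasets/coco",
    "train": "train2017.txt",
    "val": "val2017.txt",
    "test": "test-dev2017.txt",
}


def resolve_splits(cfg: dict) -> dict:
    """Join each split entry onto the dataset root given by cfg['path']."""
    root = PurePosixPath(cfg["path"])
    return {
        split: root / cfg[split]
        for split in ("train", "val", "test")
        if split in cfg
    }


splits = resolve_splits(config)
for name, location in splits.items():
    print(f"{name}: {location}")
```

`PurePosixPath` is used so the joined paths look the same on every platform; a real loader would also need to handle the directory and list forms mentioned in the YAML comments.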

Usage

To train a YOLO11n model on the COCO dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.

Train Example

Python

```python
from ultralytics import YOLO

# Load a model
model = YOLO("yolo11n.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="coco.yaml", epochs=100, imgsz=640)
```

CLI

```bash
# Start training from a pretrained *.pt model
yolo detect train data=coco.yaml model=yolo11n.pt epochs=100 imgsz=640
```

Sample Images and Annotations

The COCO dataset contains a diverse set of images with various object categories and complex scenes. Here are some examples of images from the dataset, along with their corresponding annotations:

Dataset sample image

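The example showcases the variety and complexity of the images in the COCO dataset and the benefits of mosaicing during training. Mosaicing combines multiple images into a single training image, which increases the variety of objects and scenes within each batch.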
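
To make the idea of mosaicing concrete, here is a deliberately simplified sketch that tiles four equally sized images into a 2x2 grid and shifts their bounding boxes to match. It is not the Ultralytics implementation: training pipelines typically also randomize placement, scaling, and cropping. The function name `mosaic_2x2` and the list-of-rows image representation are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def mosaic_2x2(
    images: Sequence[Image], boxes_per_image: Sequence[Sequence[Box]]
) -> Tuple[Image, List[Box]]:
    """Tile four equally sized images into a 2x2 grid and shift their boxes."""
    if len(images) != 4 or len(boxes_per_image) != 4:
        raise ValueError("expected exactly four images and four box lists")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must have the same size")

    # Concatenate rows side by side, then stack the two halves vertically.
    top = [left + right for left, right in zip(images[0], images[1])]
    bottom = [left + right for left, right in zip(images[2], images[3])]
    canvas = top + bottom

    # Offsets of each tile: top-left, top-right, bottom-left, bottom-right.
    offsets = [(0, 0), (width, 0), (0, height), (width, height)]
    shifted: List[Box] = []
    for (dx, dy), boxes in zip(offsets, boxes_per_image):
        for x1, y1, x2, y2 in boxes:
            shifted.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return canvas, shifted
```

Each box keeps its size and only moves by the offset of the tile its source image was placed in, which is why the labels remain valid on the combined image.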

Citations and Acknowledgments

If you use the COCO dataset in your research or development work, please cite the following paper:

BibTeX

@misc{lin2015microsoft,
  title={Microsoft COCO: Common Objects in Context},
  author={Tsung-Yi Lin and Michael Maire and Serge Belongie and Lubomir Bourdev and Ross Girshick and James Hays and Pietro Perona and Deva Ramanan and C. Lawrence Zitnick and Piotr Dollár},
  year={2015},
  eprint={1405.0312},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

We would like to acknowledge the COCO Consortium for creating and maintaining this valuable resource for the computer vision community. For more information about the COCO dataset and its creators, visit the COCO dataset website (https://cocodataset.org).

FAQ

What is the COCO dataset and why is it important for computer vision?

The COCO dataset (Common Objects in Context) is a large-scale dataset used for object detection, segmentation, and captioning. It contains 330K images with detailed annotations for 80 object categories, making it essential for benchmarking and training computer vision models. Researchers use COCO due to its diverse categories and standardized evaluation metrics like mean Average Precision (mAP).
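
The "50-95" in the COCO mAP figure refers to averaging over Intersection over Union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below is not a full mAP implementation; it only shows how IoU is computed for two boxes and at which of those thresholds a prediction would count as a match. The helper names `iou` and `matched_thresholds` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# IoU thresholds 0.50, 0.55, ..., 0.95 used by the COCO-style mAP 50-95 metric.
IOU_THRESHOLDS: List[float] = [t / 100 for t in range(50, 100, 5)]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def matched_thresholds(pred: Box, truth: Box) -> List[float]:
    """Thresholds at which the prediction overlaps the ground truth enough."""
    overlap = iou(pred, truth)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


prediction = (0.0, 0.0, 10.0, 10.0)
ground_truth = (2.0, 0.0, 12.0, 10.0)
print(round(iou(prediction, ground_truth), 3))
print(matched_thresholds(prediction, ground_truth))
```

A prediction that is only roughly aligned passes the loose thresholds but fails the strict ones, which is why mAP 50-95 rewards precise localization more than mAP at a single threshold of 0.50.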

How can I train a YOLO model using the COCO dataset?

To train a YOLO11 model using the COCO dataset, you can use the following code snippets:

Train Example

Python

```python
from ultralytics import YOLO

# Load a model
model = YOLO("yolo11n.pt")  # load a pretrained model (recommended for training)

# Train the model
results = model.train(data="coco.yaml", epochs=100, imgsz=640)
```

CLI

```bash
# Start training from a pretrained *.pt model
yolo detect train data=coco.yaml model=yolo11n.pt epochs=100 imgsz=640
```

Refer to the Training page for more details on available arguments.

What are the key features of the COCO dataset?

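The COCO dataset includes 330K images with detailed annotations for 80 object categories, covering object detection, segmentation, and captioning tasks, along with standardized evaluation metrics such as mean Average Precision (mAP).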

Where can I find pretrained YOLO11 models trained on the COCO dataset?

Pretrained YOLO11 models on the COCO dataset can be downloaded from the links provided in the documentation. Examples include YOLO11n, YOLO11s, YOLO11m, YOLO11l, and YOLO11x.

These models vary in size, mAP, and inference speed, providing options for different performance and resource requirements.

How is the COCO dataset structured and how do I use it?

The COCO dataset is split into three subsets:

  1. Train2017: 118K images for training.
  2. Val2017: 5K images for validation during training.
  3. Test2017: about 41K images, of which the 20K-image test-dev split is used for benchmarking trained models. Results need to be submitted to the COCO evaluation server for performance evaluation.

The dataset's YAML configuration file, ultralytics/cfg/datasets/coco.yaml, defines the dataset paths, class names, and an optional download script.