GitHub - sunsmarterjie/yolov12: YOLOv12: Attention-Centric Real-Time Object Detectors

YOLOv12

YOLOv12: Attention-Centric Real-Time Object Detectors

Yunjie Tian1, Qixiang Ye2, David Doermann1

1 University at Buffalo, SUNY, 2 University of Chinese Academy of Sciences.


Comparison with popular methods in terms of latency-accuracy (left) and FLOPs-accuracy (right) trade-offs



YOLOv12 surpasses all popular real-time object detectors in accuracy while keeping competitive speed. For example, YOLOv12-N achieves 40.6% mAP with an inference latency of 1.64 ms on a T4 GPU, outperforming YOLOv10-N / YOLOv11-N by 2.1% / 1.2% mAP at comparable speed. This advantage extends to the other model scales. YOLOv12 also surpasses end-to-end real-time detectors that build on DETR, such as RT-DETR / RT-DETRv2: YOLOv12-S beats RT-DETR-R18 / RT-DETRv2-R18 while running 42% faster and using only 36% of the computation and 45% of the parameters.

Main Results

Turbo (default version):

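| Model | size (pixels) | mAP val 50-95 | Speed (ms), T4 TensorRT10 | params (M) | FLOPs (G) |
|---|---|---|---|---|---|
| YOLO12n | 640 | 40.4 | 1.60 | 2.5 | 6.0 |
| YOLO12s | 640 | 47.6 | 2.42 | 9.1 | 19.4 |
| YOLO12m | 640 | 52.5 | 4.27 | 19.6 | 59.8 |
| YOLO12l | 640 | 53.8 | 5.83 | 26.5 | 82.4 |
| YOLO12x | 640 | 55.4 | 10.38 | 59.3 | 184.6 |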

v1.0:

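| Model | size (pixels) | mAP val 50-95 | Speed (ms), T4 TensorRT10 | params (M) | FLOPs (G) |
|---|---|---|---|---|---|
| YOLO12n | 640 | 40.6 | 1.64 | 2.6 | 6.5 |
| YOLO12s | 640 | 48.0 | 2.61 | 9.3 | 21.4 |
| YOLO12m | 640 | 52.5 | 4.86 | 20.2 | 67.5 |
| YOLO12l | 640 | 53.7 | 6.77 | 26.4 | 88.9 |
| YOLO12x | 640 | 55.2 | 11.79 | 59.1 | 199.0 |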
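
To make the difference between the two releases easier to read, the sketch below recomputes the per-scale change of Turbo relative to v1.0. The numbers are copied from the two tables; the names `TURBO`, `V1_0`, and `compare` are illustrative and not part of this repository.

```python
# Figures copied from the two tables:
# (mAP val 50-95, T4 latency in ms, params in M, FLOPs in G).
TURBO = {
    "n": (40.4, 1.60, 2.5, 6.0),
    "s": (47.6, 2.42, 9.1, 19.4),
    "m": (52.5, 4.27, 19.6, 59.8),
    "l": (53.8, 5.83, 26.5, 82.4),
    "x": (55.4, 10.38, 59.3, 184.6),
}
V1_0 = {
    "n": (40.6, 1.64, 2.6, 6.5),
    "s": (48.0, 2.61, 9.3, 21.4),
    "m": (52.5, 4.86, 20.2, 67.5),
    "l": (53.7, 6.77, 26.4, 88.9),
    "x": (55.2, 11.79, 59.1, 199.0),
}


def compare(size):
    """Return Turbo relative to v1.0 for one scale.

    The tuple is (mAP delta in points, latency change in %, FLOPs change in %).
    """
    t_map, t_ms, _, t_flops = TURBO[size]
    v_map, v_ms, _, v_flops = V1_0[size]
    return (
        round(t_map - v_map, 1),
        round(100 * (t_ms - v_ms) / v_ms, 1),
        round(100 * (t_flops - v_flops) / v_flops, 1),
    )


for size in TURBO:
    d_map, d_ms, d_flops = compare(size)
    print(f"YOLO12{size}: mAP {d_map:+.1f}, latency {d_ms:+.1f}%, FLOPs {d_flops:+.1f}%")
```

On these figures, Turbo has lower latency and fewer FLOPs at every scale, while mAP stays within 0.4 points of v1.0.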

Installation

wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
conda create -n yolov12 python=3.11
conda activate yolov12
pip install -r requirements.txt
pip install -e .

Validation

Pretrained checkpoints: yolov12n, yolov12s, yolov12m, yolov12l, yolov12x

from ultralytics import YOLO

# Replace {n/s/m/l/x} with a single size letter, e.g. 'yolov12n.pt'
model = YOLO('yolov12{n/s/m/l/x}.pt')
model.val(data='coco.yaml', save_json=True)
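
The `{n/s/m/l/x}` placeholder stands for one size letter, so it has to be expanded before the checkpoint can be loaded. A minimal sketch for validating every scale in one run follows; `SIZES`, `checkpoint_name`, and `validate_all` are hypothetical helper names, and the sketch assumes the weight files are available locally under the names `yolov12n.pt` through `yolov12x.pt`.

```python
SIZES = ("n", "s", "m", "l", "x")


def checkpoint_name(size):
    """Expand the 'yolov12{n/s/m/l/x}.pt' placeholder for one size letter."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    return f"yolov12{size}.pt"


def validate_all(data="coco.yaml"):
    """Validate every scale; requires this repo installed and the weights present."""
    # Imported lazily so checkpoint_name() stays usable without ultralytics.
    from ultralytics import YOLO

    metrics = {}
    for size in SIZES:
        model = YOLO(checkpoint_name(size))
        # Same call as in the validation snippet of this section.
        metrics[size] = model.val(data=data, save_json=True)
    return metrics
```

Calling `validate_all()` returns one metrics object per size letter; running it on full COCO for all five scales takes a while, so passing a single size to `checkpoint_name` is the quicker check.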

Training

from ultralytics import YOLO

model = YOLO('yolov12n.yaml')

# Train the model
results = model.train(
    data='coco.yaml',
    epochs=600,
    batch=256,
    imgsz=640,
    scale=0.5,  # S:0.9; M:0.9; L:0.9; X:0.9
    mosaic=1.0,
    mixup=0.0,  # S:0.05; M:0.15; L:0.15; X:0.2
    copy_paste=0.1,  # S:0.15; M:0.4; L:0.5; X:0.6
    device="0,1,2,3",
)

# Evaluate model performance on the validation set
metrics = model.val()

# Perform object detection on an image
results = model("path/to/image.jpg")
results[0].show()
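
The inline comments in the training call list different `scale`, `mixup`, and `copy_paste` values for the S, M, L, and X models. One way to avoid editing those values by hand is to keep them in a lookup table, as in the sketch below. `AUGMENT`, `train_kwargs`, and `train` are illustrative names, and loading `yolov12{size}.yaml` for sizes other than `n` is an assumption extrapolated from the `yolov12n.yaml` example.

```python
# Per-scale augmentation values, copied from the inline comments of the
# training call (the N values are the ones passed explicitly there).
AUGMENT = {
    "n": {"scale": 0.5, "mixup": 0.0, "copy_paste": 0.1},
    "s": {"scale": 0.9, "mixup": 0.05, "copy_paste": 0.15},
    "m": {"scale": 0.9, "mixup": 0.15, "copy_paste": 0.4},
    "l": {"scale": 0.9, "mixup": 0.15, "copy_paste": 0.5},
    "x": {"scale": 0.9, "mixup": 0.2, "copy_paste": 0.6},
}


def train_kwargs(size, device="0,1,2,3"):
    """Build the keyword arguments for model.train() for one model scale."""
    kwargs = {
        "data": "coco.yaml",
        "epochs": 600,
        "batch": 256,
        "imgsz": 640,
        "mosaic": 1.0,
        "device": device,
    }
    kwargs.update(AUGMENT[size])
    return kwargs


def train(size):
    """Train one scale; requires this repo installed and GPUs matching `device`."""
    # Imported lazily so train_kwargs() can be inspected without ultralytics.
    from ultralytics import YOLO

    # Assumption: the config name follows the 'yolov12n.yaml' pattern.
    model = YOLO(f"yolov12{size}.yaml")
    return model.train(**train_kwargs(size))
```

For example, `train_kwargs("m")` yields the shared settings plus `scale=0.9`, `mixup=0.15`, and `copy_paste=0.4`.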

Prediction

from ultralytics import YOLO

model = YOLO('yolov12{n/s/m/l/x}.pt')
model.predict()

Export

from ultralytics import YOLO

model = YOLO('yolov12{n/s/m/l/x}.pt')
model.export(format="engine", half=True)  # or format="onnx"

Demo

python app.py
# Please visit http://127.0.0.1:7860

Acknowledgement

The code is based on ultralytics. Thanks for their excellent work!

Citation

@article{tian2025yolov12,
  title={YOLOv12: Attention-Centric Real-Time Object Detectors},
  author={Tian, Yunjie and Ye, Qixiang and Doermann, David},
  journal={arXiv preprint arXiv:2502.12524},
  year={2025}
}