GitHub - lightly-ai/lightly-train: LightlyTrain is the first PyTorch framework to pretrain computer vision models on unlabeled data for industrial applications

Train Better Models, Faster - No Labels Needed

LightlyTrain brings self-supervised pretraining to real-world computer vision pipelines, using your unlabeled data to reduce labeling costs and speed up model deployment. Drawing on state-of-the-art research, it pretrains your model on your unlabeled, domain-specific data, significantly reducing the amount of labeling needed to reach high model performance.

This allows you to focus on new features and domains instead of managing your labeling cycles. LightlyTrain is designed for simple integration into existing training pipelines and supports a wide range of model architectures and use cases out of the box.

News

Why LightlyTrain

Benchmark Results

On COCO, YOLOv8-s models pretrained with LightlyTrain achieve high performance across all tested label fractions. These improvements hold for other architectures like YOLOv11, RT-DETR, and Faster R-CNN. See our announcement post for more details.

How It Works

Install LightlyTrain:

pip install lightly-train

Then start pretraining with:

import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",  # Output directory
        data="my_data_dir",  # Directory with images
        model="torchvision/resnet50",  # Model to train
    )

This will pretrain a Torchvision ResNet-50 model using unlabeled images from my_data_dir. All training logs, model exports, and checkpoints are saved to the output directory at out/my_experiment. The final model is exported to out/my_experiment/exported_models/exported_last.pt.

Finally, load the pretrained model and fine-tune it using your existing training pipeline:

import torch
from torchvision import models

# Load the pretrained model
model = models.resnet50()
model.load_state_dict(torch.load("out/my_experiment/exported_models/exported_last.pt", weights_only=True))

# Fine-tune the model with your existing training pipeline
...

See also:

Features

Supported Models

| Framework | Supported Models |
|---|---|
| Torchvision | ResNet, ConvNext, ShuffleNetV2 |
| TIMM | All models |
| Ultralytics | YOLOv5, YOLOv6, YOLOv8, YOLO11, YOLO12 |
| RT-DETR | RT-DETR |
| RF-DETR | RF-DETR |
| YOLOv12 | YOLOv12 |
| SuperGradients | PP-LiteSeg, SSD, YOLO-NAS |
| Custom Models | Any PyTorch model |

For an overview of all supported models and usage instructions, see the full model docs.

Contact us if you need support for additional models or libraries.

Supported Training Methods

See the full methods docs for details.

FAQ

Who is LightlyTrain for?

LightlyTrain is designed for engineers and teams who want to use their unlabeled data to its full potential. It is ideal if any of the following applies to you:

We recommend a minimum of several thousand unlabeled images for training with LightlyTrain and 100+ labeled images for fine-tuning afterwards.

For best results:

The unlabeled dataset must always be treated like a training split: never include validation images in pretraining, to avoid data leakage.
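One simple way to guard against this kind of leakage is to compare file contents between the pretraining directory and the validation directory before training. The sketch below is an illustration under stated assumptions: the function names, the set of image suffixes, and the directory layout are hypothetical, and the check only catches byte-identical files, not resized or re-encoded copies.

```python
import hashlib
from pathlib import Path

# Assumed set of image file extensions; adjust to your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def image_hashes(root: str) -> dict[str, Path]:
    """Map SHA-256 content hash -> one file path for every image below root."""
    hashes: dict[str, Path] = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            hashes[hashlib.sha256(path.read_bytes()).hexdigest()] = path
    return hashes


def find_leaked_images(pretrain_dir: str, val_dir: str) -> list[tuple[Path, Path]]:
    """Return (pretrain_path, val_path) pairs whose contents are byte-identical."""
    pretrain = image_hashes(pretrain_dir)
    val = image_hashes(val_dir)
    return [(pretrain[h], val[h]) for h in sorted(pretrain.keys() & val.keys())]
```

Calling `find_leaked_images("my_data_dir", "my_val_dir")` (the second directory name is a placeholder) returns an empty list when no validation image appears verbatim in the pretraining data.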

What's the difference between LightlyTrain and other self-supervised learning implementations?

LightlyTrain offers several advantages:

LightlyTrain is most beneficial when:

LightlyTrain is complementary to existing pretrained models and can start from either random weights or existing pretrained weights.

Check our complete FAQ for more information.

License

LightlyTrain offers flexible licensing options to suit your specific needs:

We're committed to supporting both open-source and commercial users. Contact us to discuss the best licensing option for your project!

Contact

Website
Discord
GitHub
X
LinkedIn