Unconditional image generation (original) (raw)

Unconditional image generation models are not conditioned on text or images during training. It only generates images that resemble its training data distribution.

This guide will explore the train_unconditional.py training script to help you become familiar with it, and how you can adapt it for your own use-case.

Before running the script, make sure you install the library from source:

git clone https://github.com/huggingface/diffusers cd diffusers pip install .

Then navigate to the example folder containing the training script and install the required dependencies:

cd examples/unconditional_image_generation pip install -r requirements.txt

🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed-precision. It’ll automatically configure your training setup based on your hardware and environment. Take a look at the 🤗 Accelerate Quick tour to learn more.

Initialize an 🤗 Accelerate environment:

To setup a default 🤗 Accelerate environment without choosing any configurations:

accelerate config default

Or if your environment doesn’t support an interactive shell like a notebook, you can use:

from accelerate.utils import write_basic_config

write_basic_config()

Lastly, if you want to train a model on your own dataset, take a look at the Create a dataset for training guide to learn how to create a dataset that works with the training script.

Script parameters

The following sections highlight parts of the training script that are important for understanding how to modify it, but it doesn’t cover every aspect of the script in detail. If you’re interested in learning more, feel free to read through the script and let us know if you have any questions or concerns.

The training script provides many parameters to help you customize your training run. All of the parameters and their descriptions are found in the parse_args() function. It provides default values for each parameter, such as the training batch size and learning rate, but you can also set your own values in the training command if you’d like.

For example, to speedup training with mixed precision using the bf16 format, add the --mixed_precision parameter to the training command:

accelerate launch train_unconditional.py
--mixed_precision="bf16"

Some basic and important parameters to specify include:

Bring your dataset, and let the training script handle everything else!

Training script

The code for preprocessing the dataset and the training loop is found in the main() function. If you need to adapt the training script, this is where you’ll need to make your changes.

The train_unconditional script initializes a UNet2DModel if you don’t provide a model configuration. You can configure the UNet here if you’d like:

model = UNet2DModel( sample_size=args.resolution, in_channels=3, out_channels=3, layers_per_block=2, block_out_channels=(128, 128, 256, 256, 512, 512), down_block_types=( "DownBlock2D", "DownBlock2D", "DownBlock2D", "DownBlock2D", "AttnDownBlock2D", "DownBlock2D", ), up_block_types=( "UpBlock2D", "AttnUpBlock2D", "UpBlock2D", "UpBlock2D", "UpBlock2D", "UpBlock2D", ), )

Next, the script initializes a scheduler and optimizer:

accepts_prediction_type = "prediction_type" in set(inspect.signature(DDPMScheduler.init).parameters.keys()) if accepts_prediction_type: noise_scheduler = DDPMScheduler( num_train_timesteps=args.ddpm_num_steps, beta_schedule=args.ddpm_beta_schedule, prediction_type=args.prediction_type, ) else: noise_scheduler = DDPMScheduler(num_train_timesteps=args.ddpm_num_steps, beta_schedule=args.ddpm_beta_schedule)

optimizer = torch.optim.AdamW( model.parameters(), lr=args.learning_rate, betas=(args.adam_beta1, args.adam_beta2), weight_decay=args.adam_weight_decay, eps=args.adam_epsilon, )

Then it loads a dataset and you can specify how to preprocess it:

dataset = load_dataset("imagefolder", data_dir=args.train_data_dir, cache_dir=args.cache_dir, split="train")

augmentations = transforms.Compose( [ transforms.Resize(args.resolution, interpolation=transforms.InterpolationMode.BILINEAR), transforms.CenterCrop(args.resolution) if args.center_crop else transforms.RandomCrop(args.resolution), transforms.RandomHorizontalFlip() if args.random_flip else transforms.Lambda(lambda x: x), transforms.ToTensor(), transforms.Normalize([0.5], [0.5]), ] )

Finally, the training loop handles everything else such as adding noise to the images, predicting the noise residual, calculating the loss, saving checkpoints at specified steps, and saving and pushing the model to the Hub. If you want to learn more about how the training loop works, check out the Understanding pipelines, models and schedulers tutorial which breaks down the basic pattern of the denoising process.

Launch the script

Once you’ve made all your changes or you’re okay with the default configuration, you’re ready to launch the training script! 🚀

A full training run takes 2 hours on 4xV100 GPUs.

accelerate launch train_unconditional.py
--dataset_name="huggan/flowers-102-categories"
--output_dir="ddpm-ema-flowers-64"
--mixed_precision="fp16"
--push_to_hub

The training script creates and saves a checkpoint file in your repository. Now you can load and use your trained model for inference:

from diffusers import DiffusionPipeline import torch

pipeline = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128").to("cuda") image = pipeline().images[0]

< > Update on GitHub