Image2image SDXL finetuning using Python
February 24, 2025, 11:27am
Hi guys,
Don’t know which category I should pick, but… I recently ran into an issue. I want to create a script to fine-tune SDXL on my local machine using a GPU. I want to train the model on a set of images, with and without masks. Basically, the task is: given an image, generate polygons (or whatever else I want to place there) in a certain region of it. I already have this paired dataset.
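Not from the original post, but as a concrete starting point: loading such a paired (image, mask) dataset could be sketched like this. The directory layout (`images/` and `masks/` with matching filenames) and the target size are assumptions; adjust them to your actual data.

```python
import os
import numpy as np
from PIL import Image


def load_pair(image_path, mask_path, size=(1024, 1024)):
    """Load one (image, mask) pair, resize, and normalize to float arrays.

    The mask is binarized: values above 0.5 mark the region to regenerate.
    Images are scaled to [-1, 1], the usual range for diffusion models.
    """
    image = Image.open(image_path).convert("RGB").resize(size)
    mask = Image.open(mask_path).convert("L").resize(size)
    image_arr = np.asarray(image, dtype=np.float32) / 127.5 - 1.0
    mask_arr = (np.asarray(mask, dtype=np.float32) / 255.0 > 0.5).astype(np.float32)
    return image_arr, mask_arr


def iter_pairs(root):
    """Yield (image, mask) array pairs from root/images and root/masks.

    Assumes matching filenames in both directories (an assumption about
    the dataset layout, not something stated in the post).
    """
    img_dir = os.path.join(root, "images")
    mask_dir = os.path.join(root, "masks")
    for name in sorted(os.listdir(img_dir)):
        yield load_pair(os.path.join(img_dir, name), os.path.join(mask_dir, name))
```

From here the arrays can be wrapped in whatever `Dataset` class your training loop expects.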
The result should look as follows: in inference.py I provide an image and a prompt, and the model modifies the image.
I searched the internet for an example of this type of generation, but unfortunately all I found were low-code or no-code solutions, or solutions that have become outdated due to the many changes introduced to the Hugging Face Hub.
I’ve tried multiple approaches, but I keep running into strange errors that I’m not always able to resolve.
If you have a code snippet that could help me fine-tune the model, save it, and then run inference, please let me know.
Thank you in advance!
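Not an answer from the thread, but for context: the diffusers repository ships reference training scripts for SDXL (e.g. `examples/text_to_image/train_text_to_image_lora_sdxl.py` for LoRA fine-tuning, launched with `accelerate launch`), and the resulting weights can then be applied at inference time with the img2img pipeline. A minimal inference sketch along those lines, where the model ID, LoRA output directory, file paths, prompt, and hyperparameters are all placeholder assumptions (this requires a GPU and downloads several GB of weights):

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from PIL import Image

# Load the SDXL base model and apply LoRA weights produced by the
# diffusers reference training script (output directory is a placeholder).
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("./sdxl-lora-out")

init_image = Image.open("input.png").convert("RGB").resize((1024, 1024))

result = pipe(
    prompt="polygons in the marked region",  # placeholder prompt
    image=init_image,
    strength=0.6,          # how strongly to deviate from the input image
    guidance_scale=7.5,
).images[0]
result.save("output.png")
```

The `strength` parameter controls how much of the original image is preserved; lower values stay closer to the input.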
Is there anything similar to inpainting, or an existing ControlNet?
SDXL is not a model architecture specialized for image processing, so I don’t think there are many options.
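That said, for the masked-region use case specifically there is an SDXL inpainting checkpoint on the Hub that works through diffusers. A minimal sketch, where the file paths and prompt are assumptions (requires a GPU and downloads several GB of weights):

```python
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from PIL import Image

# SDXL inpainting checkpoint published on the Hugging Face Hub.
pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("input.png").convert("RGB").resize((1024, 1024))
mask = Image.open("mask.png").convert("L").resize((1024, 1024))  # white = repaint

result = pipe(
    prompt="polygons",  # placeholder prompt
    image=image,
    mask_image=mask,
).images[0]
result.save("inpainted.png")
```

Only the white region of the mask is regenerated; the rest of the image is kept.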
Is it possible to finetune Kandinsky for certain image generation?
Fine-tuning seems possible, but the sample code is old, so it’s unclear whether it still works as-is. Basically, with this type of model, the results of text-to-image training also carry over directly to image-to-image.
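To illustrate that point (a sketch, not from the reply): in diffusers, the text-to-image and image-to-image pipelines share the same underlying weights, so a fine-tuned text-to-image pipeline can be reused for image-to-image via `from_pipe` without reloading anything. The Kandinsky model ID below is the community checkpoint on the Hub; loading it downloads several GB of weights.

```python
import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image

# Load a (possibly fine-tuned) Kandinsky 2.2 text-to-image pipeline...
t2i = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder",
    torch_dtype=torch.float16,
)

# ...and reuse exactly the same components for image-to-image,
# so any fine-tuning of the text-to-image weights carries over.
i2i = AutoPipelineForImage2Image.from_pipe(t2i)
```

This is why training a text-to-image checkpoint directly improves image-to-image results with the same weights.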