[Experiment] Transfer Control to Other SD1.X Models · lllyasviel/ControlNet · Discussion #12

News

This post is out-of-date and obsolete. Please use Mikubill's A1111 WebUI plugin directly to control any SD 1.X model. No transfer is needed, and results are a bit better than the ones in this post.

Previous Method (Obsolete)

This is a guideline for transferring the ControlNet to any other community model in a relatively “correct” way.

This post is prepared for SD experts. You need some understanding of the neural network architecture of Stable Diffusion to perform this experiment.

Let us say we want to use OpenPose to control Anything V3. The overall method is

AnythingV3_control_openpose = AnythingV3 + SD15_control_openpose - SD15

More specifically,

# Inside Control Net
Any3_control.control_model.weights 
= SD15_control.control_model.weights + Any3.model.diffusion_model.weights - SD15.model.diffusion_model.weights

# Inside Base Model (less important, but better to have)
Any3_control.model.diffusion_model.weights 
= SD15_control.model.diffusion_model.weights + Any3.model.diffusion_model.weights - SD15.model.diffusion_model.weights
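For concreteness, here is a minimal sketch of that weight arithmetic applied directly to checkpoint state dicts. It is an illustration only, assuming plain state-dict checkpoints and the key prefixes shown above (control_model., model.diffusion_model.); CLIP/VAE handling is omitted, and the repository's tool_transfer_control.py (used below) is the script you should actually run.

# A minimal sketch of the offset transfer, assuming plain state-dict checkpoints.
# Key prefixes follow the formulas above; use tool_transfer_control.py for real runs.
import torch
from safetensors.torch import load_file

def load_weights(path):
    if path.endswith('.safetensors'):
        return load_file(path)
    sd = torch.load(path, map_location='cpu')
    return sd.get('state_dict', sd)  # unwrap A1111-style .ckpt files

sd15 = load_weights('./models/v1-5-pruned.ckpt')
sd15_control = load_weights('./models/control_sd15_openpose.pth')
any3 = load_weights('./models/anything-v3-full.safetensors')

merged = {}
for k, w in sd15_control.items():
    if k.startswith('control_model.'):
        # The control branch mirrors the SD encoder, so its offset comes from
        # the corresponding model.diffusion_model. weights of the base models.
        base_key = k.replace('control_model.', 'model.diffusion_model.', 1)
    elif k.startswith('model.diffusion_model.'):
        base_key = k
    else:
        merged[k] = w  # CLIP / VAE etc.: copied through unchanged in this sketch
        continue
    if base_key in any3 and base_key in sd15 and w.shape == any3[base_key].shape:
        merged[k] = w + any3[base_key] - sd15[base_key]
    else:
        merged[k] = w  # keys without a matching counterpart (e.g. zero convs, hint blocks)

torch.save(merged, './models/control_any3_openpose_manual.pth')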

You can download the necessary files from:

AnythingV3: https://huggingface.co/Linaqruf/anything-v3.0
SD1.5: https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main
ControlNet: https://huggingface.co/lllyasviel/ControlNet/tree/main/models

Important things to keep in mind:

  1. Simply replacing the base model inside the ControlNet checkpoint MAY work, but it is WRONG. This is because the ControlNet may have been trained with some SD layers unlocked (see the ending part of “sd_locked” in the official training guideline), so you need to compute the offset even for the base diffusion model. (Obsolete: some experiments show that results are equally good without such offsets. Please use Mikubill's A1111 WebUI plugin directly.)
  2. The difference in the CLIP text encoder must also be considered. Because of that well-known reason, the dominant majority of anime models need “clip_skip=2” and a 3x longer token length. Note that this also influences the softmax averaging, because the sequence length is different. (A minimal clip-skip sketch is shown right after this list.)
  3. In some applications like human pose, your input image should not be an anime image. It should be a real-person photo, because that image is only read by the OpenPose pose detector and is never seen by SD/ControlNet. Also, OpenPose is bad at processing anime images.
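To make item 2 concrete, here is a minimal, self-contained sketch of what “clip_skip=2” means: the prompt embedding is taken from the penultimate hidden layer of the CLIP text encoder (followed by its final LayerNorm) instead of the last layer. It uses the Hugging Face transformers API purely for illustration; in this repository the equivalent change is applied by cldm.hack.hack_everything(clip_skip=2).

# Illustration of clip_skip=2 with Hugging Face transformers (not the repo's own code path).
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").eval()

tokens = tokenizer("1girl, masterpiece, garden", padding="max_length",
                   max_length=77, truncation=True, return_tensors="pt")

with torch.no_grad():
    out = text_encoder(**tokens, output_hidden_states=True)

emb_default = out.last_hidden_state  # clip_skip=1 (stock SD 1.x)
emb_skip2 = text_encoder.text_model.final_layer_norm(out.hidden_states[-2])  # clip_skip=2 (anime models)
print(emb_default.shape, emb_skip2.shape)  # both torch.Size([1, 77, 768])

The 3x token length is handled separately, typically by encoding the prompt in 77-token chunks and concatenating the embeddings; this is why the softmax averaging mentioned above changes with sequence length.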

I have done all these preparations for you.

You may open "tool_transfer_control.py" and edit the file paths:

path_sd15 = './models/v1-5-pruned.ckpt'
path_sd15_with_control = './models/control_sd15_openpose.pth'
path_input = './models/anything-v3-full.safetensors'
path_output = './models/control_any3_openpose.pth'

You can define the output filename with "path_output". Make sure the other three file paths are correct and the files exist. Then run

python tool_transfer_control.py

Then you will get the file

 models/control_any3_openpose.pth
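
If you want to sanity-check the merged file before hacking the Gradio script, a quick inspection like the following works, assuming the output is a plain PyTorch state dict:

# Quick sanity check of the merged checkpoint (assumes a plain torch state dict).
import torch

sd = torch.load('./models/control_any3_openpose.pth', map_location='cpu')
sd = sd.get('state_dict', sd)

control_keys = [k for k in sd if k.startswith('control_model.')]
print(f'{len(sd)} tensors total, {len(control_keys)} in the control branch')

# The add/subtract should not have produced any NaN/Inf weights.
bad = [k for k, v in sd.items()
       if torch.is_tensor(v) and v.is_floating_point() and not torch.isfinite(v).all()]
print('non-finite tensors:', bad)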

Then you need to hack the Gradio code to read your new model, and hack the CLIP encoder with "clip_skip=2" and the 3x token length.

Taking OpenPose as an example, you can hack "gradio_pose2image.py" in this way:

from share import *
from cldm.hack import hack_everything

# vs. the stock gradio_pose2image.py, the relevant edits are this
# hack_everything(clip_skip=2) call and the checkpoint path loaded below
hack_everything(clip_skip=2)

import config
import cv2
import einops
import gradio as gr
import numpy as np
import torch

from pytorch_lightning import seed_everything
from annotator.util import resize_image, HWC3
from annotator.openpose import OpenposeDetector
from cldm.model import create_model, load_state_dict
from ldm.models.diffusion.ddim import DDIMSampler

apply_openpose = OpenposeDetector()

model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict('./models/control_any3_openpose.pth', location='cpu'))
model = model.cuda()
ddim_sampler = DDIMSampler(model)

def process ...

Then the results will look like this:

("1girl")
(result image)

("1girl, masterpiece, garden")
(result image)

And other controls like Canny edge:

("1girl, garden, flowers, sunshine, masterpiece, best quality, ultra-detailed, illustration, disheveled hair")
(result image)