Textual Inversion

What is Textual Inversion?

Textual inversion: Teach the base model new vocabulary about a particular concept with a couple of images reflecting that concept.

The result of the training is a .pt or a .bin file with the embedding in it (the former is the format used by the original author, the latter is the format used by the diffusers library). These files can be shared with other generative artists.

Using pre-trained embeddings

Put the embedding into the embeddings directory and use its filename in the prompt. You don't have to restart the program for this to work.
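To make the "filename in the prompt" rule concrete, here is a rough sketch of how embedding names could be derived from files and matched against a prompt. It is an illustration only, not the program's actual lookup code. The file names "usada pekora.pt" and "mignon.bin" and both helper functions are assumptions made up for this example.

```python
import re
from pathlib import PurePath

# Extensions mentioned on this page for embedding files.
EMBEDDING_SUFFIXES = {".pt", ".bin"}


def embedding_names(filenames):
    """Return the names usable in a prompt: the filename without its extension."""
    return sorted(
        PurePath(f).stem
        for f in filenames
        if PurePath(f).suffix.lower() in EMBEDDING_SUFFIXES
    )


def embeddings_used(prompt, names):
    """Return the embedding names that appear in the prompt as whole words."""
    used = []
    for name in names:
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, prompt):
            used.append(name)
    return used


# Hypothetical contents of the embeddings directory.
files = ["usada pekora.pt", "mignon.bin", "notes.txt"]
names = embedding_names(files)
print(embeddings_used("portrait of usada pekora, mignon", names))
# ['mignon', 'usada pekora']
```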

As an example, here is an embedding of Usada Pekora I trained on the WD1.2 model, on 53 pictures (119 augmented) for 19500 steps, with the 8 vectors per token setting.

Pictures it generates: [image: grid-0037]

portrait of usada pekora
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 4077357776, Size: 512x512, Model hash: 45dee52b

You can combine multiple embeddings in one prompt: [image: grid-0038]

portrait of usada pekora, mignon
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 4077357776, Size: 512x512, Model hash: 45dee52b

Be very careful about which model you are using with your embeddings: they work well with the model you used during training, and not so well on different models. For example, here is the above embedding used with the vanilla Stable Diffusion 1.4 model: [image: grid-0036]

portrait of usada pekora
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 4077357776, Size: 512x512, Model hash: 7460a6fa

Training embeddings

Textual inversion tab

Experimental support for training embeddings in the user interface.

Explanation for parameters

Creating an embedding

Preprocess

This takes images from a directory, processes them so they are ready for textual inversion, and writes the results to another directory. This is a convenience feature; you can preprocess pictures yourself if you wish.

Training an embedding

filewords

[filewords] is a tag for the prompt template file that lets you insert text from the image's filename into the prompt. By default, the file's extension is removed, as well as all numbers and dashes (-) at the start of the filename. So the filename 000001-1-a man in suit.png becomes this prompt text: a man in suit. The formatting of the text in the filename is otherwise left as it is.
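The default filename handling described above can be sketched in a few lines of Python. This is a minimal approximation of the described behavior, not the program's own code. The function names and the template line "a photo of [filewords]" are assumptions made up for this example.

```python
import re
from pathlib import Path


def filewords_from_filename(filename: str) -> str:
    """Drop the extension, then drop digits and dashes at the start of the name."""
    stem = Path(filename).stem
    return re.sub(r"^[\d-]+", "", stem)


def fill_template(template_line: str, filename: str) -> str:
    """Substitute the [filewords] tag in one line of a prompt template."""
    return template_line.replace("[filewords]", filewords_from_filename(filename))


print(filewords_from_filename("000001-1-a man in suit.png"))
# a man in suit

# Hypothetical template line, for illustration only.
print(fill_template("a photo of [filewords]", "000001-1-a man in suit.png"))
# a photo of a man in suit
```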

You can use the options Filename word regex and Filename join string to alter the text taken from the filename. For example, with word regex = \w+ and join string = ", " (a comma followed by a space), the file from above produces this text: a, man, in, suit. The regex is used to extract words from the text (here they are ['a', 'man', 'in', 'suit']), and the join string (', ') is placed between those words to create one text: a, man, in, suit.
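The regex-then-join step can be reproduced with Python's re module. This is a small sketch of the idea under the assumption that the regex is applied to the already cleaned filename text. The function name apply_word_regex is made up for this example.

```python
import re


def apply_word_regex(text: str, word_regex: str, join_string: str) -> str:
    """Extract every match of word_regex from text and glue them with join_string."""
    # group(0) is the whole match, so this also works if the pattern has groups.
    words = [match.group(0) for match in re.finditer(word_regex, text)]
    return join_string.join(words)


print(apply_word_regex("a man in suit", r"\w+", ", "))
# a, man, in, suit
```

With an empty join string the words are simply concatenated, so a separator such as ", " or " " is usually what you want.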

It's also possible to make a text file with the same filename as the image (000001-1-a man in suit.txt) and just put the prompt text there. In that case, the filename and regex options are not used.
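The precedence between the text file and the filename can be sketched as follows. This is an illustration under assumptions, not the program's own code: the function name caption_for is made up, and stripping surrounding whitespace from the text file is a choice made for this example.

```python
import re
import tempfile
from pathlib import Path


def caption_for(image_path: Path) -> str:
    """Prefer a same-named .txt file; otherwise fall back to the cleaned filename."""
    sidecar = image_path.with_suffix(".txt")
    if sidecar.is_file():
        # Assumption: surrounding whitespace in the text file is not meaningful.
        return sidecar.read_text(encoding="utf-8").strip()
    # Fallback: filename without extension and without leading digits/dashes.
    return re.sub(r"^[\d-]+", "", image_path.stem)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as folder:
    image = Path(folder) / "000001-1-a man in suit.png"
    image.touch()
    print(caption_for(image))  # a man in suit

    image.with_suffix(".txt").write_text("a man wearing a grey suit\n", encoding="utf-8")
    print(caption_for(image))  # a man wearing a grey suit
```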

Third party repos

I successfully trained embeddings using those repositories:

Other options are to train on Colab notebooks and/or to use the diffusers library, which I know nothing about.

Finding embeddings online

Hypernetworks

Hypernetworks are a novel (get it?) concept for fine-tuning a model without touching any of its weights.

The current way to train hypernets is in the textual inversion tab.

Training works the same way as with textual inversion.

The only requirement is to use a very, very low learning rate, something like 0.000005 (5e-6) or 0.0000005 (5e-7).

Dum Dum Guide

An anonymous user has written a guide with pictures for using hypernetworks: https://rentry.org/hypernetwork4dumdums

Unload VAE and CLIP from VRAM when training

This option on the Settings tab allows you to save some memory at the cost of slower preview picture generation.