Chop & Learn: Recognizing and Generating Object-State Compositions

Compositional Image Generation: Given training images of various objects in different states, generate new images of unseen pairs of objects and states.
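To make the task setup concrete, here is a minimal sketch of how seen and unseen object-state pairs could be separated. The object names, state names, held-out pairs, and the `compositional_split` helper are all hypothetical, chosen for illustration; they are not the dataset's actual splits. The sketch assumes that every object and every state must still appear in at least one training pair, so that only the combination is new at test time.

```python
from itertools import product

# Hypothetical object and state names, for illustration only.
objects = ["apple", "carrot", "potato"]
states = ["whole", "sliced", "diced"]


def compositional_split(objects, states, held_out):
    """Split all (object, state) pairs into seen (train) and unseen (test) pairs.

    Raises ValueError if a held-out pair is not a valid combination, or if
    holding out the pairs would leave some object or state with no
    training example at all.
    """
    all_pairs = set(product(objects, states))
    unseen = set(held_out)
    if not unseen <= all_pairs:
        raise ValueError("held-out pairs must be valid (object, state) pairs")

    seen = all_pairs - unseen

    # Each object and each state must still be observed during training;
    # only the specific combinations are withheld.
    seen_objects = {obj for obj, _ in seen}
    seen_states = {state for _, state in seen}
    if seen_objects != set(objects) or seen_states != set(states):
        raise ValueError("every object and state must appear in training")

    return sorted(seen), sorted(unseen)


# Hypothetical choice of pairs to withhold from training.
held_out = [("apple", "diced"), ("carrot", "whole"), ("potato", "sliced")]
train_pairs, test_pairs = compositional_split(objects, states, held_out)
```

Under this setup, a generation method is trained only on images of `train_pairs` and is then asked to produce images for each pair in `test_pairs`.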

We consider these methods:

Stable Diffusion (SD)
Stable Diffusion + Textual Inversion (SD + TI)
DreamBooth
Stable Diffusion + Fine-tuning (SD + FT)
Stable Diffusion + Textual Inversion + Fine-tuning (SD + TI + FT)

Ground Truth (GT) real images are shown in the first row for reference.

Please select different splits, objects, and states to view the generated images of different compositions.