ToonCrafter: Generative Cartoon Interpolation
¹The Chinese University of Hong Kong, ²City University of Hong Kong, ³Tencent AI Lab, ⁴Monash University
Showcases produced by ToonCrafter
Comparisons with baseline methods
Columns: input frames, AnimeInterp, EISAI, FILM, SEINE, ToonCrafter (ours).
Applications
Cartoon sketch interpolation.
Columns: input frames, interpolation results.
Reference-based sketch colorization (single-image-reference).
Columns: input, colorization results.
Reference-based sketch colorization (dual-image-reference).
Columns: input reference, input sketch, colorization results.
Sparse-sketch-guided generation
Columns (all settings): input frames, sparse sketch guidance, interpolation results.
Bisection (n=4). The sketches of the two input cartoon frames are always given.
Bisection (n=3)
Bisection (n=2)
Bisection (n=1)
Random
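The page names the sparse-sketch settings "Bisection (n=1 to 4)" and "Random" but does not define how the guided frames are chosen. The following is a minimal sketch of one plausible reading, not the authors' method. It assumes a 16-frame clip, assumes the two input keyframes always carry a sketch (as stated above), and assumes each bisection round adds the midpoint of every current frame interval. The function names `bisection_indices`, `random_indices`, and `sketch_mask` are hypothetical.

```python
import random


def bisection_indices(num_frames: int, depth: int) -> list[int]:
    """Hypothetical: frames that receive a sketch after `depth` rounds of bisection.

    Assumption: the first and last frames (the two input keyframes) always
    carry a sketch, and each round adds the midpoint of every current interval.
    """
    selected = {0, num_frames - 1}
    intervals = [(0, num_frames - 1)]
    for _ in range(depth):
        next_intervals = []
        for lo, hi in intervals:
            if hi - lo < 2:
                continue  # no interior frame left between lo and hi
            mid = (lo + hi) // 2
            selected.add(mid)
            next_intervals.extend([(lo, mid), (mid, hi)])
        intervals = next_intervals
    return sorted(selected)


def random_indices(num_frames: int, num_sketches: int, seed: int = 0) -> list[int]:
    """Hypothetical: both endpoints plus `num_sketches` distinct random interior frames."""
    rng = random.Random(seed)
    interior = rng.sample(range(1, num_frames - 1), num_sketches)
    return sorted({0, num_frames - 1, *interior})


def sketch_mask(num_frames: int, indices: list[int]) -> list[bool]:
    """Per-frame flag: True where a sketch is supplied as guidance."""
    chosen = set(indices)
    return [i in chosen for i in range(num_frames)]


if __name__ == "__main__":
    T = 16  # assumed clip length, for illustration only
    for n in range(1, 5):
        print(f"bisection n={n}:", bisection_indices(T, n))
    print("random:", random_indices(T, num_sketches=3))
```

Under these assumptions, with 16 frames, n=1 selects frames [0, 7, 15], n=2 selects [0, 3, 7, 11, 15], and n=4 covers every frame. Guidance therefore grows denser as n increases, while the random setting scatters a fixed number of interior sketches.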
Ablation study
Toon rectification learning.
Columns: input, variant I, variant II, variant III, variant IV (ours), variant V.
Dual-reference-based 3D VAE decoder (reconstruction results, i.e., decoding the latents produced by the encoder).
Columns: input, ours, ours w/o P3D, ours w/o HAR & P3D.
Dual-reference-based 3D VAE decoder (generation results, i.e., decoding the denoised latents from the generator).
Case 1: Please pay attention to the lanterns.
Case 2: Please pay attention to the newspaper.
Columns: input starting frame, input ending frame, ours, ours w/o HAR & P3D.
Sparse sketch guidance.
Columns: input frame, sparse sketch control (middle frame), ZeroGate, FrameIn.Enc., w/o sketch.
Limitations
Our model may fail to understand the semantics of the image content. For example, the black part should be the rigid body of the aircraft, which cannot sway in the wind.
Columns: input starting frame, input ending frame, our failure case.
Our model may struggle to generate convincing transition motions when objects appear in or disappear from the frame.
Columns: input starting frame, input ending frame, our failure case.