SANA-Video Image to Video pipeline SanaImageToVideoPipeline support by lawrence-cj · Pull Request #12634 · huggingface/diffusers

I tested the above code again, and the result is the same. Could you run it again? @dg845

```python
import torch
from diffusers import FlowMatchEulerDiscreteScheduler, SanaImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = SanaImageToVideoPipeline.from_pretrained(
    "Efficient-Large-Model/SANA-Video_2B_480p_diffusers",
    torch_dtype=torch.bfloat16,
)

pipe.scheduler = FlowMatchEulerDiscreteScheduler(shift=pipe.scheduler.config.flow_shift)

# Keep the VAE in fp32 for decoding quality; the text encoder stays in bf16.
pipe.vae.to(torch.float32)
pipe.text_encoder.to(torch.bfloat16)
pipe.to("cuda")

image = load_image("https://raw.githubusercontent.com/NVlabs/Sana/refs/heads/main/asset/samples/i2v-1.png")
prompt = "A woman stands against a stunning sunset backdrop, her long, wavy brown hair gently blowing in the breeze. She wears a sleeveless, light-colored blouse with a deep V-neckline, which accentuates her graceful posture. The warm hues of the setting sun cast a golden glow across her face and hair, creating a serene and ethereal atmosphere. The background features a blurred landscape with soft, rolling hills and scattered clouds, adding depth to the scene. The camera remains steady, capturing the tranquil moment from a medium close-up angle."
negative_prompt = "A chaotic sequence with misshapen, deformed limbs in heavy motion blur, sudden disappearance, jump cuts, jerky movements, rapid shot changes, frames out of sync, inconsistent character shapes, temporal artifacts, jitter, and ghosting effects, creating a disorienting visual experience."

# SANA-Video conditions motion strength via a "motion score" suffix appended to the prompt.
motion_scale = 30
motion_prompt = f" motion score: {motion_scale}."
prompt = prompt + motion_prompt

video = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=480,
    width=832,
    frames=81,
    guidance_scale=6,
    num_inference_steps=50,
    generator=torch.Generator(device="cuda").manual_seed(0),
).frames[0]

export_to_video(video, "sana-i2v.mp4", fps=16)
```
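As a sanity check on the export settings: generating 81 frames and writing them at 16 fps yields a clip of roughly five seconds. A minimal sketch of that arithmetic (pure Python, no diffusers dependency):

```python
# Clip duration implied by the call above: frames / fps.
frames = 81
fps = 16
duration_s = frames / fps
print(f"{frames} frames at {fps} fps -> {duration_s:.4f} s")  # 5.0625 s
```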