CogView4

Make sure to check out the Schedulers guide to learn how to explore the tradeoff between scheduler speed and quality, and see the reuse components across pipelines section to learn how to efficiently load the same components into multiple pipelines.
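A minimal sketch of both ideas mentioned above, assuming the default FlowMatchEulerDiscreteScheduler listed in the pipeline signature below; the choices here are illustrative, not a tuning recommendation:

import torch
from diffusers import CogView4Pipeline, FlowMatchEulerDiscreteScheduler

# Rebuild the scheduler from its own config; swapping in a different
# (compatible) scheduler class works the same way.
pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)
pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(pipe.scheduler.config)

# Reuse the already-loaded components in a second pipeline without re-downloading weights.
pipe_two = CogView4Pipeline(**pipe.components)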

This pipeline was contributed by zRzRzRzRzRzRzR. The original codebase can be found here. The original weights can be found under hf.co/THUDM.

class diffusers.CogView4Pipeline

( tokenizer: AutoTokenizer text_encoder: GlmModel vae: AutoencoderKL transformer: CogView4Transformer2DModel scheduler: FlowMatchEulerDiscreteScheduler )

Pipeline for text-to-image generation using CogView4.

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)
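As a brief illustration of those inherited methods, the sketch below saves a local copy of the pipeline and enables model offloading; the local path is a placeholder:

import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)

# Save a local copy of all pipeline components (the path is a placeholder).
pipe.save_pretrained("./cogview4-6b-local")

# Inherited memory helper: keep submodules on CPU and move them to the GPU
# only while they are needed during the forward pass.
pipe.enable_model_cpu_offload()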

__call__

( prompt: typing.Union[str, typing.List[str], NoneType] = None negative_prompt: typing.Union[str, typing.List[str], NoneType] = None height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: int = 50 timesteps: typing.Optional[typing.List[int]] = None sigmas: typing.Optional[typing.List[float]] = None guidance_scale: float = 5.0 num_images_per_prompt: int = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.FloatTensor] = None prompt_embeds: typing.Optional[torch.FloatTensor] = None negative_prompt_embeds: typing.Optional[torch.FloatTensor] = None original_size: typing.Optional[typing.Tuple[int, int]] = None crops_coords_top_left: typing.Tuple[int, int] = (0, 0) output_type: str = 'pil' return_dict: bool = True attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None callback_on_step_end: typing.Union[typing.Callable[[int, int, typing.Dict], NoneType], diffusers.callbacks.PipelineCallback, diffusers.callbacks.MultiPipelineCallbacks, NoneType] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] max_sequence_length: int = 1024 ) → ~pipelines.cogview4.pipeline_CogView4.CogView4PipelineOutput or tuple

Returns

~pipelines.cogview4.pipeline_CogView4.CogView4PipelineOutput or tuple

~pipelines.cogview4.pipeline_CogView4.CogView4PipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images.

Function invoked when calling the pipeline for generation.

Examples:

import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("output.png")
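The __call__ signature above exposes the main generation controls directly. A short sketch using a few of them; the concrete values are illustrative, not recommended defaults:

import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Seeded generation with an explicit resolution and guidance scale.
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    prompt="A photo of an astronaut riding a horse on mars",
    negative_prompt="blurry, low quality",
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=5.0,
    generator=generator,
).images[0]
image.save("output_seeded.png")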

encode_prompt

( prompt: typing.Union[str, typing.List[str]] negative_prompt: typing.Union[str, typing.List[str], NoneType] = None do_classifier_free_guidance: bool = True num_images_per_prompt: int = 1 prompt_embeds: typing.Optional[torch.Tensor] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None device: typing.Optional[torch.device] = None dtype: typing.Optional[torch.dtype] = None max_sequence_length: int = 1024 )

Encodes the prompt into text encoder hidden states.
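A sketch of pre-computing the embeddings once and passing them back to the pipeline, assuming encode_prompt returns the positive and negative embeddings as a pair (check the source link for the exact return value):

import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Assumption: encode_prompt returns (prompt_embeds, negative_prompt_embeds).
prompt_embeds, negative_prompt_embeds = pipe.encode_prompt(
    prompt="A photo of an astronaut riding a horse on mars",
    negative_prompt="blurry, low quality",
    do_classifier_free_guidance=True,
    device=pipe.device,
)

# Reuse the precomputed embeddings; the text prompt can then be omitted.
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
).images[0]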

class diffusers.pipelines.cogview4.pipeline_output.CogView4PipelineOutput

( images: typing.Union[typing.List[PIL.Image.Image], numpy.ndarray] )

Output class for CogView4 pipelines.
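A short sketch of how the output container is typically consumed, reusing the pipe instance loaded in the examples above; the output_type values follow the __call__ parameters listed earlier:

# return_dict=True (default) gives a CogView4PipelineOutput with an `images` field.
output = pipe("A photo of an astronaut riding a horse on mars")
pil_image = output.images[0]  # a list of PIL.Image.Image by default

# With output_type="np", `images` is instead a NumPy array
# (typically of shape (batch, height, width, channels)).
np_output = pipe("A photo of an astronaut riding a horse on mars", output_type="np")
array = np_output.images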