Pipelines

The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the task summary for examples of use.

There are two categories of pipeline abstractions to be aware of:

- The pipeline() abstraction, which is the most powerful object encapsulating all other pipelines.
- Task-specific pipelines, available for audio, computer vision, natural language processing, and multimodal tasks.

The pipeline abstraction

The pipeline abstraction is a wrapper around all the other available pipelines. It is instantiated like any other pipeline but can provide additional quality-of-life features.

Simple call on one item:

```python
>>> pipe = pipeline("text-classification")
>>> pipe("This restaurant is awesome")
[{'label': 'POSITIVE', 'score': 0.9998743534088135}]
```

If you want to use a specific model from the Hub, you can ignore the task if the model on the Hub already defines it:

```python
>>> pipe = pipeline(model="FacebookAI/roberta-large-mnli")
>>> pipe("This restaurant is awesome")
[{'label': 'NEUTRAL', 'score': 0.7313136458396912}]
```

To call a pipeline on many items, you can call it with a list.

```python
>>> pipe = pipeline("text-classification")
>>> pipe(["This restaurant is awesome", "This restaurant is awful"])
[{'label': 'POSITIVE', 'score': 0.9998743534088135}, {'label': 'NEGATIVE', 'score': 0.9996669292449951}]
```

To iterate over full datasets, it is recommended to use a dataset directly. This means you don’t need to allocate the whole dataset at once, nor do you need to do batching yourself. This should work just as fast as custom loops on GPU. If it doesn’t, don’t hesitate to create an issue.

```python
import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")

# KeyDataset (only *pt*) will simply return the item in the dict returned by the dataset item,
# as we're not interested in the "target" part of the dataset.
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
```

For ease of use, a generator is also possible:

```python
from transformers import pipeline

pipe = pipeline("text-classification")


def data():
    while True:
        # This could come from a dataset, a database, a queue, or an HTTP request in a server.
        yield "This is a test"


for out in pipe(data()):
    print(out)
```

#### transformers.pipeline

< source >

( task: typing.Optional[str] = None model: typing.Union[str, ForwardRef('PreTrainedModel'), ForwardRef('TFPreTrainedModel'), NoneType] = None config: typing.Union[str, transformers.configuration_utils.PretrainedConfig, NoneType] = None tokenizer: typing.Union[str, transformers.tokenization_utils.PreTrainedTokenizer, ForwardRef('PreTrainedTokenizerFast'), NoneType] = None feature_extractor: typing.Union[str, ForwardRef('SequenceFeatureExtractor'), NoneType] = None image_processor: typing.Union[str, transformers.image_processing_utils.BaseImageProcessor, NoneType] = None processor: typing.Union[str, transformers.processing_utils.ProcessorMixin, NoneType] = None framework: typing.Optional[str] = None revision: typing.Optional[str] = None use_fast: bool = True token: typing.Union[str, bool, NoneType] = None device: typing.Union[int, str, ForwardRef('torch.device'), NoneType] = None device_map = None torch_dtype = None trust_remote_code: typing.Optional[bool] = None model_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None pipeline_class: typing.Optional[typing.Any] = None **kwargs ) → Pipeline

Parameters

Returns

Pipeline

A suitable pipeline for the task.

Utility factory method to build a Pipeline.

A pipeline consists of:

- A tokenizer in charge of mapping raw textual input to tokens.
- A model to make predictions from the inputs.
- Some (optional) post-processing for enhancing the model’s output.

While there are such optional arguments as `tokenizer`, `feature_extractor`, `image_processor`, and `processor`, they shouldn't be specified all at once. If these components are not provided, `pipeline` will try to load required ones automatically. In case you want to provide these components explicitly, please refer to a specific pipeline in order to get more details regarding what components are required.

Examples:

```python
>>> from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

>>> # Sentiment analysis pipeline
>>> analyzer = pipeline("sentiment-analysis")

>>> # Question answering pipeline, specifying the checkpoint identifier
>>> oracle = pipeline(
...     "question-answering", model="distilbert/distilbert-base-cased-distilled-squad", tokenizer="google-bert/bert-base-cased"
... )

>>> # Named entity recognition pipeline, passing in a specific model and tokenizer
>>> model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
>>> recognizer = pipeline("ner", model=model, tokenizer=tokenizer)
```

Pipeline batching

All pipelines can use batching. This will work whenever the pipeline uses its streaming ability (i.e., when passing lists, a Dataset, or a generator).

```python
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
import datasets

dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised")
pipe = pipeline("text-classification", device=0)
for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
    print(out)
```

However, this is not automatically a win for performance. It can be either a 10x speedup or a 5x slowdown depending on hardware, data, and the actual model being used.

Example where it’s mostly a speedup:

```python
from transformers import pipeline
from torch.utils.data import Dataset
from tqdm.auto import tqdm

pipe = pipeline("text-classification", device=0)


class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"


dataset = MyDataset()

for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
```

On a GTX 970:

```
Streaming no batching
100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s]

Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s]

Streaming batch_size=64
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s]

Streaming batch_size=256
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s]
(diminishing returns, saturated the GPU)
```

Example where it’s mostly a slowdown:

```python
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        if i % 64 == 0:
            n = 100
        else:
            n = 1
        return "This is a test" * n
```

This dataset yields an occasional very long sentence compared to the others. In that case, the whole batch will need to be 400 tokens long, so the whole batch will be [64, 400] instead of [64, 4], leading to the big slowdown. Even worse, on bigger batches, the program simply crashes.

```
Streaming no batching
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 183.69it/s]

Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 265.74it/s]

Streaming batch_size=64
100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [00:26<00:00, 37.80it/s]

Streaming batch_size=256
  0%|                                                                                 | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/nicolas/src/transformers/test.py", line 42, in <module>
    for out in tqdm(pipe(dataset, batch_size=256), total=len(dataset)):
....
    q = q / math.sqrt(dim_per_head)
RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 3.95 GiB total capacity; 1.72 GiB already allocated; 354.88 MiB free; 2.46 GiB reserved in total by PyTorch)
```

There are no good (general) solutions for this problem, and your mileage may vary depending on your use cases. For users, a rule of thumb is:

- Measure performance on your load, with your hardware. Measure, measure, and keep measuring. Real numbers are the only way to go.
- If you are latency constrained (live product doing inference), don’t batch.
- If you are using CPU, don’t batch.
- If you are using throughput (you want to run your model on a bunch of static data), on GPU, then:
  - If you have no clue about the size of the sequence_length (“natural” data), by default don’t batch; measure and try tentatively to add it, and add OOM checks to recover when it fails (and it will, at some point, if you don’t control the sequence_length).
  - If your sequence_length is super regular, then batching is more likely to be VERY interesting; measure and push it until you get OOMs.
  - The larger the GPU, the more likely batching is going to be interesting.
- As soon as you enable batching, make sure you can handle OOMs nicely.

Pipeline chunk batching

zero-shot-classification and question-answering are slightly specific, in the sense that a single input might yield multiple forward passes of a model. Under normal circumstances, this would cause issues with the batch_size argument.

In order to circumvent this issue, both of these pipelines are a bit specific: they are ChunkPipeline instead of regular Pipeline. In short:

```python
preprocessed = pipe.preprocess(inputs)
model_outputs = pipe.forward(preprocessed)
outputs = pipe.postprocess(model_outputs)
```

Now becomes:

```python
all_model_outputs = []
for preprocessed in pipe.preprocess(inputs):
    model_outputs = pipe.forward(preprocessed)
    all_model_outputs.append(model_outputs)
outputs = pipe.postprocess(all_model_outputs)
```

This should be very transparent to your code because the pipelines are used in the same way.

This is a simplified view, since the pipeline can handle the batching automatically. This means you don’t have to care about how many forward passes your inputs will actually trigger; you can optimize the batch_size independently of the inputs. The caveats from the previous section still apply.
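For instance, a question-answering pipeline can be given a batch_size even though a single long question/context pair may expand into several forward passes internally. A minimal sketch (the checkpoint is just an illustrative choice):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert/distilbert-base-cased-distilled-squad")

# batch_size controls the internal forward-pass batching of the ChunkPipeline,
# regardless of how many chunks each (question, context) pair expands into.
result = qa(
    question="Where do I live?",
    context="My name is Wolfgang and I live in Berlin. " * 50,  # long context -> multiple chunks
    batch_size=8,
)
print(result)
```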

Pipeline FP16 inference

Models can be run in FP16 which can be significantly faster on GPU while saving memory. Most models will not suffer noticeable performance loss from this. The larger the model, the less likely that it will.

To enable FP16 inference, you can simply pass torch_dtype=torch.float16 or torch_dtype='float16' to the pipeline constructor. Note that this only works for models with a PyTorch backend. Your inputs will be converted to FP16 internally.
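A minimal sketch (any PyTorch model on GPU would do; the task and checkpoint defaults are illustrative):

```python
import torch
from transformers import pipeline

# Run the default text-classification model in half precision on GPU 0;
# inputs are converted to FP16 internally.
pipe = pipeline("text-classification", device=0, torch_dtype=torch.float16)
print(pipe("This restaurant is awesome"))
```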

Pipeline custom code

If you want to override a specific pipeline, don’t hesitate to create an issue for your task at hand; the goal of the pipeline is to be easy to use and support most cases, so transformers could maybe support your use case.

If you simply want to try it out, you can:

```python
from transformers import TextClassificationPipeline, pipeline


class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        scores = super().postprocess(model_outputs, **kwargs)
        # Your custom post-processing goes here, e.g.:
        # scores = scores * 100
        return scores


my_pipeline = MyPipeline(model=model, tokenizer=tokenizer)

# or, if you use the pipeline() factory:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
```

That should enable you to do all the custom code you want.

Implementing a pipeline

See the dedicated guide: Implementing a new pipeline.

Audio

Pipelines available for audio tasks include the following.

AudioClassificationPipeline

class transformers.AudioClassificationPipeline

< source >

( *args **kwargs )

Parameters

Audio classification pipeline using any AutoModelForAudioClassification. This pipeline predicts the class of a raw waveform or an audio file. In case of an audio file, ffmpeg should be installed to support multiple audio formats.

Example:

```python
>>> from transformers import pipeline

>>> classifier = pipeline(model="superb/wav2vec2-base-superb-ks")
>>> classifier("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac")
[{'score': 0.997, 'label': 'unknown'}, {'score': 0.002, 'label': 'left'}, {'score': 0.0, 'label': 'yes'}, {'score': 0.0, 'label': 'down'}, {'score': 0.0, 'label': 'stop'}]
```

Learn more about the basics of using a pipeline in the pipeline tutorial

This pipeline can currently be loaded from pipeline() using the following task identifier: "audio-classification".

See the list of available models on huggingface.co/models.

__call__

< source >

( inputs: typing.Union[numpy.ndarray, bytes, str] **kwargs ) → A list of dict with the following keys

Parameters

Returns

A list of dict with the following keys

Classify the sequence(s) given as inputs. See the AutomaticSpeechRecognitionPipeline documentation for more information.

AutomaticSpeechRecognitionPipeline

class transformers.AutomaticSpeechRecognitionPipeline

< source >

( model: PreTrainedModel feature_extractor: typing.Union[ForwardRef('SequenceFeatureExtractor'), str] = None tokenizer: typing.Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None decoder: typing.Union[ForwardRef('BeamSearchDecoderCTC'), str, NoneType] = None device: typing.Union[int, ForwardRef('torch.device')] = None torch_dtype: typing.Union[str, ForwardRef('torch.dtype'), NoneType] = None **kwargs )

Parameters

Pipeline that aims at extracting spoken text contained within some audio.

The input can be either a raw waveform or an audio file. In the case of an audio file, ffmpeg should be installed to support multiple audio formats.

Example:

```python
>>> from transformers import pipeline

>>> transcriber = pipeline(model="openai/whisper-base")
>>> transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac")
{'text': ' He hoped there would be stew for dinner, turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick, peppered flour-fatten sauce.'}
```
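A raw waveform can also be passed directly, either as a numpy array or as a dict carrying the sampling rate. A sketch, assuming 16 kHz mono audio (the silent array just stands in for a real recording):

```python
import numpy as np
from transformers import pipeline

transcriber = pipeline(model="openai/whisper-base")

# One second of silence at 16 kHz, standing in for real audio.
waveform = np.zeros(16000, dtype=np.float32)

# Pass a dict when the sampling rate needs to be specified explicitly.
print(transcriber({"raw": waveform, "sampling_rate": 16000}))
```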

Learn more about the basics of using a pipeline in the pipeline tutorial

__call__

< source >

( inputs: typing.Union[numpy.ndarray, bytes, str] **kwargs ) → Dict

Parameters

Returns

Dict

A dictionary with the following keys:

Transcribe the audio sequence(s) given as inputs to text. See the AutomaticSpeechRecognitionPipeline documentation for more information.

TextToAudioPipeline

class transformers.TextToAudioPipeline

< source >

( *args vocoder = None sampling_rate = None **kwargs )

Text-to-audio generation pipeline using any AutoModelForTextToWaveform or AutoModelForTextToSpectrogram. This pipeline generates an audio file from an input text and optional other conditional inputs.

Example:

```python
>>> from transformers import pipeline

>>> pipe = pipeline(model="suno/bark-small")
>>> output = pipe("Hey it's HuggingFace on the phone!")

>>> audio = output["audio"]
>>> sampling_rate = output["sampling_rate"]
```

Learn more about the basics of using a pipeline in the pipeline tutorial

You can specify parameters passed to the model by using TextToAudioPipeline.__call__.forward_params or TextToAudioPipeline.__call__.generate_kwargs.

Example:

from transformers import pipeline

music_generator = pipeline(task="text-to-audio", model="facebook/musicgen-small", framework="pt")

generate_kwargs = { ... "do_sample": True, ... "temperature": 0.7, ... "max_new_tokens": 35, ... }

outputs = music_generator("Techno music with high melodic riffs", generate_kwargs=generate_kwargs)

This pipeline can currently be loaded from pipeline() using the following task identifiers: "text-to-speech" or "text-to-audio".

See the list of available models on huggingface.co/models.

__call__

< source >

( text_inputs: typing.Union[str, typing.List[str]] **forward_params ) → A dict or a list of dict

Parameters

Returns

A dict or a list of dict

The dictionaries have two keys:

Generates speech/audio from the inputs. See the TextToAudioPipeline documentation for more information.

ZeroShotAudioClassificationPipeline

class transformers.ZeroShotAudioClassificationPipeline

< source >

( **kwargs )

Parameters

Zero shot audio classification pipeline using ClapModel. This pipeline predicts the class of an audio when you provide an audio and a set of candidate_labels.

The default hypothesis_template is: "This is a sound of {}.". Make sure you update it for your usage.

Example:

```python
>>> from transformers import pipeline
>>> from datasets import load_dataset

>>> dataset = load_dataset("ashraq/esc50")
>>> audio = next(iter(dataset["train"]["audio"]))["array"]
>>> classifier = pipeline(task="zero-shot-audio-classification", model="laion/clap-htsat-unfused")
>>> classifier(audio, candidate_labels=["Sound of a dog", "Sound of vaccum cleaner"])
[{'score': 0.9996, 'label': 'Sound of a dog'}, {'score': 0.0004, 'label': 'Sound of vaccum cleaner'}]
```
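As noted above, the hypothesis template can be overridden at call time. A sketch reusing the classifier and audio from the example:

```python
# Replace the default "This is a sound of {}." template with a custom one.
classifier(
    audio,
    candidate_labels=["a dog barking", "a vacuum cleaner humming"],
    hypothesis_template="This is a recording of {}.",
)
```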

Learn more about the basics of using a pipeline in the pipeline tutorial.

This audio classification pipeline can currently be loaded from pipeline() using the following task identifier: "zero-shot-audio-classification".

See the list of available models on huggingface.co/models.

__call__

< source >

( audios: typing.Union[numpy.ndarray, bytes, str] **kwargs )

Parameters

Assign labels to the audio(s) passed as inputs.

Computer vision

Pipelines available for computer vision tasks include the following.

DepthEstimationPipeline

class transformers.DepthEstimationPipeline

< source >

( *args **kwargs )

Parameters

Depth estimation pipeline using any AutoModelForDepthEstimation. This pipeline predicts the depth of an image.

Example:

```python
>>> from transformers import pipeline

>>> depth_estimator = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-base-hf")
>>> output = depth_estimator("http://images.cocodataset.org/val2017/000000039769.jpg")

>>> # This is a tensor with the values being the depth expressed in meters for each pixel
>>> output["predicted_depth"].shape
torch.Size([1, 384, 384])
```

Learn more about the basics of using a pipeline in the pipeline tutorial

This depth estimation pipeline can currently be loaded from pipeline() using the following task identifier: "depth-estimation".

See the list of available models on huggingface.co/models.

__call__

< source >

( inputs: typing.Union[str, typing.List[str], ForwardRef('Image.Image'), typing.List[ForwardRef('Image.Image')]] = None **kwargs )

Parameters

Predict the depth(s) of the image(s) passed as inputs.

ImageClassificationPipeline

class transformers.ImageClassificationPipeline

< source >

( *args **kwargs )

Parameters

Image classification pipeline using any AutoModelForImageClassification. This pipeline predicts the class of an image.

Example:

```python
>>> from transformers import pipeline

>>> classifier = pipeline(model="microsoft/beit-base-patch16-224-pt22k-ft22k")
>>> classifier("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
[{'score': 0.442, 'label': 'macaw'}, {'score': 0.088, 'label': 'popinjay'}, {'score': 0.075, 'label': 'parrot'}, {'score': 0.073, 'label': 'parodist, lampooner'}, {'score': 0.046, 'label': 'poll, poll_parrot'}]
```

Learn more about the basics of using a pipeline in the pipeline tutorial

This image classification pipeline can currently be loaded from pipeline() using the following task identifier: "image-classification".

See the list of available models on huggingface.co/models.

__call__

< source >

( inputs: typing.Union[str, typing.List[str], ForwardRef('Image.Image'), typing.List[ForwardRef('Image.Image')]] = None **kwargs )

Parameters

Assign labels to the image(s) passed as inputs.

ImageSegmentationPipeline

class transformers.ImageSegmentationPipeline

< source >

( *args **kwargs )

Parameters

Image segmentation pipeline using any AutoModelForXXXSegmentation. This pipeline predicts masks of objects and their classes.

Example:

```python
>>> from transformers import pipeline

>>> segmenter = pipeline(model="facebook/detr-resnet-50-panoptic")
>>> segments = segmenter("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
>>> len(segments)
2

>>> segments[0]["label"]
'bird'

>>> segments[1]["label"]
'bird'

>>> type(segments[0]["mask"])
<class 'PIL.Image.Image'>

>>> segments[0]["mask"].size
(768, 512)
```

This image segmentation pipeline can currently be loaded from pipeline() using the following task identifier: "image-segmentation".

See the list of available models on huggingface.co/models.

__call__

< source >

( inputs = None **kwargs )

Parameters

Perform segmentation (detect masks & classes) in the image(s) passed as inputs.

ImageToImagePipeline

class transformers.ImageToImagePipeline

< source >

( *args **kwargs )

Parameters

Image to Image pipeline using any AutoModelForImageToImage. This pipeline generates an image based on a previous image input.

Example:

```python
>>> from PIL import Image
>>> import requests

>>> from transformers import pipeline

>>> upscaler = pipeline("image-to-image", model="caidas/swin2SR-classical-sr-x2-64")
>>> img = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
>>> img = img.resize((64, 64))
>>> upscaled_img = upscaler(img)
>>> img.size
(64, 64)

>>> upscaled_img.size
(144, 144)
```

This image to image pipeline can currently be loaded from pipeline() using the following task identifier: "image-to-image".

See the list of available models on huggingface.co/models.

__call__

< source >

( images: typing.Union[str, typing.List[str], ForwardRef('Image.Image'), typing.List[ForwardRef('Image.Image')]] **kwargs )

Parameters

Transform the image(s) passed as inputs.

ObjectDetectionPipeline

class transformers.ObjectDetectionPipeline

< source >

( *args **kwargs )

Parameters

Object detection pipeline using any AutoModelForObjectDetection. This pipeline predicts bounding boxes of objects and their classes.

Example:

```python
>>> from transformers import pipeline

>>> detector = pipeline(model="facebook/detr-resnet-50")
>>> detector("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
[{'score': 0.997, 'label': 'bird', 'box': {'xmin': 69, 'ymin': 171, 'xmax': 396, 'ymax': 507}}, {'score': 0.999, 'label': 'bird', 'box': {'xmin': 398, 'ymin': 105, 'xmax': 767, 'ymax': 507}}]
```

Learn more about the basics of using a pipeline in the pipeline tutorial

This object detection pipeline can currently be loaded from pipeline() using the following task identifier: "object-detection".

See the list of available models on huggingface.co/models.

__call__

< source >

( *args **kwargs )

Parameters

Detect objects (bounding boxes & classes) in the image(s) passed as inputs.

VideoClassificationPipeline

class transformers.VideoClassificationPipeline

< source >

( *args **kwargs )

Parameters

Video classification pipeline using any AutoModelForVideoClassification. This pipeline predicts the class of a video.

This video classification pipeline can currently be loaded from pipeline() using the following task identifier: "video-classification".

See the list of available models on huggingface.co/models.
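A minimal usage sketch (the checkpoint and file path are illustrative choices; decoding videos also requires an extra backend such as av to be installed):

```python
from transformers import pipeline

# Illustrative checkpoint; any video classification model works here.
video_classifier = pipeline(task="video-classification", model="MCG-NJU/videomae-base-finetuned-kinetics")

# Accepts a local path or a URL to a video file.
results = video_classifier("path/to/video.mp4")
print(results)
```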

__call__

< source >

( inputs: typing.Union[str, typing.List[str], NoneType] = None **kwargs )

Parameters

Assign labels to the video(s) passed as inputs.

ZeroShotImageClassificationPipeline

class transformers.ZeroShotImageClassificationPipeline

< source >

( **kwargs )

Parameters

Zero shot image classification pipeline using CLIPModel. This pipeline predicts the class of an image when you provide an image and a set of candidate_labels.

Example:

from transformers import pipeline

classifier = pipeline(model="google/siglip-so400m-patch14-384") classifier( ... "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png", ... candidate_labels=["animals", "humans", "landscape"], ... ) [{'score': 0.965, 'label': 'animals'}, {'score': 0.03, 'label': 'humans'}, {'score': 0.005, 'label': 'landscape'}]

classifier( ... "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png", ... candidate_labels=["black and white", "photorealist", "painting"], ... ) [{'score': 0.996, 'label': 'black and white'}, {'score': 0.003, 'label': 'photorealist'}, {'score': 0.0, 'label': 'painting'}]

Learn more about the basics of using a pipeline in the pipeline tutorial

This image classification pipeline can currently be loaded from pipeline() using the following task identifier: "zero-shot-image-classification".

See the list of available models on huggingface.co/models.

__call__

< source >

( image: typing.Union[str, typing.List[str], ForwardRef('Image'), typing.List[ForwardRef('Image')]] = None **kwargs )

Parameters

Assign labels to the image(s) passed as inputs.

ZeroShotObjectDetectionPipeline

class transformers.ZeroShotObjectDetectionPipeline

< source >

( **kwargs )

Parameters

Zero shot object detection pipeline using OwlViTForObjectDetection. This pipeline predicts bounding boxes of objects when you provide an image and a set of candidate_labels.

Example:

from transformers import pipeline

detector = pipeline(model="google/owlvit-base-patch32", task="zero-shot-object-detection") detector( ... "http://images.cocodataset.org/val2017/000000039769.jpg", ... candidate_labels=["cat", "couch"], ... ) [{'score': 0.287, 'label': 'cat', 'box': {'xmin': 324, 'ymin': 20, 'xmax': 640, 'ymax': 373}}, {'score': 0.254, 'label': 'cat', 'box': {'xmin': 1, 'ymin': 55, 'xmax': 315, 'ymax': 472}}, {'score': 0.121, 'label': 'couch', 'box': {'xmin': 4, 'ymin': 0, 'xmax': 642, 'ymax': 476}}]

detector( ... "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png", ... candidate_labels=["head", "bird"], ... ) [{'score': 0.119, 'label': 'bird', 'box': {'xmin': 71, 'ymin': 170, 'xmax': 410, 'ymax': 508}}]

Learn more about the basics of using a pipeline in the pipeline tutorial

This object detection pipeline can currently be loaded from pipeline() using the following task identifier: "zero-shot-object-detection".

See the list of available models on huggingface.co/models.

__call__

< source >

( image: typing.Union[str, ForwardRef('Image.Image'), typing.List[typing.Dict[str, typing.Any]]] candidate_labels: typing.Union[str, typing.List[str], NoneType] = None **kwargs )

Parameters

Detect objects (bounding boxes & classes) in the image(s) passed as inputs.

Natural Language Processing

Pipelines available for natural language processing tasks include the following.

FillMaskPipeline

class transformers.FillMaskPipeline

< source >

( model: typing.Union[ForwardRef('PreTrainedModel'), ForwardRef('TFPreTrainedModel')] tokenizer: typing.Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None feature_extractor: typing.Optional[ForwardRef('SequenceFeatureExtractor')] = None image_processor: typing.Optional[transformers.image_processing_utils.BaseImageProcessor] = None processor: typing.Optional[transformers.processing_utils.ProcessorMixin] = None modelcard: typing.Optional[transformers.modelcard.ModelCard] = None framework: typing.Optional[str] = None task: str = '' args_parser: ArgumentHandler = None device: typing.Union[int, ForwardRef('torch.device')] = None torch_dtype: typing.Union[str, ForwardRef('torch.dtype'), NoneType] = None binary_output: bool = False **kwargs )

Parameters

Masked language modeling prediction pipeline using any ModelWithLMHead. See the masked language modeling examples for more information.

Example:

from transformers import pipeline

fill_masker = pipeline(model="google-bert/bert-base-uncased") fill_masker("This is a simple [MASK].") [{'score': 0.042, 'token': 3291, 'token_str': 'problem', 'sequence': 'this is a simple problem.'}, {'score': 0.031, 'token': 3160, 'token_str': 'question', 'sequence': 'this is a simple question.'}, {'score': 0.03, 'token': 8522, 'token_str': 'equation', 'sequence': 'this is a simple equation.'}, {'score': 0.027, 'token': 2028, 'token_str': 'one', 'sequence': 'this is a simple one.'}, {'score': 0.024, 'token': 3627, 'token_str': 'rule', 'sequence': 'this is a simple rule.'}]

Learn more about the basics of using a pipeline in the pipeline tutorial

This mask filling pipeline can currently be loaded from pipeline() using the following task identifier: "fill-mask".

The models that this pipeline can use are models that have been trained with a masked language modeling objective, which includes the bi-directional models in the library. See the up-to-date list of available models on huggingface.co/models.

This pipeline only works for inputs with exactly one token masked. Experimental: We added support for multiple masks. The returned values are raw model output, and correspond to disjoint probabilities where one might expect joint probabilities (See discussion).
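The pipeline also accepts a top_k parameter and a targets parameter that restricts scoring to a set of candidate tokens. A sketch reusing the fill_masker from the example above:

```python
# Only score the given candidate words, and return the single best one.
fill_masker(
    "This is a simple [MASK].",
    targets=["question", "problem"],
    top_k=1,
)
```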

This pipeline now supports tokenizer_kwargs. For example try:

from transformers import pipeline

fill_masker = pipeline(model="google-bert/bert-base-uncased") tokenizer_kwargs = {"truncation": True} fill_masker( ... "This is a simple [MASK]. " + "...with a large amount of repeated text appended. " * 100, ... tokenizer_kwargs=tokenizer_kwargs, ... )

__call__

< source >

( inputs **kwargs ) → A list or a list of list of dict

Parameters

Returns

A list or a list of list of dict

Each result comes as list of dictionaries with the following keys:

Fill the masked token in the text(s) given as inputs.

QuestionAnsweringPipeline

class transformers.QuestionAnsweringPipeline

< source >

( model: typing.Union[ForwardRef('PreTrainedModel'), ForwardRef('TFPreTrainedModel')] tokenizer: PreTrainedTokenizer modelcard: typing.Optional[transformers.modelcard.ModelCard] = None framework: typing.Optional[str] = None task: str = '' **kwargs )

Parameters

Question Answering pipeline using any ModelForQuestionAnswering. See the question answering examples for more information.

Example:

from transformers import pipeline

oracle = pipeline(model="deepset/roberta-base-squad2") oracle(question="Where do I live?", context="My name is Wolfgang and I live in Berlin") {'score': 0.9191, 'start': 34, 'end': 40, 'answer': 'Berlin'}

Learn more about the basics of using a pipeline in the pipeline tutorial

This question answering pipeline can currently be loaded from pipeline() using the following task identifier: "question-answering".

The models that this pipeline can use are models that have been fine-tuned on a question answering task. See the up-to-date list of available models on huggingface.co/models.

__call__

< source >

( *args **kwargs ) → A dict or a list of dict

Parameters

Returns

A dict or a list of dict

Each result comes as a dictionary with the following keys:

Answer the question(s) given as inputs by using the context(s).

create_sample

< source >

( question: typing.Union[str, typing.List[str]] context: typing.Union[str, typing.List[str]] ) → One or a list of SquadExample

Parameters

Returns

One or a list of SquadExample

The corresponding SquadExample grouping question and context.

QuestionAnsweringPipeline leverages the SquadExample internally. This helper method encapsulates all the logic for converting question(s) and context(s) to SquadExample.

We currently support extractive question answering.

span_to_answer

< source >

( text: str start: int end: int ) → Dictionary like `{'answer': str, 'start': int, 'end': int}`

Parameters

Returns

Dictionary like `{'answer': str, 'start': int, 'end': int}`

When decoding from token probabilities, this method maps token indexes to actual words in the initial context.

SummarizationPipeline

class transformers.SummarizationPipeline

< source >

( *args **kwargs )

Parameters

Summarize news articles and other documents.

This summarizing pipeline can currently be loaded from pipeline() using the following task identifier:"summarization".

The models that this pipeline can use are models that have been fine-tuned on a summarization task, which currently includes ’_bart-large-cnn_’, ’_google-t5/t5-small_’, ’_google-t5/t5-base_’, ’_google-t5/t5-large_’, ’_google-t5/t5-3b_’ and ’_google-t5/t5-11b_’. See the up-to-date list of available models on huggingface.co/models. For a list of available parameters, see the following documentation.

Usage:

```python
# use bart in pytorch
summarizer = pipeline("summarization")
summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)

# use t5 in tf
summarizer = pipeline("summarization", model="google-t5/t5-base", tokenizer="google-t5/t5-base", framework="tf")
summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)
```

__call__

< source >

( *args **kwargs ) → A list or a list of list of dict

Parameters

Returns

A list or a list of list of dict

Each result comes as a dictionary with the following keys:

Summarize the text(s) given as inputs.

TableQuestionAnsweringPipeline

class transformers.TableQuestionAnsweringPipeline

< source >

( args_parser = <TableQuestionAnsweringArgumentHandler object> *args **kwargs )

Parameters

Table Question Answering pipeline using a ModelForTableQuestionAnswering. This pipeline is only available in PyTorch.

Example:

from transformers import pipeline

oracle = pipeline(model="google/tapas-base-finetuned-wtq") table = { ... "Repository": ["Transformers", "Datasets", "Tokenizers"], ... "Stars": ["36542", "4512", "3934"], ... "Contributors": ["651", "77", "34"], ... "Programming language": ["Python", "Python", "Rust, Python and NodeJS"], ... } oracle(query="How many stars does the transformers repository have?", table=table) {'answer': 'AVERAGE > 36542', 'coordinates': [(0, 1)], 'cells': ['36542'], 'aggregator': 'AVERAGE'}

Learn more about the basics of using a pipeline in the pipeline tutorial

This tabular question answering pipeline can currently be loaded from pipeline() using the following task identifier: "table-question-answering".

The models that this pipeline can use are models that have been fine-tuned on a tabular question answering task. See the up-to-date list of available models on huggingface.co/models.

__call__

< source >

( *args **kwargs ) → A dictionary or a list of dictionaries containing results

Parameters

Returns

A dictionary or a list of dictionaries containing results

Each result is a dictionary with the following keys:

Answers queries according to a table. The pipeline accepts several types of inputs which are detailed below:

The table argument should be a dict or a DataFrame built from that dict, containing the whole table:

Example:

```python
data = {
    "actors": ["brad pitt", "leonardo di caprio", "george clooney"],
    "age": ["56", "45", "59"],
    "number of movies": ["87", "53", "69"],
    "date of birth": ["7 february 1967", "10 june 1996", "28 november 1967"],
}
```

This dictionary can be passed in as such, or can be converted to a pandas DataFrame:

Example:

```python
import pandas as pd

table = pd.DataFrame.from_dict(data)
```

TextClassificationPipeline

class transformers.TextClassificationPipeline

< source >

( **kwargs )

Parameters

Text classification pipeline using any ModelForSequenceClassification. See the sequence classification examples for more information.

Example:

from transformers import pipeline

classifier = pipeline(model="distilbert/distilbert-base-uncased-finetuned-sst-2-english") classifier("This movie is disgustingly good !") [{'label': 'POSITIVE', 'score': 1.0}]

classifier("Director tried too much.") [{'label': 'NEGATIVE', 'score': 0.996}]

Learn more about the basics of using a pipeline in the pipeline tutorial

This text classification pipeline can currently be loaded from pipeline() using the following task identifier: "sentiment-analysis" (for classifying sequences according to positive or negative sentiments).

If multiple classification labels are available (model.config.num_labels >= 2), the pipeline will run a softmax over the results. If there is a single label, the pipeline will run a sigmoid over the result. In the case of regression tasks (model.config.problem_type == "regression"), no function is applied to the output.
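Both behaviors can be controlled at call time with top_k (how many labels to return) and function_to_apply. A sketch reusing the classifier from the example above:

```python
# Return scores for all labels instead of just the best one,
# and force the activation applied to the logits.
classifier("This movie is disgustingly good !", top_k=None, function_to_apply="sigmoid")
```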

The models that this pipeline can use are models that have been fine-tuned on a sequence classification task. See the up-to-date list of available models on huggingface.co/models.

__call__

< source >

( inputs **kwargs ) → A list or a list of list of dict

Parameters

Returns

A list or a list of list of dict

Each result comes as list of dictionaries with the following keys:

If top_k is used, one such dictionary is returned per label.

Classify the text(s) given as inputs.

TextGenerationPipeline

class transformers.TextGenerationPipeline

< source >

( *args **kwargs )

Parameters

Language generation pipeline using any ModelWithLMHead. This pipeline predicts the words that will follow a specified text prompt. When the underlying model is a conversational model, it can also accept one or more chats, in which case the pipeline will operate in chat mode and will continue the chat(s) by adding its response(s). Each chat takes the form of a list of dicts, where each dict contains “role” and “content” keys.

Examples:

from transformers import pipeline

generator = pipeline(model="openai-community/gpt2") generator("I can't believe you did such a ", do_sample=False) [{'generated_text': "I can't believe you did such a icky thing to me. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I"}]

outputs = generator("My tart needs some", num_return_sequences=4, return_full_text=False)

from transformers import pipeline

generator = pipeline(model="HuggingFaceH4/zephyr-7b-beta")

generator([{"role": "user", "content": "What is the capital of France? Answer in one word."}], do_sample=False, max_new_tokens=2) [{'generated_text': [{'role': 'user', 'content': 'What is the capital of France? Answer in one word.'}, {'role': 'assistant', 'content': 'Paris'}]}]

Learn more about the basics of using a pipeline in the pipeline tutorial. You can pass text generation parameters to this pipeline to control stopping criteria, decoding strategy, and more. Learn more about text generation parameters in Text generation strategies and Text generation.
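For instance, common generation kwargs can be passed straight through the call. A minimal sketch reusing the GPT-2 generator from the first example:

```python
# Sample up to 40 new tokens with nucleus sampling instead of greedy decoding.
generator(
    "Once upon a time,",
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
```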

This language generation pipeline can currently be loaded from pipeline() using the following task identifier: "text-generation".

The models that this pipeline can use are models that have been trained with an autoregressive language modeling objective. See the list of available text completion models and the list of conversational models on huggingface.co/models.

__call__

< source >

( text_inputs **kwargs ) → A list or a list of lists of dict

Parameters

Returns

A list or a list of lists of dict

Returns one of the following dictionaries (cannot return a combination of both generated_text and generated_token_ids):

Complete the prompt(s) given as inputs.

Text2TextGenerationPipeline

class transformers.Text2TextGenerationPipeline

< source >

( *args **kwargs )

Parameters

Pipeline for text to text generation using seq2seq models.

Example:

from transformers import pipeline

generator = pipeline(model="mrm8488/t5-base-finetuned-question-generation-ap") generator( ... "answer: Manuel context: Manuel has created RuPERTa-base with the support of HF-Transformers and Google" ... ) [{'generated_text': 'question: Who created the RuPERTa-base?'}]

Learn more about the basics of using a pipeline in the pipeline tutorial. You can pass text generation parameters to this pipeline to control stopping criteria, decoding strategy, and more. Learn more about text generation parameters in Text generation strategies and Text generation.

This Text2TextGenerationPipeline pipeline can currently be loaded from pipeline() using the following task identifier: "text2text-generation".

The models that this pipeline can use are models that have been fine-tuned on a translation task. See the up-to-date list of available models on huggingface.co/models. For a list of available parameters, see the following documentation.

Usage:

```python
text2text_generator = pipeline("text2text-generation")
text2text_generator("question: What is 42 ? context: 42 is the answer to life, the universe and everything")
```

__call__

< source >

( *args **kwargs ) → A list or a list of list of dict

Parameters

Returns

A list or a list of list of dict

Each result comes as a dictionary with the following keys:

Generate the output text(s) using text(s) given as inputs.

check_inputs

< source >

( input_length: int min_length: int max_length: int )

Checks whether there might be something wrong with the given input with regard to the model.

TokenClassificationPipeline

class transformers.TokenClassificationPipeline

< source >

( args_parser = <TokenClassificationArgumentHandler object> *args **kwargs )

Parameters

Named Entity Recognition pipeline using any ModelForTokenClassification. See the named entity recognition examples for more information.

Example:

from transformers import pipeline

token_classifier = pipeline(model="Jean-Baptiste/camembert-ner", aggregation_strategy="simple") sentence = "Je m'appelle jean-baptiste et je vis à montréal" tokens = token_classifier(sentence) tokens [{'entity_group': 'PER', 'score': 0.9931, 'word': 'jean-baptiste', 'start': 12, 'end': 26}, {'entity_group': 'LOC', 'score': 0.998, 'word': 'montréal', 'start': 38, 'end': 47}]

token = tokens[0]

sentence[token["start"] : token["end"]] ' jean-baptiste'

syntaxer = pipeline(model="vblagoje/bert-english-uncased-finetuned-pos", aggregation_strategy="simple") syntaxer("My name is Sarah and I live in London") [{'entity_group': 'PRON', 'score': 0.999, 'word': 'my', 'start': 0, 'end': 2}, {'entity_group': 'NOUN', 'score': 0.997, 'word': 'name', 'start': 3, 'end': 7}, {'entity_group': 'AUX', 'score': 0.994, 'word': 'is', 'start': 8, 'end': 10}, {'entity_group': 'PROPN', 'score': 0.999, 'word': 'sarah', 'start': 11, 'end': 16}, {'entity_group': 'CCONJ', 'score': 0.999, 'word': 'and', 'start': 17, 'end': 20}, {'entity_group': 'PRON', 'score': 0.999, 'word': 'i', 'start': 21, 'end': 22}, {'entity_group': 'VERB', 'score': 0.998, 'word': 'live', 'start': 23, 'end': 27}, {'entity_group': 'ADP', 'score': 0.999, 'word': 'in', 'start': 28, 'end': 30}, {'entity_group': 'PROPN', 'score': 0.999, 'word': 'london', 'start': 31, 'end': 37}]

Learn more about the basics of using a pipeline in the pipeline tutorial

This token recognition pipeline can currently be loaded from pipeline() using the following task identifier: "ner" (for predicting the classes of tokens in a sequence: person, organisation, location or miscellaneous).

The models that this pipeline can use are models that have been fine-tuned on a token classification task. See the up-to-date list of available models on huggingface.co/models.

__call__

< source >

( inputs: typing.Union[str, typing.List[str]] **kwargs ) → A list or a list of list of dict

Parameters

Returns

A list or a list of list of dict

Each result comes as a list of dictionaries (one for each token in the corresponding input, or each entity if this pipeline was instantiated with an aggregation_strategy) with the following keys:

Classify each token of the text(s) given as inputs.

aggregate_words

< source >

( entities: typing.List[dict] aggregation_strategy: AggregationStrategy )

Override tokens from a given word that disagree to force agreement on word boundaries.

Example: micro|soft| com|pany| B-ENT I-NAME I-ENT I-ENT will be rewritten with first strategy as microsoft| company| B-ENT I-ENT

gather_pre_entities

< source >

( sentence: str input_ids: ndarray scores: ndarray offset_mapping: typing.Optional[typing.List[typing.Tuple[int, int]]] special_tokens_mask: ndarray aggregation_strategy: AggregationStrategy )

Fuse various numpy arrays into dicts with all the information needed for aggregation

group_entities

< source >

( entities: typing.List[dict] )

Parameters

Find and group together the adjacent tokens with the same entity predicted.

group_sub_entities

< source >

( entities: typing.List[dict] )

Parameters

Group together the adjacent tokens with the same entity predicted.

TranslationPipeline

class transformers.TranslationPipeline

< source >

( *args **kwargs )

Parameters

Translates from one language to another.

This translation pipeline can currently be loaded from pipeline() using the following task identifier: "translation_xx_to_yy".

The models that this pipeline can use are models that have been fine-tuned on a translation task. See the up-to-date list of available models on huggingface.co/models. For a list of available parameters, see the following documentation

Usage:

```python
en_fr_translator = pipeline("translation_en_to_fr")
en_fr_translator("How old are you?")
```

__call__

< source >

( *args **kwargs ) → A list or a list of list of dict

Parameters

Returns

A list or a list of list of dict

Each result comes as a dictionary with the following keys:

Translate the text(s) given as inputs.

ZeroShotClassificationPipeline

class transformers.ZeroShotClassificationPipeline

< source >

( args_parser = <ZeroShotClassificationArgumentHandler object> *args **kwargs )

Parameters

NLI-based zero-shot classification pipeline using a ModelForSequenceClassification trained on NLI (natural language inference) tasks. Equivalent of text-classification pipelines, but these models don’t require a hardcoded number of potential classes; they can be chosen at runtime. This usually means it is slower, but it is much more flexible.

Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis pair and passed to the pretrained model. Then, the logit for entailment is taken as the logit for the candidate label being valid. Any NLI model can be used, but the id of the entailment label must be included in the model config’s :attr:~transformers.PretrainedConfig.label2id.

Example:

from transformers import pipeline

oracle = pipeline(model="facebook/bart-large-mnli") oracle( ... "I have a problem with my iphone that needs to be resolved asap!!", ... candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"], ... ) {'sequence': 'I have a problem with my iphone that needs to be resolved asap!!', 'labels': ['urgent', 'phone', 'computer', 'not urgent', 'tablet'], 'scores': [0.504, 0.479, 0.013, 0.003, 0.002]}

oracle( ... "I have a problem with my iphone that needs to be resolved asap!!", ... candidate_labels=["english", "german"], ... ) {'sequence': 'I have a problem with my iphone that needs to be resolved asap!!', 'labels': ['english', 'german'], 'scores': [0.814, 0.186]}
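Two call-time parameters worth knowing are hypothesis_template, which controls how each candidate label is turned into an NLI hypothesis, and multi_label, which scores each label independently instead of normalizing across candidates. A sketch reusing the oracle above:

```python
# Score labels independently and customize the hypothesis wording.
oracle(
    "I have a problem with my iphone that needs to be resolved asap!!",
    candidate_labels=["urgent", "phone", "computer"],
    hypothesis_template="This text is about {}.",
    multi_label=True,
)
```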

Learn more about the basics of using a pipeline in the pipeline tutorial

This NLI pipeline can currently be loaded from pipeline() using the following task identifier: "zero-shot-classification".

The models that this pipeline can use are models that have been fine-tuned on an NLI task. See the up-to-date list of available models on huggingface.co/models.

__call__

< source >

( sequences: typing.Union[str, typing.List[str]] *args **kwargs ) → A dict or a list of dict

Parameters

Returns

A dict or a list of dict

Each result comes as a dictionary with the following keys:

Classify the sequence(s) given as inputs. See the ZeroShotClassificationPipeline documentation for more information.

Multimodal

Pipelines available for multimodal tasks include the following.

DocumentQuestionAnsweringPipeline

class transformers.DocumentQuestionAnsweringPipeline

< source >

( *args **kwargs )

Parameters

Document Question Answering pipeline using any AutoModelForDocumentQuestionAnswering. The inputs/outputs are similar to the (extractive) question answering pipeline; however, the pipeline takes an image (and optional OCR’d words/boxes) as input instead of text context.

Example:

from transformers import pipeline

document_qa = pipeline(model="impira/layoutlm-document-qa") document_qa( ... image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png", ... question="What is the invoice number?", ... ) [{'score': 0.425, 'answer': 'us-001', 'start': 16, 'end': 16}]

Learn more about the basics of using a pipeline in the pipeline tutorial

This document question answering pipeline can currently be loaded from pipeline() using the following task identifier: "document-question-answering".

The models that this pipeline can use are models that have been fine-tuned on a document question answering task. See the up-to-date list of available models on huggingface.co/models.

__call__

< source >

( image: typing.Union[ForwardRef('Image.Image'), str] question: typing.Optional[str] = None word_boxes: typing.Optional[typing.Tuple[str, typing.List[float]]] = None **kwargs ) → A dict or a list of dict

Parameters

Returns

A dict or a list of dict

Each result comes as a dictionary with the following keys:

Answer the question(s) given as inputs by using the document(s). A document is defined as an image and an optional list of (word, box) tuples which represent the text in the document. If the word_boxes are not provided, it will use the Tesseract OCR engine (if available) to extract the words and boxes automatically for LayoutLM-like models which require them as input. For Donut, no OCR is run.

You can invoke the pipeline several ways:

- pipeline(image=image, question=question)
- pipeline({"image": image, "question": question})
- pipeline([{"image": image, "question": question}])
- pipeline([{"image": image, "question": question}, {"image": image, "question": question}])
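As a concrete sketch, the invocation forms listed above are equivalent (reusing the invoice image and question from the example):

```python
image = "https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png"
question = "What is the invoice number?"

# All of these call the pipeline on the same (image, question) pair:
document_qa(image=image, question=question)
document_qa({"image": image, "question": question})
document_qa([{"image": image, "question": question}])
```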

FeatureExtractionPipeline

class transformers.FeatureExtractionPipeline

( model: typing.Union[ForwardRef('PreTrainedModel'), ForwardRef('TFPreTrainedModel')] tokenizer: typing.Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None feature_extractor: typing.Optional[ForwardRef('SequenceFeatureExtractor')] = None image_processor: typing.Optional[transformers.image_processing_utils.BaseImageProcessor] = None processor: typing.Optional[transformers.processing_utils.ProcessorMixin] = None modelcard: typing.Optional[transformers.modelcard.ModelCard] = None framework: typing.Optional[str] = None task: str = '' args_parser: ArgumentHandler = None device: typing.Union[int, ForwardRef('torch.device')] = None torch_dtype: typing.Union[str, ForwardRef('torch.dtype'), NoneType] = None binary_output: bool = False **kwargs )

Parameters

Feature extraction pipeline uses no model head. This pipeline extracts the hidden states from the base transformer, which can be used as features in downstream tasks.

Example:

from transformers import pipeline

extractor = pipeline(model="google-bert/bert-base-uncased", task="feature-extraction") result = extractor("This is a simple test.", return_tensors=True) result.shape
torch.Size([1, 8, 768])

Learn more about the basics of using a pipeline in the pipeline tutorial

This feature extraction pipeline can currently be loaded from pipeline() using the task identifier: "feature-extraction".

All models may be used for this pipeline. See a list of all models, including community-contributed models, on huggingface.co/models.

__call__

( *args **kwargs ) → A nested list of float

Parameters

Returns

A nested list of float

The features computed by the model.

Extract the features of the input(s).

ImageFeatureExtractionPipeline

class transformers.ImageFeatureExtractionPipeline

( model: typing.Union[ForwardRef('PreTrainedModel'), ForwardRef('TFPreTrainedModel')] tokenizer: typing.Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None feature_extractor: typing.Optional[ForwardRef('SequenceFeatureExtractor')] = None image_processor: typing.Optional[transformers.image_processing_utils.BaseImageProcessor] = None processor: typing.Optional[transformers.processing_utils.ProcessorMixin] = None modelcard: typing.Optional[transformers.modelcard.ModelCard] = None framework: typing.Optional[str] = None task: str = '' args_parser: ArgumentHandler = None device: typing.Union[int, ForwardRef('torch.device')] = None torch_dtype: typing.Union[str, ForwardRef('torch.dtype'), NoneType] = None binary_output: bool = False **kwargs )

Parameters

Image feature extraction pipeline uses no model head. This pipeline extracts the hidden states from the base transformer, which can be used as features in downstream tasks.

Example:

from transformers import pipeline

extractor = pipeline(model="google/vit-base-patch16-224", task="image-feature-extraction") result = extractor("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png", return_tensors=True) result.shape
torch.Size([1, 197, 768])

Learn more about the basics of using a pipeline in the pipeline tutorial

This image feature extraction pipeline can currently be loaded from pipeline() using the task identifier: "image-feature-extraction".

All vision models may be used for this pipeline. See a list of all models, including community-contributed models, on huggingface.co/models.

__call__

( *args **kwargs ) → A nested list of float

Parameters

Returns

A nested list of float

The features computed by the model.

Extract the features of the input(s).

ImageToTextPipeline

class transformers.ImageToTextPipeline

< source >

( *args **kwargs )

Parameters

Image To Text pipeline using an AutoModelForVision2Seq. This pipeline predicts a caption for a given image.

Example:

from transformers import pipeline

captioner = pipeline(model="ydshieh/vit-gpt2-coco-en") captioner("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png") [{'generated_text': 'two birds are standing next to each other '}]

Learn more about the basics of using a pipeline in the pipeline tutorial

This image to text pipeline can currently be loaded from pipeline() using the following task identifier: "image-to-text".

See the list of available models on huggingface.co/models.

__call__

< source >

( inputs: typing.Union[str, typing.List[str], ForwardRef('Image.Image'), typing.List[ForwardRef('Image.Image')]] = None **kwargs ) → A list or a list of list of dict

Parameters

Returns

A list or a list of list of dict

Each result comes as a dictionary with the following key:

Assign labels to the image(s) passed as inputs.

ImageTextToTextPipeline

class transformers.ImageTextToTextPipeline

< source >

( *args **kwargs )

Parameters

Image-text-to-text pipeline using an AutoModelForImageTextToText. This pipeline generates text given an image and text. When the underlying model is a conversational model, it can also accept one or more chats, in which case the pipeline will operate in chat mode and will continue the chat(s) by adding its response(s). Each chat takes the form of a list of dicts, where each dict contains “role” and “content” keys.

Example:

from transformers import pipeline

pipe = pipeline(task="image-text-to-text", model="Salesforce/blip-image-captioning-base") pipe("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png", text="A photo of") [{'generated_text': 'a photo of two birds'}]

from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llava-hf/llava-interleave-qwen-0.5b-hf") messages = [ { "role": "user", "content": [ { "type": "image", "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg", }, {"type": "text", "text": "Describe this image."}, ], }, { "role": "assistant", "content": [ {"type": "text", "text": "There is a dog and"}, ], }, ] pipe(text=messages, max_new_tokens=20, return_full_text=False) [{'input_text': [{'role': 'user', 'content': [{'type': 'image', 'url': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'}, {'type': 'text', 'text': 'Describe this image.'}]}, {'role': 'assistant', 'content': [{'type': 'text', 'text': 'There is a dog and'}]}], 'generated_text': ' a person in the image. The dog is sitting on the sand, and the person is sitting on'}]

Learn more about the basics of using a pipeline in the pipeline tutorial

This image-text to text pipeline can currently be loaded from pipeline() using the following task identifier: "image-text-to-text".

See the list of available models on huggingface.co/models.

__call__

< source >

( images: typing.Union[str, typing.List[str], typing.List[typing.List[str]], ForwardRef('Image.Image'), typing.List[ForwardRef('Image.Image')], typing.List[typing.List[ForwardRef('Image.Image')]], NoneType] = None text: typing.Union[str, typing.List[str], typing.List[dict], NoneType] = None **kwargs ) → A list or a list of list of dict

Parameters

Returns

A list or a list of list of dict

Each result comes as a dictionary with the following key (cannot return a combination of both generated_text and generated_token_ids):

Generate a text given text and the image(s) passed as inputs.

MaskGenerationPipeline

class transformers.MaskGenerationPipeline

< source >

( **kwargs )

Parameters

Automatic mask generation for images using SamForMaskGeneration. This pipeline predicts binary masks for a given image. It is a ChunkPipeline because the points can be separated into mini-batches in order to avoid OOM issues. Use the points_per_batch argument to control the number of points that will be processed at the same time. The default is 64.

The pipeline works in 3 steps:

  1. preprocess: A grid of 1024 evenly separated points is generated along with bounding boxes and point labels. For more details on how the points and bounding boxes are created, check the _generate_crop_boxes function. The image is also preprocessed using the image_processor. This function yields a minibatch of points_per_batch.
  2. forward: feeds the outputs of preprocess to the model. The image embedding is computed only once. Calls self.model.get_image_embeddings and makes sure that the gradients are not computed, and that the tensors and model are on the same device.
  3. postprocess: The most important part of the automatic mask generation happens here. Three steps are induced:
    • image_processor.postprocess_masks (run on each minibatch loop): takes in the raw output masks, resizes them according to the image size, and transforms them to binary masks.
    • image_processor.filter_masks (on each minibatch loop): uses both pred_iou_thresh and stability_scores. Also applies a variety of filters based on non maximum suppression to remove bad masks.
    • image_processor.postprocess_masks_for_amg applies the NMS on the masks to only keep relevant ones.

Example:

from transformers import pipeline

generator = pipeline(model="facebook/sam-vit-base", task="mask-generation") outputs = generator( ... "http://images.cocodataset.org/val2017/000000039769.jpg", ... )

outputs = generator( ... "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png", points_per_batch=128 ... )

Learn more about the basics of using a pipeline in the pipeline tutorial

This segmentation pipeline can currently be loaded from pipeline() using the following task identifier: "mask-generation".

See the list of available models on huggingface.co/models.

__call__

< source >

( image *args num_workers = None batch_size = None **kwargs ) → Dict

Parameters

Returns

Dict

A dictionary with the following keys:

Generates binary segmentation masks

VisualQuestionAnsweringPipeline

class transformers.VisualQuestionAnsweringPipeline

< source >

( *args **kwargs )

Parameters

Visual Question Answering pipeline using an AutoModelForVisualQuestionAnswering. This pipeline is currently only available in PyTorch.

Example:

from transformers import pipeline

oracle = pipeline(model="dandelin/vilt-b32-finetuned-vqa") image_url = "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/lena.png" oracle(question="What is she wearing ?", image=image_url) [{'score': 0.948, 'answer': 'hat'}, {'score': 0.009, 'answer': 'fedora'}, {'score': 0.003, 'answer': 'clothes'}, {'score': 0.003, 'answer': 'sun hat'}, {'score': 0.002, 'answer': 'nothing'}]

oracle(question="What is she wearing ?", image=image_url, top_k=1) [{'score': 0.948, 'answer': 'hat'}]

oracle(question="Is this a person ?", image=image_url, top_k=1) [{'score': 0.993, 'answer': 'yes'}]

oracle(question="Is this a man ?", image=image_url, top_k=1) [{'score': 0.996, 'answer': 'no'}]

Learn more about the basics of using a pipeline in the pipeline tutorial

This visual question answering pipeline can currently be loaded from pipeline() using the following task identifiers: "visual-question-answering", "vqa".

The models that this pipeline can use are models that have been fine-tuned on a visual question answering task. See the up-to-date list of available models on huggingface.co/models.

__call__

< source >

( image: typing.Union[ForwardRef('Image.Image'), str, typing.List[ForwardRef('Image.Image')], typing.List[str], ForwardRef('KeyDataset')] question: typing.Union[str, typing.List[str], NoneType] = None **kwargs ) → A dictionary or a list of dictionaries containing the result. The dictionaries contain the following keys

Parameters

Returns

A dictionary or a list of dictionaries containing the result. The dictionaries contain the following keys

Answers open-ended questions about images. The pipeline accepts several types of inputs which are detailed below:

Parent class: Pipeline

class transformers.Pipeline

< source >

( model: typing.Union[ForwardRef('PreTrainedModel'), ForwardRef('TFPreTrainedModel')] tokenizer: typing.Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None feature_extractor: typing.Optional[ForwardRef('SequenceFeatureExtractor')] = None image_processor: typing.Optional[transformers.image_processing_utils.BaseImageProcessor] = None processor: typing.Optional[transformers.processing_utils.ProcessorMixin] = None modelcard: typing.Optional[transformers.modelcard.ModelCard] = None framework: typing.Optional[str] = None task: str = '' args_parser: ArgumentHandler = None device: typing.Union[int, ForwardRef('torch.device')] = None torch_dtype: typing.Union[str, ForwardRef('torch.dtype'), NoneType] = None binary_output: bool = False **kwargs )

Parameters

The Pipeline class is the class from which all pipelines inherit. Refer to this class for methods shared across different pipelines.

Base class implementing pipelined operations. Pipeline workflow is defined as a sequence of the following operations:

Input -> Tokenization -> Model Inference -> Post-Processing (task dependent) -> Output

Pipeline supports running on CPU or GPU through the device argument (see below).

Some pipelines, like for instance FeatureExtractionPipeline ('feature-extraction'), output large tensor objects as nested lists. In order to avoid dumping such large structures as textual data, we provide the binary_output constructor argument. If set to True, the output will be stored in the pickle format.

check_model_type

< source >

( supported_models: typing.Union[typing.List[str], dict] )

Parameters

Check if the model class is supported by the pipeline.

device_placement

Context manager allowing tensor allocation on the user-specified device in a framework-agnostic way.

Examples:

```python
pipe = pipeline(..., device=0)
with pipe.device_placement():
    # every framework-specific tensor allocation will be done on the requested device
    output = pipe(...)
```

ensure_tensor_on_device

< source >

( **inputs ) → Dict[str, torch.Tensor]

Parameters

Returns

Dict[str, torch.Tensor]

The same as inputs but on the proper device.

Ensure PyTorch tensors are on the specified device.

postprocess

< source >

( model_outputs: ModelOutput **postprocess_parameters: typing.Dict )

Postprocess will receive the raw outputs of the _forward method, generally tensors, and reformat them into something more friendly. Generally it will output a list or a dict of results (containing just strings and numbers).

predict

Scikit / Keras interface to transformers’ pipelines. This method will forward to __call__().

preprocess

< source >

( input_: typing.Any **preprocess_parameters: typing.Dict )

Preprocess will take the input_ of a specific pipeline and return a dictionary of everything necessary for _forward to run properly. It should contain at least one tensor, but might have arbitrary other items.

push_to_hub

< source >

( repo_id: str use_temp_dir: typing.Optional[bool] = None commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None token: typing.Union[bool, str, NoneType] = None max_shard_size: typing.Union[int, str, NoneType] = '5GB' create_pr: bool = False safe_serialization: bool = True revision: typing.Optional[str] = None commit_description: typing.Optional[str] = None tags: typing.Optional[list[str]] = None **deprecated_kwargs )

Parameters

Upload the pipeline file to the 🤗 Model Hub.

Examples:

```python
from transformers import pipeline

pipe = pipeline("google-bert/bert-base-cased")

# Push the pipe to your namespace with the name "my-finetuned-bert".
pipe.push_to_hub("my-finetuned-bert")

# Push the pipe to an organization with the name "my-finetuned-bert".
pipe.push_to_hub("huggingface/my-finetuned-bert")
```

save_pretrained

< source >

( save_directory: typing.Union[str, os.PathLike] safe_serialization: bool = True **kwargs )

Parameters

Save the pipeline’s model and tokenizer.
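A minimal sketch (the directory path is an arbitrary choice):

```python
from transformers import pipeline

pipe = pipeline("text-classification")

# Writes the model, tokenizer, and configuration to the given directory so the
# pipeline can be reloaded later, e.g. with pipeline(model="./my-text-classifier").
pipe.save_pretrained("./my-text-classifier")
```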

transform

Scikit / Keras interface to transformers’ pipelines. This method will forward to __call__().
