Models

Generic model classes

class optimum.intel.openvino.modeling_base.OVBaseModel

( model: Model config: PretrainedConfig = None device: str = 'CPU' dynamic_shapes: bool = True ov_config: typing.Optional[typing.Dict[str, str]] = None model_save_dir: typing.Union[str, pathlib.Path, optimum.intel.openvino.utils.TemporaryDirectory, NoneType] = None quantization_config: typing.Union[optimum.intel.openvino.configuration.OVWeightQuantizationConfig, typing.Dict, NoneType] = None **kwargs )

Base OVModel class.

from_pretrained

( model_id: typing.Union[str, pathlib.Path] export: bool = False force_download: bool = False use_auth_token: typing.Union[bool, str, NoneType] = None token: typing.Union[bool, str, NoneType] = None cache_dir: str = '/home/runner/.cache/huggingface/hub' subfolder: str = '' config: typing.Optional[transformers.configuration_utils.PretrainedConfig] = None local_files_only: bool = False trust_remote_code: bool = False revision: typing.Optional[str] = None **kwargs )

Instantiate a pretrained model from a pre-trained model configuration.
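
A minimal usage sketch (the gpt2 checkpoint and the local gpt2_openvino directory are only illustrative; any exportable Transformers model id works the same way):

from optimum.intel import OVModelForCausalLM

# export=True converts the original Transformers checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained("gpt2", export=True)

# save the converted model so it can later be reloaded without re-exporting
model.save_pretrained("gpt2_openvino")
model = OVModelForCausalLM.from_pretrained("gpt2_openvino")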

reshape

( batch_size: int sequence_length: int height: int = None width: int = None )

Propagates the given input shapes through the model’s layers, fixing the input shapes of the model.
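
For instance, a sketch of fixing the shapes of a text model before compilation (the checkpoint and shape values are illustrative):

from optimum.intel import OVModelForSequenceClassification

model = OVModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True
)

# fix the inputs to a static batch size of 1 and sequence length of 128, then recompile
model.reshape(batch_size=1, sequence_length=128)
model.compile()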

Natural Language Processing

The following classes are available for the following natural language processing tasks.

OVModelForCausalLM

class optimum.intel.OVModelForCausalLM

( model: Model config: PretrainedConfig = None device: str = 'CPU' dynamic_shapes: bool = None ov_config: typing.Optional[typing.Dict[str, str]] = None model_save_dir: typing.Union[str, pathlib.Path, optimum.intel.openvino.utils.TemporaryDirectory, NoneType] = None quantization_config: typing.Union[optimum.intel.openvino.configuration.OVWeightQuantizationConfig, typing.Dict, NoneType] = None **kwargs )

OpenVINO Model with a causal language modeling head on top (linear layer with weights tied to the input embeddings).

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( input_ids: LongTensor attention_mask: typing.Optional[torch.LongTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None position_ids: typing.Optional[torch.LongTensor] = None token_type_ids: typing.Optional[torch.LongTensor] = None **kwargs )

generate

( inputs: typing.Optional[torch.Tensor] = None generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None logits_processor: typing.Optional[transformers.generation.logits_process.LogitsProcessorList] = None stopping_criteria: typing.Optional[transformers.generation.stopping_criteria.StoppingCriteriaList] = None prefix_allowed_tokens_fn: typing.Optional[typing.Callable[[int, torch.Tensor], typing.List[int]]] = None synced_gpus: typing.Optional[bool] = None assistant_model: typing.Optional[ForwardRef('PreTrainedModel')] = None streamer: typing.Optional[ForwardRef('BaseStreamer')] = None negative_prompt_ids: typing.Optional[torch.Tensor] = None negative_prompt_attention_mask: typing.Optional[torch.Tensor] = None **kwargs )
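
Example of text generation (a minimal sketch; the gpt2 checkpoint and the prompt are illustrative):

from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = OVModelForCausalLM.from_pretrained("gpt2", export=True)

inputs = tokenizer("He never went out without a book under his arm", return_tensors="pt")
gen_tokens = model.generate(**inputs, max_new_tokens=20)
outputs = tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)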

OVModelForMaskedLM

class optimum.intel.OVModelForMaskedLM

( model = None config = None **kwargs )

OpenVINO Model with a MaskedLMOutput for masked language modeling tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( input_ids: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray] token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

The OVModelForMaskedLM forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of masked language modeling using transformers.pipelines:

from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = OVModelForMaskedLM.from_pretrained("roberta-base", export=True)
mask_token = tokenizer.mask_token
pipe = pipeline("fill-mask", model=model, tokenizer=tokenizer)
outputs = pipe("The goal of life is" + mask_token)

OVModelForSeq2SeqLM

class optimum.intel.OVModelForSeq2SeqLM

( encoder: Model decoder: Model decoder_with_past: Model = None config: PretrainedConfig = None device: str = 'CPU' dynamic_shapes: bool = True ov_config: typing.Optional[typing.Dict[str, str]] = None model_save_dir: typing.Union[str, pathlib.Path, optimum.intel.openvino.utils.TemporaryDirectory, NoneType] = None quantization_config: typing.Union[optimum.intel.openvino.configuration.OVWeightQuantizationConfig, typing.Dict] = None **kwargs )

Sequence-to-sequence model with a language modeling head for OpenVINO inference.

forward

( input_ids: LongTensor = None attention_mask: typing.Optional[torch.FloatTensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None decoder_attention_mask: typing.Optional[torch.LongTensor] = None encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None cache_position: typing.Optional[torch.LongTensor] = None labels: typing.Optional[torch.LongTensor] = None **kwargs )

The OVModelForSeq2SeqLM forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of text generation:

from transformers import AutoTokenizer
from optimum.intel import OVModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("echarlaix/t5-small-openvino")
model = OVModelForSeq2SeqLM.from_pretrained("echarlaix/t5-small-openvino")
text = "He never went out without a book under his arm, and he often came back with two."
inputs = tokenizer(text, return_tensors="pt")
gen_tokens = model.generate(**inputs)
outputs = tokenizer.batch_decode(gen_tokens)

Example using transformers.pipeline:

from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("echarlaix/t5-small-openvino")
model = OVModelForSeq2SeqLM.from_pretrained("echarlaix/t5-small-openvino")
pipe = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
text = "He never went out without a book under his arm, and he often came back with two."
outputs = pipe(text)

OVModelForQuestionAnswering

class optimum.intel.OVModelForQuestionAnswering

( model = None config = None **kwargs )

OpenVINO Model with a QuestionAnsweringModelOutput for extractive question-answering tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( input_ids: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray] token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

The OVModelForQuestionAnswering forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of question answering using transformers.pipeline:

from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = OVModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad", export=True)
pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
outputs = pipe(question, text)

OVModelForSequenceClassification

class optimum.intel.OVModelForSequenceClassification

( model = None config = None **kwargs )

OpenVINO Model with a SequenceClassifierOutput for sequence classification tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( input_ids: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray] token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

The OVModelForSequenceClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of sequence classification using transformers.pipeline:

from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = OVModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", export=True)
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
outputs = pipe("Hello, my dog is cute")

OVModelForTokenClassification

class optimum.intel.OVModelForTokenClassification

( model = None config = None **kwargs )

OpenVINO Model with a TokenClassifierOutput for token classification tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( input_ids: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray] token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

The OVModelForTokenClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of token classification using transformers.pipelines:

from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = OVModelForTokenClassification.from_pretrained("dslim/bert-base-NER", export=True)
pipe = pipeline("token-classification", model=model, tokenizer=tokenizer)
outputs = pipe("My Name is Peter and I live in New York.")

Audio

The following classes are available for the following audio tasks.

OVModelForAudioClassification

class optimum.intel.OVModelForAudioClassification

( model = None config = None **kwargs )

OpenVINO Model with a SequenceClassifierOutput for audio classification tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( input_values: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

The OVModelForAudioClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of audio classification using transformers.pipelines:

from datasets import load_dataset
from transformers import AutoFeatureExtractor, pipeline
from optimum.intel import OVModelForAudioClassification

preprocessor = AutoFeatureExtractor.from_pretrained("superb/hubert-base-superb-er")
model = OVModelForAudioClassification.from_pretrained("superb/hubert-base-superb-er", export=True)
pipe = pipeline("audio-classification", model=model, feature_extractor=preprocessor)
dataset = load_dataset("superb", "ks", split="test")
audio_file = dataset[3]["audio"]["array"]
outputs = pipe(audio_file)

OVModelForAudioFrameClassification

class optimum.intel.OVModelForAudioFrameClassification

( model: Model config: PretrainedConfig = None **kwargs )

OpenVINO Model with a frame classification head on top for tasks like Speaker Diarization.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

Audio Frame Classification model for OpenVINO.

forward

( input_values: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None **kwargs )

The OVModelForAudioFrameClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of audio frame classification:

from transformers import AutoFeatureExtractor
from optimum.intel import OVModelForAudioFrameClassification
from datasets import load_dataset
import torch

dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
dataset = dataset.sort("id")
sampling_rate = dataset.features["audio"].sampling_rate

feature_extractor = AutoFeatureExtractor.from_pretrained("anton-l/wav2vec2-base-superb-sd")
model = OVModelForAudioFrameClassification.from_pretrained("anton-l/wav2vec2-base-superb-sd", export=True)

inputs = feature_extractor(dataset[0]["audio"]["array"], return_tensors="pt", sampling_rate=sampling_rate)
logits = model(**inputs).logits

probabilities = torch.sigmoid(torch.as_tensor(logits)[0])
labels = (probabilities > 0.5).long()
labels[0].tolist()

OVModelForCTC

class optimum.intel.OVModelForCTC

( model: Model config: PretrainedConfig = None **kwargs )

OpenVINO Model with a language modeling head on top for Connectionist Temporal Classification (CTC).

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

CTC model for OpenVINO.

forward

( input_values: typing.Optional[torch.Tensor] = None attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

The OVModelForCTC forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of CTC:

from transformers import AutoFeatureExtractor
from optimum.intel import OVModelForCTC
from datasets import load_dataset
import numpy as np

dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
dataset = dataset.sort("id")
sampling_rate = dataset.features["audio"].sampling_rate

processor = AutoFeatureExtractor.from_pretrained("facebook/hubert-large-ls960-ft")
model = OVModelForCTC.from_pretrained("facebook/hubert-large-ls960-ft", export=True)

inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="np")
logits = model(**inputs).logits
predicted_ids = np.argmax(logits, axis=-1)

transcription = processor.batch_decode(predicted_ids)

OVModelForAudioXVector

class optimum.intel.OVModelForAudioXVector

( model: Model config: PretrainedConfig = None **kwargs )

OpenVINO Model with an XVector feature extraction head on top for tasks like Speaker Verification.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

Audio XVector model for OpenVINO.

forward

( input_values: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None **kwargs )

The OVModelForAudioXVector forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of Audio XVector:

from transformers import AutoFeatureExtractor
from optimum.intel import OVModelForAudioXVector
from datasets import load_dataset
import torch

dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
dataset = dataset.sort("id")
sampling_rate = dataset.features["audio"].sampling_rate

feature_extractor = AutoFeatureExtractor.from_pretrained("anton-l/wav2vec2-base-superb-sv")
model = OVModelForAudioXVector.from_pretrained("anton-l/wav2vec2-base-superb-sv", export=True)

inputs = feature_extractor(
    [d["array"] for d in dataset[:2]["audio"]], sampling_rate=sampling_rate, return_tensors="pt", padding=True
)
embeddings = model(**inputs).embeddings

embeddings = torch.nn.functional.normalize(embeddings, dim=-1).cpu()

cosine_sim = torch.nn.CosineSimilarity(dim=-1)
similarity = cosine_sim(embeddings[0], embeddings[1])
threshold = 0.7
if similarity < threshold:
    print("Speakers are not the same!")
round(similarity.item(), 2)

OVModelForSpeechSeq2Seq

class optimum.intel.OVModelForSpeechSeq2Seq

( encoder: Model decoder: Model decoder_with_past: Model = None config: PretrainedConfig = None device: str = 'CPU' dynamic_shapes: bool = True ov_config: typing.Optional[typing.Dict[str, str]] = None model_save_dir: typing.Union[str, pathlib.Path, optimum.intel.openvino.utils.TemporaryDirectory, NoneType] = None quantization_config: typing.Union[optimum.intel.openvino.configuration.OVWeightQuantizationConfig, typing.Dict] = None **kwargs )

Speech Sequence-to-sequence model with a language modeling head for OpenVINO inference. This class officially supports whisper, speech_to_text.

forward

( input_features: typing.Optional[torch.FloatTensor] = None attention_mask: typing.Optional[torch.LongTensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None decoder_attention_mask: typing.Optional[torch.BoolTensor] = None encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None cache_position: typing.Optional[torch.LongTensor] = None **kwargs )

The OVModelForSpeechSeq2Seq forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of text generation:

from transformers import AutoProcessor
from optimum.intel import OVModelForSpeechSeq2Seq
from datasets import load_dataset

processor = AutoProcessor.from_pretrained("openai/whisper-tiny")
model = OVModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny")

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
inputs = processor.feature_extractor(ds[0]["audio"]["array"], return_tensors="pt")

gen_tokens = model.generate(inputs=inputs.input_features)
outputs = processor.tokenizer.batch_decode(gen_tokens)

Example using transformers.pipeline:

from transformers import AutoProcessor, pipeline
from optimum.intel import OVModelForSpeechSeq2Seq
from datasets import load_dataset

processor = AutoProcessor.from_pretrained("openai/whisper-tiny")
model = OVModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny")
speech_recognition = pipeline("automatic-speech-recognition", model=model, tokenizer=processor.tokenizer, feature_extractor=processor.feature_extractor)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
pred = speech_recognition(ds[0]["audio"]["array"])

Computer Vision

The following classes are available for the following computer vision tasks.

OVModelForImageClassification

class optimum.intel.OVModelForImageClassification

( model = None config = None **kwargs )

OpenVINO Model with an ImageClassifierOutput for image classification tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( pixel_values: typing.Union[torch.Tensor, numpy.ndarray] **kwargs )

The OVModelForImageClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of image classification using transformers.pipelines:

from transformers import AutoFeatureExtractor, pipeline
from optimum.intel import OVModelForImageClassification

preprocessor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
model = OVModelForImageClassification.from_pretrained("google/vit-base-patch16-224", export=True)
model.reshape(batch_size=1, sequence_length=3, height=224, width=224)
pipe = pipeline("image-classification", model=model, feature_extractor=preprocessor)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
outputs = pipe(url)

This class can also be used with [timm](https://github.com/huggingface/pytorch-image-models) models hosted on the Hugging Face Hub. Example:

from transformers import pipeline
from optimum.intel.openvino.modeling_timm import TimmImageProcessor
from optimum.intel import OVModelForImageClassification

model_id = "timm/vit_tiny_patch16_224.augreg_in21k"
preprocessor = TimmImageProcessor.from_pretrained(model_id)
model = OVModelForImageClassification.from_pretrained(model_id, export=True)
pipe = pipeline("image-classification", model=model, feature_extractor=preprocessor)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
outputs = pipe(url)

Multimodal

The following classes are available for the following multimodal tasks.

OVModelForVision2Seq

class optimum.intel.OVModelForVision2Seq

( encoder: Model decoder: Model decoder_with_past: Model = None config: PretrainedConfig = None **kwargs )

VisionEncoderDecoder Sequence-to-sequence model with a language modeling head for OpenVINO inference.

forward

( pixel_values: typing.Optional[torch.FloatTensor] = None attention_mask: typing.Optional[torch.LongTensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None decoder_attention_mask: typing.Optional[torch.BoolTensor] = None encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None **kwargs )

The OVModelForVision2Seq forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of text generation:

from transformers import AutoProcessor, AutoTokenizer
from optimum.intel import OVModelForVision2Seq
from PIL import Image
import requests

processor = AutoProcessor.from_pretrained("microsoft/trocr-small-handwritten")
tokenizer = AutoTokenizer.from_pretrained("microsoft/trocr-small-handwritten")
model = OVModelForVision2Seq.from_pretrained("microsoft/trocr-small-handwritten", export=True)

url = "https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(image, return_tensors="pt")

gen_tokens = model.generate(**inputs)
outputs = tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)

Example using transformers.pipeline:

from transformers import AutoProcessor, AutoTokenizer, pipeline
from optimum.intel import OVModelForVision2Seq
from PIL import Image
import requests

processor = AutoProcessor.from_pretrained("microsoft/trocr-small-handwritten")
tokenizer = AutoTokenizer.from_pretrained("microsoft/trocr-small-handwritten")
model = OVModelForVision2Seq.from_pretrained("microsoft/trocr-small-handwritten", export=True)

url = "https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_to_text = pipeline("image-to-text", model=model, tokenizer=tokenizer, feature_extractor=processor, image_processor=processor)
pred = image_to_text(image)

OVModelForPix2Struct

class optimum.intel.OVModelForPix2Struct

( encoder: Model decoder: Model decoder_with_past: Model = None config: PretrainedConfig = None device: str = 'CPU' dynamic_shapes: bool = True ov_config: typing.Optional[typing.Dict[str, str]] = None model_save_dir: typing.Union[str, pathlib.Path, optimum.intel.openvino.utils.TemporaryDirectory, NoneType] = None quantization_config: typing.Union[optimum.intel.openvino.configuration.OVWeightQuantizationConfig, typing.Dict] = None **kwargs )

Pix2Struct model with a language modeling head for OpenVINO inference.

forward

( flattened_patches: typing.Optional[torch.FloatTensor] = None attention_mask: typing.Optional[torch.LongTensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None decoder_attention_mask: typing.Optional[torch.BoolTensor] = None encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None **kwargs )

The OVModelForPix2Struct forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of pix2struct:

from transformers import AutoProcessor
from optimum.intel import OVModelForPix2Struct
from PIL import Image
import requests

processor = AutoProcessor.from_pretrained("google/pix2struct-ai2d-base")
model = OVModelForPix2Struct.from_pretrained("google/pix2struct-ai2d-base", export=True)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
image = Image.open(requests.get(url, stream=True).raw)
question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
inputs = processor(images=image, text=question, return_tensors="pt")

gen_tokens = model.generate(**inputs)
outputs = processor.batch_decode(gen_tokens, skip_special_tokens=True)

Custom Tasks

OVModelForCustomTasks

class optimum.intel.OVModelForCustomTasks

( model: Model config: PretrainedConfig = None **kwargs )

OpenVINO Model for custom tasks. It can be used to leverage the inference acceleration for any single-file OpenVINO model that may use custom inputs and outputs.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

The OVModelForCustomTasks forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of custom tasks (e.g. a sentence transformers with a pooler head):

from transformers import AutoTokenizer
from optimum.intel import OVModelForCustomTasks

tokenizer = AutoTokenizer.from_pretrained("IlyasMoutawwakil/sbert-all-MiniLM-L6-v2-with-pooler")
model = OVModelForCustomTasks.from_pretrained("IlyasMoutawwakil/sbert-all-MiniLM-L6-v2-with-pooler")

inputs = tokenizer("I love burritos!", return_tensors="np")

outputs = model(**inputs)
last_hidden_state = outputs.last_hidden_state
pooler_output = outputs.pooler_output

OVModelForFeatureExtraction

( model = None config = None **kwargs )

OpenVINO Model with a BaseModelOutput for feature extraction tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

( input_ids: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray] token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

The OVModelForFeatureExtraction forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of feature extraction using transformers.pipelines:

from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForFeatureExtraction

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = OVModelForFeatureExtraction.from_pretrained("sentence-transformers/all-MiniLM-L6-v2", export=True)
pipe = pipeline("feature-extraction", model=model, tokenizer=tokenizer)
outputs = pipe("My Name is Peter and I live in New York.")

Text-to-image

OVStableDiffusionPipeline

class optimum.intel.OVStableDiffusionPipeline

( scheduler: SchedulerMixin unet: typing.Optional[openvino._ov_api.Model] = None vae_decoder: typing.Optional[openvino._ov_api.Model] = None vae_encoder: typing.Optional[openvino._ov_api.Model] = None text_encoder: typing.Optional[openvino._ov_api.Model] = None text_encoder_2: typing.Optional[openvino._ov_api.Model] = None text_encoder_3: typing.Optional[openvino._ov_api.Model] = None transformer: typing.Optional[openvino._ov_api.Model] = None tokenizer: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None tokenizer_3: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None force_zeros_for_empty_prompt: bool = True requires_aesthetics_score: bool = False add_watermarker: typing.Optional[bool] = None device: str = 'CPU' compile: bool = True compile_only: bool = False dynamic_shapes: bool = True ov_config: typing.Optional[typing.Dict[str, str]] = None model_save_dir: typing.Union[str, pathlib.Path, optimum.intel.openvino.utils.TemporaryDirectory, NoneType] = None quantization_config: typing.Union[optimum.intel.openvino.configuration.OVWeightQuantizationConfig, typing.Dict, NoneType] = None **kwargs )

OpenVINO-powered stable diffusion pipeline corresponding to diffusers.StableDiffusionPipeline.
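
Example of text-to-image generation (a minimal sketch; the runwayml/stable-diffusion-v1-5 checkpoint and the prompt are illustrative):

from optimum.intel import OVStableDiffusionPipeline

pipeline = OVStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", export=True)
prompt = "sailing ship in storm by Rembrandt"
image = pipeline(prompt).images[0]
image.save("ship.png")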

OVStableDiffusionXLPipeline

class optimum.intel.OVStableDiffusionXLPipeline

( scheduler: SchedulerMixin unet: typing.Optional[openvino._ov_api.Model] = None vae_decoder: typing.Optional[openvino._ov_api.Model] = None vae_encoder: typing.Optional[openvino._ov_api.Model] = None text_encoder: typing.Optional[openvino._ov_api.Model] = None text_encoder_2: typing.Optional[openvino._ov_api.Model] = None text_encoder_3: typing.Optional[openvino._ov_api.Model] = None transformer: typing.Optional[openvino._ov_api.Model] = None tokenizer: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None tokenizer_3: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None force_zeros_for_empty_prompt: bool = True requires_aesthetics_score: bool = False add_watermarker: typing.Optional[bool] = None device: str = 'CPU' compile: bool = True compile_only: bool = False dynamic_shapes: bool = True ov_config: typing.Optional[typing.Dict[str, str]] = None model_save_dir: typing.Union[str, pathlib.Path, optimum.intel.openvino.utils.TemporaryDirectory, NoneType] = None quantization_config: typing.Union[optimum.intel.openvino.configuration.OVWeightQuantizationConfig, typing.Dict, NoneType] = None **kwargs )

OpenVINO-powered stable diffusion pipeline corresponding to diffusers.StableDiffusionXLPipeline.
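
A sketch of the same flow for SDXL (the stabilityai/stable-diffusion-xl-base-1.0 checkpoint and the prompt are illustrative):

from optimum.intel import OVStableDiffusionXLPipeline

pipeline = OVStableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", export=True)
prompt = "train station by Caspar David Friedrich"
image = pipeline(prompt).images[0]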

OVLatentConsistencyModelPipeline

class optimum.intel.OVLatentConsistencyModelPipeline

( scheduler: SchedulerMixin unet: typing.Optional[openvino._ov_api.Model] = None vae_decoder: typing.Optional[openvino._ov_api.Model] = None vae_encoder: typing.Optional[openvino._ov_api.Model] = None text_encoder: typing.Optional[openvino._ov_api.Model] = None text_encoder_2: typing.Optional[openvino._ov_api.Model] = None text_encoder_3: typing.Optional[openvino._ov_api.Model] = None transformer: typing.Optional[openvino._ov_api.Model] = None tokenizer: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None tokenizer_3: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None force_zeros_for_empty_prompt: bool = True requires_aesthetics_score: bool = False add_watermarker: typing.Optional[bool] = None device: str = 'CPU' compile: bool = True compile_only: bool = False dynamic_shapes: bool = True ov_config: typing.Optional[typing.Dict[str, str]] = None model_save_dir: typing.Union[str, pathlib.Path, optimum.intel.openvino.utils.TemporaryDirectory, NoneType] = None quantization_config: typing.Union[optimum.intel.openvino.configuration.OVWeightQuantizationConfig, typing.Dict, NoneType] = None **kwargs )

OpenVINO-powered stable diffusion pipeline corresponding to diffusers.LatentConsistencyModelPipeline.
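
A sketch of latent consistency generation, which typically needs only a few denoising steps (the SimianLuo/LCM_Dreamshaper_v7 checkpoint and the step count are illustrative):

from optimum.intel import OVLatentConsistencyModelPipeline

pipeline = OVLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipeline(prompt, num_inference_steps=4, guidance_scale=8.0).images[0]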

Image-to-image

OVStableDiffusionImg2ImgPipeline

class optimum.intel.OVStableDiffusionImg2ImgPipeline

( scheduler: SchedulerMixin unet: typing.Optional[openvino._ov_api.Model] = None vae_decoder: typing.Optional[openvino._ov_api.Model] = None vae_encoder: typing.Optional[openvino._ov_api.Model] = None text_encoder: typing.Optional[openvino._ov_api.Model] = None text_encoder_2: typing.Optional[openvino._ov_api.Model] = None text_encoder_3: typing.Optional[openvino._ov_api.Model] = None transformer: typing.Optional[openvino._ov_api.Model] = None tokenizer: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None tokenizer_3: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None force_zeros_for_empty_prompt: bool = True requires_aesthetics_score: bool = False add_watermarker: typing.Optional[bool] = None device: str = 'CPU' compile: bool = True compile_only: bool = False dynamic_shapes: bool = True ov_config: typing.Optional[typing.Dict[str, str]] = None model_save_dir: typing.Union[str, pathlib.Path, optimum.intel.openvino.utils.TemporaryDirectory, NoneType] = None quantization_config: typing.Union[optimum.intel.openvino.configuration.OVWeightQuantizationConfig, typing.Dict, NoneType] = None **kwargs )

OpenVINO-powered stable diffusion pipeline corresponding to diffusers.StableDiffusionImg2ImgPipeline.
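
A sketch of image-to-image generation (the checkpoint and prompt are illustrative; a random image keeps the snippet self-contained):

import numpy as np
from PIL import Image
from optimum.intel import OVStableDiffusionImg2ImgPipeline

pipeline = OVStableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", export=True)

# any 512x512 RGB image works as the starting point
init_image = Image.fromarray(np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8))
prompt = "A fantasy landscape, trending on artstation"
image = pipeline(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]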

OVStableDiffusionXLImg2ImgPipeline

class optimum.intel.OVStableDiffusionXLImg2ImgPipeline

( scheduler: SchedulerMixin unet: typing.Optional[openvino._ov_api.Model] = None vae_decoder: typing.Optional[openvino._ov_api.Model] = None vae_encoder: typing.Optional[openvino._ov_api.Model] = None text_encoder: typing.Optional[openvino._ov_api.Model] = None text_encoder_2: typing.Optional[openvino._ov_api.Model] = None text_encoder_3: typing.Optional[openvino._ov_api.Model] = None transformer: typing.Optional[openvino._ov_api.Model] = None tokenizer: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None tokenizer_3: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None force_zeros_for_empty_prompt: bool = True requires_aesthetics_score: bool = False add_watermarker: typing.Optional[bool] = None device: str = 'CPU' compile: bool = True compile_only: bool = False dynamic_shapes: bool = True ov_config: typing.Optional[typing.Dict[str, str]] = None model_save_dir: typing.Union[str, pathlib.Path, optimum.intel.openvino.utils.TemporaryDirectory, NoneType] = None quantization_config: typing.Union[optimum.intel.openvino.configuration.OVWeightQuantizationConfig, typing.Dict, NoneType] = None **kwargs )

OpenVINO-powered stable diffusion pipeline corresponding to diffusers.StableDiffusionXLImg2ImgPipeline.
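
A sketch of the SDXL refiner flow (the stabilityai/stable-diffusion-xl-refiner-1.0 checkpoint, prompt and random input image are illustrative):

import numpy as np
from PIL import Image
from optimum.intel import OVStableDiffusionXLImg2ImgPipeline

pipeline = OVStableDiffusionXLImg2ImgPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0", export=True)

init_image = Image.fromarray(np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8))
prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt=prompt, image=init_image).images[0]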

Inpainting

OVStableDiffusionInpaintPipeline

class optimum.intel.OVStableDiffusionInpaintPipeline

( scheduler: SchedulerMixin unet: typing.Optional[openvino._ov_api.Model] = None vae_decoder: typing.Optional[openvino._ov_api.Model] = None vae_encoder: typing.Optional[openvino._ov_api.Model] = None text_encoder: typing.Optional[openvino._ov_api.Model] = None text_encoder_2: typing.Optional[openvino._ov_api.Model] = None text_encoder_3: typing.Optional[openvino._ov_api.Model] = None transformer: typing.Optional[openvino._ov_api.Model] = None tokenizer: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None tokenizer_3: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None force_zeros_for_empty_prompt: bool = True requires_aesthetics_score: bool = False add_watermarker: typing.Optional[bool] = None device: str = 'CPU' compile: bool = True compile_only: bool = False dynamic_shapes: bool = True ov_config: typing.Optional[typing.Dict[str, str]] = None model_save_dir: typing.Union[str, pathlib.Path, optimum.intel.openvino.utils.TemporaryDirectory, NoneType] = None quantization_config: typing.Union[optimum.intel.openvino.configuration.OVWeightQuantizationConfig, typing.Dict, NoneType] = None **kwargs )

OpenVINO-powered stable diffusion pipeline corresponding to diffusers.StableDiffusionInpaintPipeline.
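
A sketch of inpainting (the runwayml/stable-diffusion-inpainting checkpoint, prompt and synthetic image/mask are illustrative; image and mask_image must have matching sizes, with white mask pixels marking the region to repaint):

import numpy as np
from PIL import Image
from optimum.intel import OVStableDiffusionInpaintPipeline

pipeline = OVStableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting", export=True)

init_image = Image.fromarray(np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8))
mask = np.zeros((512, 512), dtype=np.uint8)
mask[128:384, 128:384] = 255  # white region is repainted, black is kept
mask_image = Image.fromarray(mask)

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]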