OPT

This model was released on 2022-05-02 and added to Hugging Face Transformers on 2022-05-12.

OPT is a suite of open-source decoder-only pre-trained transformers ranging from 125M to 175B parameters. OPT models are designed for causal language modeling and aim to enable responsible and reproducible research at scale. OPT-175B is comparable in performance to GPT-3 with only 1/7th the carbon footprint.

You can find all the original OPT checkpoints under the OPT collection.

This model was contributed by ArthurZ, ybelkada, and patrickvonplaten.

Click on the OPT models in the right sidebar for more examples of how to apply OPT to different language tasks.

The examples below demonstrate how to generate text with Pipeline and with AutoModel.

from transformers import pipeline

pipeline = pipeline(task="text-generation", model="facebook/opt-125m", device=0)
pipeline("Once upon a time, in a land far, far away,", max_length=50, num_return_sequences=1)
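The same generation can also be run with AutoModelForCausalLM directly. A minimal sketch, assuming the same facebook/opt-125m checkpoint and greedy decoding (device_map="auto" requires the accelerate package):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", device_map="auto")

# Tokenize the prompt and move it to the model's device
inputs = tokenizer("Once upon a time, in a land far, far away,", return_tensors="pt").to(model.device)

# Generate up to 50 new tokens with greedy decoding
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))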

Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the Quantization overview for more available quantization backends.

The example below uses bitsandbytes to quantize the weights to 8-bits.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-13b", attn_implementation="sdpa", quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-13b")

prompt = ("Once upon a time, in a land far, far away, ")

model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=30, do_sample=False)
tokenizer.batch_decode(generated_ids)[0]
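bitsandbytes can also quantize to 4 bits, which reduces memory further at some cost in accuracy. A sketch of the same setup with a 4-bit NF4 configuration (assumes a CUDA device and the bitsandbytes package; the parameter choices are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 4-bit quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bfloat16
)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-13b", quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-13b")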

Notes

Resources

OPTConfig

class transformers.OPTConfig

< source >

( transformers_version: str | None = None architectures: list[str] | None = None output_hidden_states: bool | None = False return_dict: bool | None = True dtype: typing.Union[str, ForwardRef('torch.dtype'), NoneType] = None chunk_size_feed_forward: int = 0 is_encoder_decoder: bool = False id2label: dict[int, str] | dict[str, str] | None = None label2id: dict[str, int] | dict[str, str] | None = None problem_type: typing.Optional[typing.Literal['regression', 'single_label_classification', 'multi_label_classification']] = None vocab_size: int = 50272 hidden_size: int = 768 num_hidden_layers: int = 12 ffn_dim: int = 3072 max_position_embeddings: int = 2048 do_layer_norm_before: bool = True _remove_final_layer_norm: bool = False word_embed_proj_dim: int | None = None dropout: float | int = 0.1 attention_dropout: float | int = 0.0 num_attention_heads: int = 12 activation_function: str = 'relu' layerdrop: float | int = 0.0 init_std: float = 0.02 use_cache: bool = True pad_token_id: int | None = 1 bos_token_id: int | None = 2 eos_token_id: int | list[int] | None = 2 enable_bias: bool = True layer_norm_elementwise_affine: bool = True tie_word_embeddings: bool = True )

Parameters

This is the configuration class to store the configuration of an OPTModel. It is used to instantiate an OPT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the facebook/opt-350m architecture.

Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.

Example:

from transformers import OPTConfig, OPTModel

# Initializing an OPT configuration
configuration = OPTConfig()

# Initializing a model (with random weights) from the configuration
model = OPTModel(configuration)

# Accessing the model configuration
configuration = model.config
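The architecture parameters from the signature above can also be overridden to define a custom model. A small sketch with hypothetical values for a scaled-down OPT (chosen purely for illustration):

from transformers import OPTConfig, OPTModel

# Hypothetical scaled-down architecture; the values are for illustration only
small_config = OPTConfig(
    hidden_size=512,
    num_hidden_layers=6,
    num_attention_heads=8,
    ffn_dim=2048,
)

# Randomly initialized model built from the custom configuration
model = OPTModel(small_config)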

OPTModel

class transformers.OPTModel

< source >

( config: OPTConfig )

Parameters

The bare OPT Model outputting raw hidden-states without any specific head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

< source >

( input_ids: torch.LongTensor | None = None attention_mask: torch.Tensor | None = None past_key_values: transformers.cache_utils.Cache | None = None inputs_embeds: torch.FloatTensor | None = None use_cache: bool | None = None position_ids: torch.LongTensor | None = None **kwargs: typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs] ) → BaseModelOutputWithPast or tuple(torch.FloatTensor)

Parameters

Returns

BaseModelOutputWithPast or tuple(torch.FloatTensor)

A BaseModelOutputWithPast or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (OPTConfig) and inputs.

The OPTModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
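A minimal usage sketch for the base model, assuming the facebook/opt-350m checkpoint; the returned BaseModelOutputWithPast exposes last_hidden_state (and past_key_values when use_cache=True):

import torch
from transformers import AutoTokenizer, OPTModel

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = OPTModel.from_pretrained("facebook/opt-350m")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Final-layer hidden states: (batch_size, sequence_length, hidden_size)
last_hidden_state = outputs.last_hidden_state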

OPTForCausalLM

class transformers.OPTForCausalLM

< source >

( config )

forward

< source >

( input_ids: torch.LongTensor | None = None attention_mask: torch.Tensor | None = None past_key_values: transformers.cache_utils.Cache | None = None inputs_embeds: torch.FloatTensor | None = None labels: torch.LongTensor | None = None use_cache: bool | None = None position_ids: torch.LongTensor | None = None logits_to_keep: int | torch.Tensor = 0 **kwargs: typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs] ) → CausalLMOutputWithPast or tuple(torch.FloatTensor)

Parameters

Returns

CausalLMOutputWithPast or tuple(torch.FloatTensor)

A CausalLMOutputWithPast or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (OPTConfig) and inputs.

The OPTForCausalLM forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, OPTForCausalLM

model = OPTForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

prompt = "Hey, are you conscious? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt")

generate_ids = model.generate(inputs.input_ids, max_length=30)
tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
"Hey, are you conscious? Can you talk to me?\nI'm not conscious. I'm just a little bit of a weirdo."

OPTForSequenceClassification

class transformers.OPTForSequenceClassification

< source >

( config: OPTConfig )

Parameters

The OPT Model transformer with a sequence classification head on top (linear layer).

OPTForSequenceClassification uses the last token in order to do the classification, as other causal models (e.g. GPT-2) do.

Since it does classification on the last token, it needs to know the position of the last token. If a pad_token_id is defined in the configuration, it finds the last token that is not a padding token in each row. If no pad_token_id is defined, it simply takes the last value in each row of the batch. Since it cannot guess the padding tokens when inputs_embeds are passed instead of input_ids, it does the same (takes the last value in each row of the batch).
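As a sketch of the padding behavior described above (the checkpoint is illustrative and its classification head is randomly initialized), the logits for a padded batch are computed from each row's last non-padding token:

import torch
from transformers import AutoTokenizer, OPTForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = OPTForSequenceClassification.from_pretrained("facebook/opt-350m")

# Two sequences of different lengths; the shorter one is padded, and the
# head classifies from the last non-padding token of each row.
batch = tokenizer(["Hello, my dog is cute", "Hi"], padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits  # (batch_size, num_labels)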

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

< source >

( input_ids: torch.LongTensor | None = None attention_mask: torch.FloatTensor | None = None past_key_values: transformers.cache_utils.Cache | None = None inputs_embeds: torch.FloatTensor | None = None labels: torch.LongTensor | None = None use_cache: bool | None = None position_ids: torch.LongTensor | None = None **kwargs: typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs] ) → SequenceClassifierOutputWithPast or tuple(torch.FloatTensor)

Parameters

Returns

SequenceClassifierOutputWithPast or tuple(torch.FloatTensor)

A SequenceClassifierOutputWithPast or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (OPTConfig) and inputs.

The OPTForSequenceClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of single-label classification:

import torch
from transformers import AutoTokenizer, OPTForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = OPTForSequenceClassification.from_pretrained("facebook/opt-350m")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax().item()
model.config.id2label[predicted_class_id]
...

num_labels = len(model.config.id2label)
model = OPTForSequenceClassification.from_pretrained("facebook/opt-350m", num_labels=num_labels)

labels = torch.tensor([1])
loss = model(**inputs, labels=labels).loss
round(loss.item(), 2)
...

Example of multi-label classification:

import torch
from transformers import AutoTokenizer, OPTForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = OPTForSequenceClassification.from_pretrained("facebook/opt-350m", problem_type="multi_label_classification")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]

num_labels = len(model.config.id2label)
model = OPTForSequenceClassification.from_pretrained(
    "facebook/opt-350m", num_labels=num_labels, problem_type="multi_label_classification"
)

labels = torch.sum(
    torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1
).to(torch.float)
loss = model(**inputs, labels=labels).loss

OPTForQuestionAnswering

class transformers.OPTForQuestionAnswering

< source >

( config: OPTConfig )

Parameters

The OPT transformer with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

< source >

( input_ids: torch.LongTensor | None = None attention_mask: torch.FloatTensor | None = None past_key_values: transformers.cache_utils.Cache | None = None inputs_embeds: torch.FloatTensor | None = None start_positions: torch.LongTensor | None = None end_positions: torch.LongTensor | None = None use_cache: bool | None = None position_ids: torch.LongTensor | None = None **kwargs: typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs] ) → QuestionAnsweringModelOutput or tuple(torch.FloatTensor)

Parameters

Returns

QuestionAnsweringModelOutput or tuple(torch.FloatTensor)

A QuestionAnsweringModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (OPTConfig) and inputs.

The OPTForQuestionAnswering forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, OPTForQuestionAnswering
import torch

torch.manual_seed(4)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

model = OPTForQuestionAnswering.from_pretrained("facebook/opt-350m")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()

answer_offset = len(tokenizer(question)[0])

predict_answer_tokens = inputs.input_ids[
    0, answer_offset + answer_start_index : answer_offset + answer_end_index + 1
]
predicted = tokenizer.decode(predict_answer_tokens)
predicted
' a nice puppet'
