RoBERTa-PreLayerNorm

The RoBERTa-PreLayerNorm model was proposed in fairseq: A Fast, Extensible Toolkit for Sequence Modeling by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. It is identical to using the --encoder-normalize-before flag in fairseq.

fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. It also supports fast mixed-precision training and inference on modern GPUs.

class transformers.RobertaPreLayerNormConfig

< source >

( vocab_size = 50265 hidden_size = 768 num_hidden_layers = 12 num_attention_heads = 12 intermediate_size = 3072 hidden_act = 'gelu' hidden_dropout_prob = 0.1 attention_probs_dropout_prob = 0.1 max_position_embeddings = 512 type_vocab_size = 2 initializer_range = 0.02 layer_norm_eps = 1e-12 pad_token_id = 1 bos_token_id = 0 eos_token_id = 2 position_embedding_type = 'absolute' use_cache = True classifier_dropout = None **kwargs )

Parameters

This is the configuration class to store the configuration of a RobertaPreLayerNormModel or a TFRobertaPreLayerNormModel. It is used to instantiate a RoBERTa-PreLayerNorm model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the RoBERTa-PreLayerNorm andreasmadsen/efficient_mlm_m0.40 architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Examples:

from transformers import RobertaPreLayerNormConfig, RobertaPreLayerNormModel

# Initializing a RoBERTa-PreLayerNorm configuration
configuration = RobertaPreLayerNormConfig()

# Initializing a model (with random weights) from the configuration
model = RobertaPreLayerNormModel(configuration)

# Accessing the model configuration
configuration = model.config
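As a hedged sketch (the values below are illustrative choices, not recommended settings), the defaults listed above can be overridden when building a configuration:

from transformers import RobertaPreLayerNormConfig, RobertaPreLayerNormModel

# illustrative non-default hyperparameters; hidden_size must stay divisible by num_attention_heads
configuration = RobertaPreLayerNormConfig(hidden_size=512, num_hidden_layers=6, num_attention_heads=8, intermediate_size=2048)
model = RobertaPreLayerNormModel(configuration)  # randomly initialized weights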

RobertaPreLayerNormModel

class transformers.RobertaPreLayerNormModel

< source >

( config add_pooling_layer = True )

Parameters

The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of cross-attention is added between the self-attention layers, following the architecture described in Attention Is All You Need (https://arxiv.org/abs/1706.03762) by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

To behave as a decoder, the model needs to be initialized with the is_decoder argument of the configuration set to True. To be used in a Seq2Seq model, the model needs to be initialized with both the is_decoder argument and add_cross_attention set to True; an encoder_hidden_states input is then expected in the forward pass.

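A minimal sketch of that decoder setup, assuming a freshly initialized configuration (random weights and placeholder inputs, for illustration only):

import torch
from transformers import RobertaPreLayerNormConfig, RobertaPreLayerNormModel

# decoder configuration with cross-attention enabled
config = RobertaPreLayerNormConfig(is_decoder=True, add_cross_attention=True)
model = RobertaPreLayerNormModel(config)

input_ids = torch.tensor([[0, 31414, 232, 2]])  # placeholder token ids
encoder_hidden_states = torch.randn(1, 4, config.hidden_size)  # placeholder encoder output
outputs = model(input_ids=input_ids, encoder_hidden_states=encoder_hidden_states)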

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None encoder_hidden_states: typing.Optional[torch.Tensor] = None encoder_attention_mask: typing.Optional[torch.Tensor] = None past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The RobertaPreLayerNormModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
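The PyTorch base model carries no usage example in this section; a minimal sketch, mirroring the TFRobertaPreLayerNormModel example later on this page:

from transformers import AutoTokenizer, RobertaPreLayerNormModel
import torch

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = RobertaPreLayerNormModel.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state  # shape (batch, sequence, hidden_size)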

RobertaPreLayerNormForCausalLM

class transformers.RobertaPreLayerNormForCausalLM

< source >

( config )

Parameters

RoBERTa-PreLayerNorm Model with a language modeling head on top for CLM fine-tuning.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.FloatTensor] = None token_type_ids: typing.Optional[torch.LongTensor] = None position_ids: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.FloatTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[torch.FloatTensor] = None encoder_attention_mask: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None **kwargs ) → transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The RobertaPreLayerNormForCausalLM forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, RobertaPreLayerNormForCausalLM, AutoConfig
import torch

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
config = AutoConfig.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
config.is_decoder = True
model = RobertaPreLayerNormForCausalLM.from_pretrained("andreasmadsen/efficient_mlm_m0.40", config=config)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

prediction_logits = outputs.logits
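A hedged follow-up, not part of the original example: the most likely next token can be read off the last position of the logits.

next_token_id = prediction_logits[:, -1, :].argmax(dim=-1)  # greedy pick over the vocabulary
tokenizer.decode(next_token_id)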

RobertaPreLayerNormForMaskedLM

class transformers.RobertaPreLayerNormForMaskedLM

< source >

( config )

Parameters

RoBERTa-PreLayerNorm Model with a language modeling head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.FloatTensor] = None token_type_ids: typing.Optional[torch.LongTensor] = None position_ids: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.FloatTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[torch.FloatTensor] = None encoder_attention_mask: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.MaskedLMOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.MaskedLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The RobertaPreLayerNormForMaskedLM forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, RobertaPreLayerNormForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = RobertaPreLayerNormForMaskedLM.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer("The capital of France is <mask>.", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# retrieve index of <mask>
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
tokenizer.decode(predicted_token_id)
...

labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
# mask labels of non-<mask> tokens
labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)

outputs = model(**inputs, labels=labels)
round(outputs.loss.item(), 2)
...

RobertaPreLayerNormForSequenceClassification

class transformers.RobertaPreLayerNormForSequenceClassification

< source >

( config )

Parameters

RoBERTa-PreLayerNorm Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.FloatTensor] = None token_type_ids: typing.Optional[torch.LongTensor] = None position_ids: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.FloatTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.SequenceClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The RobertaPreLayerNormForSequenceClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of single-label classification:

import torch
from transformers import AutoTokenizer, RobertaPreLayerNormForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = RobertaPreLayerNormForSequenceClassification.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax().item()
model.config.id2label[predicted_class_id]
...

# To train a model on `num_labels` classes, pass `num_labels` to `.from_pretrained(...)`
num_labels = len(model.config.id2label)
model = RobertaPreLayerNormForSequenceClassification.from_pretrained("andreasmadsen/efficient_mlm_m0.40", num_labels=num_labels)

labels = torch.tensor([1])
loss = model(**inputs, labels=labels).loss
round(loss.item(), 2)
...

Example of multi-label classification:

import torch
from transformers import AutoTokenizer, RobertaPreLayerNormForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = RobertaPreLayerNormForSequenceClassification.from_pretrained("andreasmadsen/efficient_mlm_m0.40", problem_type="multi_label_classification")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]

num_labels = len(model.config.id2label)
model = RobertaPreLayerNormForSequenceClassification.from_pretrained(
    "andreasmadsen/efficient_mlm_m0.40", num_labels=num_labels, problem_type="multi_label_classification"
)

labels = torch.sum(
    torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1
).to(torch.float)
loss = model(**inputs, labels=labels).loss

RobertaPreLayerNormForMultipleChoice

class transformers.RobertaPreLayerNormForMultipleChoice

< source >

( config )

Parameters

The RoBERTa-PreLayerNorm Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax), e.g. for RocStories/SWAG tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.LongTensor] = None token_type_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None position_ids: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.FloatTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.MultipleChoiceModelOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.MultipleChoiceModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The RobertaPreLayerNormForMultipleChoice forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, RobertaPreLayerNormForMultipleChoice
import torch

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = RobertaPreLayerNormForMultipleChoice.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
choice0 = "It is eaten with a fork and a knife."
choice1 = "It is eaten while held in the hand."
labels = torch.tensor(0).unsqueeze(0)  # choice0 is correct; batch size 1

encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="pt", padding=True)
outputs = model(**{k: v.unsqueeze(0) for k, v in encoding.items()}, labels=labels)  # batch size is 1

loss = outputs.loss
logits = outputs.logits
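A hedged follow-up, not part of the original example: the logits index the choices, so the predicted answer is the argmax over the last dimension.

predicted_choice = logits.argmax(dim=-1).item()  # 0 selects choice0, 1 selects choice1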

RobertaPreLayerNormForTokenClassification

class transformers.RobertaPreLayerNormForTokenClassification

< source >

( config )

Parameters

The RoBERTa-PreLayerNorm transformer with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.FloatTensor] = None token_type_ids: typing.Optional[torch.LongTensor] = None position_ids: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.FloatTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.TokenClassifierOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.TokenClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The RobertaPreLayerNormForTokenClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, RobertaPreLayerNormForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = RobertaPreLayerNormForTokenClassification.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer(
    "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="pt"
)

with torch.no_grad():
    logits = model(**inputs).logits

predicted_token_class_ids = logits.argmax(-1)

# Note that tokens are classified rather than input words, so there may be
# more predicted token classes than words; multiple token classes may
# account for the same word
predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
predicted_tokens_classes
...

labels = predicted_token_class_ids
loss = model(**inputs, labels=labels).loss
round(loss.item(), 2)
...

RobertaPreLayerNormForQuestionAnswering

class transformers.RobertaPreLayerNormForQuestionAnswering

< source >

( config )

Parameters

The RoBERTa-PreLayerNorm transformer with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.FloatTensor] = None token_type_ids: typing.Optional[torch.LongTensor] = None position_ids: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.FloatTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None start_positions: typing.Optional[torch.LongTensor] = None end_positions: typing.Optional[torch.LongTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.QuestionAnsweringModelOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.QuestionAnsweringModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The RobertaPreLayerNormForQuestionAnswering forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, RobertaPreLayerNormForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = RobertaPreLayerNormForQuestionAnswering.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens, skip_special_tokens=True)
...

# target is "nice puppet"
target_start_index = torch.tensor([14])
target_end_index = torch.tensor([15])

outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
loss = outputs.loss
round(loss.item(), 2)
...

TFRobertaPreLayerNormModel

class transformers.TFRobertaPreLayerNormModel

< source >

( config *inputs **kwargs )

Parameters

The bare RoBERTa-PreLayerNorm Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

- a single Tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!
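A minimal sketch of the two formats, assuming the checkpoint used throughout this page:

from transformers import AutoTokenizer, TFRobertaPreLayerNormModel

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = TFRobertaPreLayerNormModel.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
encoding = tokenizer("Hello, my dog is cute", return_tensors="tf")

# first format: keyword arguments, as with PyTorch models
outputs_kw = model(input_ids=encoding["input_ids"], attention_mask=encoding["attention_mask"])

# second format: everything packed into the first positional argument
outputs_dict = model({"input_ids": encoding["input_ids"], "attention_mask": encoding["attention_mask"]})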

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None encoder_hidden_states: np.ndarray | tf.Tensor | None = None encoder_attention_mask: np.ndarray | tf.Tensor | None = None past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None use_cache: Optional[bool] = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The TFRobertaPreLayerNormModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFRobertaPreLayerNormModel
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = TFRobertaPreLayerNormModel.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
outputs = model(inputs)

last_hidden_states = outputs.last_hidden_state

TFRobertaPreLayerNormForCausalLM

class transformers.TFRobertaPreLayerNormForCausalLM

< source >

( config: RobertaPreLayerNormConfig *inputs **kwargs )

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None encoder_hidden_states: np.ndarray | tf.Tensor | None = None encoder_attention_mask: np.ndarray | tf.Tensor | None = None past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None use_cache: Optional[bool] = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFCausalLMOutputWithCrossAttentions or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFCausalLMOutputWithCrossAttentions or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The TFRobertaPreLayerNormForCausalLM forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFRobertaPreLayerNormForCausalLM
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = TFRobertaPreLayerNormForCausalLM.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
outputs = model(inputs)
logits = outputs.logits

TFRobertaPreLayerNormForMaskedLM

class transformers.TFRobertaPreLayerNormForMaskedLM

< source >

( config *inputs **kwargs )

Parameters

RoBERTa-PreLayerNorm Model with a language modeling head on top.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

- a single Tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFMaskedLMOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFMaskedLMOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The TFRobertaPreLayerNormForMaskedLM forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFRobertaPreLayerNormForMaskedLM
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = TFRobertaPreLayerNormForMaskedLM.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer("The capital of France is <mask>.", return_tensors="tf")
logits = model(**inputs).logits

# retrieve index of <mask>
mask_token_index = tf.where((inputs.input_ids == tokenizer.mask_token_id)[0])
selected_logits = tf.gather_nd(logits[0], indices=mask_token_index)

predicted_token_id = tf.math.argmax(selected_logits, axis=-1)
tokenizer.decode(predicted_token_id)
' Paris'

labels = tokenizer("The capital of France is Paris.", return_tensors="tf")["input_ids"]
# mask labels of non-<mask> tokens
labels = tf.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)

outputs = model(**inputs, labels=labels)
round(float(outputs.loss), 2)
0.69

TFRobertaPreLayerNormForSequenceClassification

class transformers.TFRobertaPreLayerNormForSequenceClassification

< source >

( config *inputs **kwargs )

Parameters

RoBERTa-PreLayerNorm Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

- a single Tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFSequenceClassifierOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFSequenceClassifierOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The TFRobertaPreLayerNormForSequenceClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFRobertaPreLayerNormForSequenceClassification
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = TFRobertaPreLayerNormForSequenceClassification.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")

logits = model(**inputs).logits

predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])

# To train a model on `num_labels` classes, pass `num_labels` to `.from_pretrained(...)`
num_labels = len(model.config.id2label)
model = TFRobertaPreLayerNormForSequenceClassification.from_pretrained("andreasmadsen/efficient_mlm_m0.40", num_labels=num_labels)

labels = tf.constant(1)
loss = model(**inputs, labels=labels).loss

TFRobertaPreLayerNormForMultipleChoice

class transformers.TFRobertaPreLayerNormForMultipleChoice

< source >

( config *inputs **kwargs )

Parameters

RobertaPreLayerNorm Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

- a single Tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The TFRobertaPreLayerNormForMultipleChoice forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFRobertaPreLayerNormForMultipleChoice
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = TFRobertaPreLayerNormForMultipleChoice.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
choice0 = "It is eaten with a fork and a knife."
choice1 = "It is eaten while held in the hand."

encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="tf", padding=True)
inputs = {k: tf.expand_dims(v, 0) for k, v in encoding.items()}
outputs = model(inputs)

logits = outputs.logits

TFRobertaPreLayerNormForTokenClassification

class transformers.TFRobertaPreLayerNormForTokenClassification

< source >

( config *inputs **kwargs )

Parameters

RoBERTa-PreLayerNorm Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

- a single Tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFTokenClassifierOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFTokenClassifierOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The TFRobertaPreLayerNormForTokenClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFRobertaPreLayerNormForTokenClassification
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = TFRobertaPreLayerNormForTokenClassification.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer(
    "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="tf"
)

logits = model(**inputs).logits
predicted_token_class_ids = tf.math.argmax(logits, axis=-1)

predicted_tokens_classes = [model.config.id2label[t] for t in predicted_token_class_ids[0].numpy().tolist()]

labels = predicted_token_class_ids
loss = tf.math.reduce_mean(model(**inputs, labels=labels).loss)

TFRobertaPreLayerNormForQuestionAnswering

class transformers.TFRobertaPreLayerNormForQuestionAnswering

< source >

( config *inputs **kwargs )

Parameters

RoBERTa-PreLayerNorm Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

- a single Tensor with input_ids only and nothing else: model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: model([input_ids, attention_mask]) or model([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring: model({"input_ids": input_ids, "token_type_ids": token_type_ids})

Note that when creating models and layers with subclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None start_positions: np.ndarray | tf.Tensor | None = None end_positions: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The TFRobertaPreLayerNormForQuestionAnswering forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFRobertaPreLayerNormForQuestionAnswering
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = TFRobertaPreLayerNormForQuestionAnswering.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="tf")
outputs = model(**inputs)

answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens)

# target is "nice puppet"
target_start_index = tf.constant([14])
target_end_index = tf.constant([15])

outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
loss = tf.math.reduce_mean(outputs.loss)

FlaxRobertaPreLayerNormModel

class transformers.FlaxRobertaPreLayerNormModel

< source >

( config: RobertaPreLayerNormConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

The bare RoBERTa-PreLayerNorm Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading, saving and converting weights from PyTorch models).

This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matters related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

- Just-In-Time (JIT) compilation
- Automatic Differentiation
- Vectorization
- Parallelization
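As a hedged sketch of the JIT support (the wrapper function below is illustrative, not part of the library's API):

import jax
from transformers import AutoTokenizer, FlaxRobertaPreLayerNormModel

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = FlaxRobertaPreLayerNormModel.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
inputs = tokenizer("Hello, my dog is cute", return_tensors="np")

@jax.jit  # compiled once per input shape, fast on subsequent calls
def encode(input_ids, attention_mask):
    return model(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state

last_hidden_states = encode(inputs["input_ids"], inputs["attention_mask"])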

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: jax.random.PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPooling or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPooling or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The FlaxRobertaPreLayerNormPreTrainedModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxRobertaPreLayerNormModel

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = FlaxRobertaPreLayerNormModel.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer("Hello, my dog is cute", return_tensors="jax")
outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state

FlaxRobertaPreLayerNormForCausalLM

class transformers.FlaxRobertaPreLayerNormForCausalLM

< source >

( config: RobertaPreLayerNormConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

RobertaPreLayerNorm Model with a language modeling head on top (a linear layer on top of the hidden-states output), e.g. for autoregressive tasks.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading, saving and converting weights from PyTorch models).

This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matters related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

- Just-In-Time (JIT) compilation
- Automatic Differentiation
- Vectorization
- Parallelization

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: jax.random.PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The FlaxRobertaPreLayerNormPreTrainedModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxRobertaPreLayerNormForCausalLM

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = FlaxRobertaPreLayerNormForCausalLM.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer("Hello, my dog is cute", return_tensors="np")
outputs = model(**inputs)

# retrieve the logits for the next token
next_token_logits = outputs.logits[:, -1]
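A hedged follow-up, not part of the original example: a greedy choice of the next token.

import jax.numpy as jnp

next_token_id = jnp.argmax(next_token_logits, axis=-1)  # greedy pick per batch element
tokenizer.decode(next_token_id.tolist())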

FlaxRobertaPreLayerNormForMaskedLM

class transformers.FlaxRobertaPreLayerNormForMaskedLM

< source >

( config: RobertaPreLayerNormConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

RoBERTa-PreLayerNorm Model with a language modeling head on top.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading, saving and converting weights from PyTorch models).

This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matters related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

- Just-In-Time (JIT) compilation
- Automatic Differentiation
- Vectorization
- Parallelization

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: jax.random.PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPooling or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPooling or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The FlaxRobertaPreLayerNormPreTrainedModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxRobertaPreLayerNormForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = FlaxRobertaPreLayerNormForMaskedLM.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer("The capital of France is <mask>.", return_tensors="jax")

outputs = model(**inputs)
logits = outputs.logits
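A hedged follow-up, not part of the original example: locating the mask position and decoding the top prediction there.

import jax.numpy as jnp

mask_index = jnp.where(inputs["input_ids"][0] == tokenizer.mask_token_id)[0]
predicted_id = logits[0, mask_index].argmax(-1)
tokenizer.decode(predicted_id.tolist())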

FlaxRobertaPreLayerNormForSequenceClassification

class transformers.FlaxRobertaPreLayerNormForSequenceClassification

< source >

( config: RobertaPreLayerNormConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

RobertaPreLayerNorm Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading, saving and converting weights from PyTorch models).

This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matters related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

- Just-In-Time (JIT) compilation
- Automatic Differentiation
- Vectorization
- Parallelization

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: jax.random.PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxSequenceClassifierOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxSequenceClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The FlaxRobertaPreLayerNormPreTrainedModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxRobertaPreLayerNormForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = FlaxRobertaPreLayerNormForSequenceClassification.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer("Hello, my dog is cute", return_tensors="jax")

outputs = model(**inputs)
logits = outputs.logits

FlaxRobertaPreLayerNormForMultipleChoice

class transformers.FlaxRobertaPreLayerNormForMultipleChoice

< source >

( config: RobertaPreLayerNormConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

RobertaPreLayerNorm Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading, saving and converting weights from PyTorch models).

This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matters related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

- Just-In-Time (JIT) compilation
- Automatic Differentiation
- Vectorization
- Parallelization

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: jax.random.PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxMultipleChoiceModelOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxMultipleChoiceModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The FlaxRobertaPreLayerNormPreTrainedModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxRobertaPreLayerNormForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = FlaxRobertaPreLayerNormForMultipleChoice.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
choice0 = "It is eaten with a fork and a knife."
choice1 = "It is eaten while held in the hand."

encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="jax", padding=True)
outputs = model(**{k: v[None, :] for k, v in encoding.items()})

logits = outputs.logits

FlaxRobertaPreLayerNormForTokenClassification

class transformers.FlaxRobertaPreLayerNormForTokenClassification

< source >

( config: RobertaPreLayerNormConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

RobertaPreLayerNorm Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading, saving and converting weights from PyTorch models).

This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matters related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

- Just-In-Time (JIT) compilation
- Automatic Differentiation
- Vectorization
- Parallelization

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: jax.random.PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxTokenClassifierOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxTokenClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The FlaxRobertaPreLayerNormPreTrainedModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxRobertaPreLayerNormForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = FlaxRobertaPreLayerNormForTokenClassification.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

inputs = tokenizer("Hello, my dog is cute", return_tensors="jax")

outputs = model(**inputs)
logits = outputs.logits

FlaxRobertaPreLayerNormForQuestionAnswering

class transformers.FlaxRobertaPreLayerNormForQuestionAnswering

< source >

( config: RobertaPreLayerNormConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

RobertaPreLayerNorm Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading, saving and converting weights from PyTorch models).

This model is also a flax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matters related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

- Just-In-Time (JIT) compilation
- Automatic Differentiation
- Vectorization
- Parallelization

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: jax.random.PRNGKey = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxQuestionAnsweringModelOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxQuestionAnsweringModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (RobertaPreLayerNormConfig) and inputs.

The FlaxRobertaPreLayerNormPreTrainedModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxRobertaPreLayerNormForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("andreasmadsen/efficient_mlm_m0.40")
model = FlaxRobertaPreLayerNormForQuestionAnswering.from_pretrained("andreasmadsen/efficient_mlm_m0.40")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors="jax")

outputs = model(**inputs)
start_scores = outputs.start_logits
end_scores = outputs.end_logits
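A hedged follow-up mirroring the PyTorch and TensorFlow question-answering examples above (not part of the original Flax example):

import jax.numpy as jnp

answer_start_index = int(jnp.argmax(start_scores, axis=-1)[0])
answer_end_index = int(jnp.argmax(end_scores, axis=-1)[0])
predict_answer_tokens = inputs["input_ids"][0, answer_start_index : answer_end_index + 1]
tokenizer.decode(predict_answer_tokens.tolist())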