BERT (original) (raw)

BERT is a bidirectional transformer pretrained on unlabeled text to predict masked tokens in a sentence and to predict whether one sentence follows another. The main idea is that by randomly masking some tokens, the model can train on text to the left and right, giving it a more thorough understanding. BERT is also very versatile because its learned language representations can be adapted for other NLP tasks by fine-tuning an additional layer or head.

You can find all the original BERT checkpoints under the BERT collection.

Click on the BERT models in the right sidebar for more examples of how to apply BERT to different language tasks.

The example below demonstrates how to predict the [MASK] token with Pipeline, AutoModel, and from the command line.

Pipeline

AutoModel

transformers CLI

import torch from transformers import pipeline

pipeline = pipeline( task="fill-mask", model="google-bert/bert-base-uncased", torch_dtype=torch.float16, device=0 ) pipeline("Plants create [MASK] through a process known as photosynthesis.")

Notes

BertConfig

class transformers.BertConfig

< source >

( vocab_size = 30522 hidden_size = 768 num_hidden_layers = 12 num_attention_heads = 12 intermediate_size = 3072 hidden_act = 'gelu' hidden_dropout_prob = 0.1 attention_probs_dropout_prob = 0.1 max_position_embeddings = 512 type_vocab_size = 2 initializer_range = 0.02 layer_norm_eps = 1e-12 pad_token_id = 0 position_embedding_type = 'absolute' use_cache = True classifier_dropout = None **kwargs )

Parameters

This is the configuration class to store the configuration of a BertModel or a TFBertModel. It is used to instantiate a BERT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the BERTgoogle-bert/bert-base-uncased architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Examples:

from transformers import BertConfig, BertModel

configuration = BertConfig()

model = BertModel(configuration)

configuration = model.config

BertTokenizer

class transformers.BertTokenizer

< source >

( vocab_file do_lower_case = True do_basic_tokenize = True never_split = None unk_token = '[UNK]' sep_token = '[SEP]' pad_token = '[PAD]' cls_token = '[CLS]' mask_token = '[MASK]' tokenize_chinese_chars = True strip_accents = None clean_up_tokenization_spaces = True **kwargs )

Parameters

Construct a BERT tokenizer. Based on WordPiece.

This tokenizer inherits from PreTrainedTokenizer which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

build_inputs_with_special_tokens

< source >

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

Parameters

List of input IDs with the appropriate special tokens.

Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and adding special tokens. A BERT sequence has the following format:

get_special_tokens_mask

< source >

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None already_has_special_tokens: bool = False ) → List[int]

Parameters

A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.

Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.

create_token_type_ids_from_sequences

< source >

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

Parameters

List of token type IDs according to the given sequence(s).

Create a mask from the two sequences passed to be used in a sequence-pair classification task. A BERT sequence

pair mask has the following format:

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 | first sequence | second sequence |

If token_ids_1 is None, this method only returns the first portion of the mask (0s).

save_vocabulary

< source >

( save_directory: str filename_prefix: typing.Optional[str] = None )

BertTokenizerFast

class transformers.BertTokenizerFast

< source >

( vocab_file = None tokenizer_file = None do_lower_case = True unk_token = '[UNK]' sep_token = '[SEP]' pad_token = '[PAD]' cls_token = '[CLS]' mask_token = '[MASK]' tokenize_chinese_chars = True strip_accents = None **kwargs )

Parameters

Construct a “fast” BERT tokenizer (backed by HuggingFace’s tokenizers library). Based on WordPiece.

This tokenizer inherits from PreTrainedTokenizerFast which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

build_inputs_with_special_tokens

< source >

( token_ids_0 token_ids_1 = None ) → List[int]

Parameters

List of input IDs with the appropriate special tokens.

Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and adding special tokens. A BERT sequence has the following format:

create_token_type_ids_from_sequences

< source >

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

Parameters

List of token type IDs according to the given sequence(s).

Create a mask from the two sequences passed to be used in a sequence-pair classification task. A BERT sequence

pair mask has the following format:

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 | first sequence | second sequence |

If token_ids_1 is None, this method only returns the first portion of the mask (0s).

BertModel

class transformers.BertModel

< source >

( config add_pooling_layer = True )

Parameters

The bare Bert Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of cross-attention is added between the self-attention layers, following the architecture described in Attention is all you need by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

To behave as an decoder the model needs to be initialized with the is_decoder argument of the configuration set to True. To be used in a Seq2Seq model, the model needs to initialized with both is_decoder argument andadd_cross_attention set to True; an encoder_hidden_states is then expected as an input to the forward pass.

forward

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None encoder_hidden_states: typing.Optional[torch.Tensor] = None encoder_attention_mask: typing.Optional[torch.Tensor] = None past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The BertModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, BertModel import torch

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = BertModel.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state

BertForPreTraining

class transformers.BertForPreTraining

< source >

( config )

Parameters

Bert Model with two heads on top as done during the pretraining: a masked language modeling head and a next sentence prediction (classification) head.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None next_sentence_label: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.models.bert.modeling_bert.BertForPreTrainingOutput or tuple(torch.FloatTensor)

Parameters

A transformers.models.bert.modeling_bert.BertForPreTrainingOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The BertForPreTraining forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, BertForPreTraining import torch

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = BertForPreTraining.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") outputs = model(**inputs)

prediction_logits = outputs.prediction_logits seq_relationship_logits = outputs.seq_relationship_logits

BertLMHeadModel

class transformers.BertLMHeadModel

< source >

( config )

Parameters

Bert Model with a language modeling head on top for CLM fine-tuning.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None encoder_hidden_states: typing.Optional[torch.Tensor] = None encoder_attention_mask: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None past_key_values: typing.Optional[typing.List[torch.Tensor]] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None **loss_kwargs ) → transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The BertLMHeadModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

import torch from transformers import AutoTokenizer, BertLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = BertLMHeadModel.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") outputs = model(**inputs, labels=inputs["input_ids"]) loss = outputs.loss logits = outputs.logits

BertForMaskedLM

class transformers.BertForMaskedLM

< source >

( config )

Parameters

Bert Model with a language modeling head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None encoder_hidden_states: typing.Optional[torch.Tensor] = None encoder_attention_mask: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.MaskedLMOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.MaskedLMOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The BertForMaskedLM forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, BertForMaskedLM import torch

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = BertForMaskedLM.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")

with torch.no_grad(): ... logits = model(**inputs).logits

mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

predicted_token_id = logits[0, mask_token_index].argmax(axis=-1) tokenizer.decode(predicted_token_id) 'paris'

labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]

labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)

outputs = model(**inputs, labels=labels) round(outputs.loss.item(), 2) 0.88

BertForNextSentencePrediction

class transformers.BertForNextSentencePrediction

< source >

( config )

Parameters

Bert Model with a next sentence prediction (classification) head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None **kwargs ) → transformers.modeling_outputs.NextSentencePredictorOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.NextSentencePredictorOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The BertForNextSentencePrediction forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, BertForNextSentencePrediction import torch

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = BertForNextSentencePrediction.from_pretrained("google-bert/bert-base-uncased")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." next_sentence = "The sky is blue due to the shorter wavelength of blue light." encoding = tokenizer(prompt, next_sentence, return_tensors="pt")

outputs = model(**encoding, labels=torch.LongTensor([1])) logits = outputs.logits assert logits[0, 0] < logits[0, 1]

BertForSequenceClassification

class transformers.BertForSequenceClassification

< source >

( config )

Parameters

Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.SequenceClassifierOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The BertForSequenceClassification forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of single-label classification:

import torch from transformers import AutoTokenizer, BertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("textattack/bert-base-uncased-yelp-polarity") model = BertForSequenceClassification.from_pretrained("textattack/bert-base-uncased-yelp-polarity")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

with torch.no_grad(): ... logits = model(**inputs).logits

predicted_class_id = logits.argmax().item() model.config.id2label[predicted_class_id] 'LABEL_1'

num_labels = len(model.config.id2label) model = BertForSequenceClassification.from_pretrained("textattack/bert-base-uncased-yelp-polarity", num_labels=num_labels)

labels = torch.tensor([1]) loss = model(**inputs, labels=labels).loss round(loss.item(), 2) 0.01

Example of multi-label classification:

import torch from transformers import AutoTokenizer, BertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("textattack/bert-base-uncased-yelp-polarity") model = BertForSequenceClassification.from_pretrained("textattack/bert-base-uncased-yelp-polarity", problem_type="multi_label_classification")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

with torch.no_grad(): ... logits = model(**inputs).logits

predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]

num_labels = len(model.config.id2label) model = BertForSequenceClassification.from_pretrained( ... "textattack/bert-base-uncased-yelp-polarity", num_labels=num_labels, problem_type="multi_label_classification" ... )

labels = torch.sum( ... torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1 ... ).to(torch.float) loss = model(**inputs, labels=labels).loss

BertForMultipleChoice

class transformers.BertForMultipleChoice

< source >

( config )

Parameters

Bert Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.MultipleChoiceModelOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.MultipleChoiceModelOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The BertForMultipleChoice forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, BertForMultipleChoice import torch

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = BertForMultipleChoice.from_pretrained("google-bert/bert-base-uncased")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." choice0 = "It is eaten with a fork and a knife." choice1 = "It is eaten while held in the hand." labels = torch.tensor(0).unsqueeze(0)

encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="pt", padding=True) outputs = model(**{k: v.unsqueeze(0) for k, v in encoding.items()}, labels=labels)

loss = outputs.loss logits = outputs.logits

BertForTokenClassification

class transformers.BertForTokenClassification

< source >

( config )

Parameters

Bert Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.TokenClassifierOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.TokenClassifierOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The BertForTokenClassification forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, BertForTokenClassification import torch

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english") model = BertForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")

inputs = tokenizer( ... "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="pt" ... )

with torch.no_grad(): ... logits = model(**inputs).logits

predicted_token_class_ids = logits.argmax(-1)

predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]] predicted_tokens_classes ['O', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'O', 'I-LOC', 'O', 'I-LOC', 'I-LOC']

labels = predicted_token_class_ids loss = model(**inputs, labels=labels).loss round(loss.item(), 2) 0.01

BertForQuestionAnswering

class transformers.BertForQuestionAnswering

< source >

( config )

Parameters

Bert Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layers on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None head_mask: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None start_positions: typing.Optional[torch.Tensor] = None end_positions: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.QuestionAnsweringModelOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.QuestionAnsweringModelOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The BertForQuestionAnswering forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, BertForQuestionAnswering import torch

tokenizer = AutoTokenizer.from_pretrained("deepset/bert-base-cased-squad2") model = BertForQuestionAnswering.from_pretrained("deepset/bert-base-cased-squad2")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="pt") with torch.no_grad(): ... outputs = model(**inputs)

answer_start_index = outputs.start_logits.argmax() answer_end_index = outputs.end_logits.argmax()

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1] tokenizer.decode(predict_answer_tokens, skip_special_tokens=True) 'a nice puppet'

target_start_index = torch.tensor([14]) target_end_index = torch.tensor([15])

outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index) loss = outputs.loss round(loss.item(), 2) 7.41

TFBertTokenizer

class transformers.TFBertTokenizer

< source >

( vocab_list: typing.List do_lower_case: bool cls_token_id: typing.Optional[int] = None sep_token_id: typing.Optional[int] = None pad_token_id: typing.Optional[int] = None padding: str = 'longest' truncation: bool = True max_length: int = 512 pad_to_multiple_of: typing.Optional[int] = None return_token_type_ids: bool = True return_attention_mask: bool = True use_fast_bert_tokenizer: bool = True **tokenizer_kwargs )

Parameters

This is an in-graph tokenizer for BERT. It should be initialized similarly to other tokenizers, using thefrom_pretrained() method. It can also be initialized with the from_tokenizer() method, which imports settings from an existing standard tokenizer object.

In-graph tokenizers, unlike other Hugging Face tokenizers, are actually Keras layers and are designed to be run when the model is called, rather than during preprocessing. As a result, they have somewhat more limited options than standard tokenizer classes. They are most useful when you want to create an end-to-end model that goes straight from tf.string inputs to outputs.

from_pretrained

< source >

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] *init_inputs **kwargs )

Parameters

Instantiate a TFBertTokenizer from a pre-trained tokenizer.

Examples:

from transformers import TFBertTokenizer

tf_tokenizer = TFBertTokenizer.from_pretrained("google-bert/bert-base-uncased")

from_tokenizer

< source >

( tokenizer: PreTrainedTokenizerBase **kwargs )

Parameters

Initialize a TFBertTokenizer from an existing Tokenizer.

Examples:

from transformers import AutoTokenizer, TFBertTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") tf_tokenizer = TFBertTokenizer.from_tokenizer(tokenizer)

TFBertModel

class transformers.TFBertModel

< source >

( config: BertConfig add_pooling_layer: bool = True *inputs **kwargs )

Parameters

The bare Bert Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

Note that when creating models and layers withsubclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None encoder_hidden_states: np.ndarray | tf.Tensor | None = None encoder_attention_mask: np.ndarray | tf.Tensor | None = None past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None use_cache: Optional[bool] = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions or a tuple of tf.Tensor (ifreturn_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The TFBertModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFBertModel import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = TFBertModel.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf") outputs = model(inputs)

last_hidden_states = outputs.last_hidden_state

TFBertForPreTraining

class transformers.TFBertForPreTraining

< source >

( config: BertConfig *inputs **kwargs )

Parameters

Bert Model with two heads on top as done during the pretraining: a masked language modeling head and a next sentence prediction (classification) head.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

Note that when creating models and layers withsubclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None next_sentence_label: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.models.bert.modeling_tf_bert.TFBertForPreTrainingOutput or tuple(tf.Tensor)

Parameters

A transformers.models.bert.modeling_tf_bert.TFBertForPreTrainingOutput or a tuple of tf.Tensor (ifreturn_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The TFBertForPreTraining forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Examples:

import tensorflow as tf from transformers import AutoTokenizer, TFBertForPreTraining

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = TFBertForPreTraining.from_pretrained("google-bert/bert-base-uncased") input_ids = tokenizer("Hello, my dog is cute", add_special_tokens=True, return_tensors="tf")

outputs = model(input_ids) prediction_logits, seq_relationship_logits = outputs[:2]

TFBertModelLMHeadModel

class transformers.TFBertLMHeadModel

< source >

( config: BertConfig *inputs **kwargs )

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None encoder_hidden_states: np.ndarray | tf.Tensor | None = None encoder_attention_mask: np.ndarray | tf.Tensor | None = None past_key_values: Optional[Tuple[Tuple[Union[np.ndarray, tf.Tensor]]]] = None use_cache: Optional[bool] = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False **kwargs ) → transformers.modeling_tf_outputs.TFCausalLMOutputWithCrossAttentions or tuple(tf.Tensor)

A transformers.modeling_tf_outputs.TFCausalLMOutputWithCrossAttentions or a tuple of tf.Tensor (ifreturn_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

encoder_hidden_states (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional): Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder. encoder_attention_mask (tf.Tensor of shape (batch_size, sequence_length), optional): Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]:

past_key_values (Tuple[Tuple[tf.Tensor]] of length config.n_layers) contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don’t have their past key value states given to this model) of shape (batch_size, 1) instead of alldecoder_input_ids of shape (batch_size, sequence_length). use_cache (bool, optional, defaults to True): If set to True, past_key_values key value states are returned and can be used to speed up decoding (seepast_key_values). Set to False during training, True during generation labels (tf.Tensor or np.ndarray of shape (batch_size, sequence_length), optional): Labels for computing the cross entropy classification loss. Indices should be in [0, ..., config.vocab_size - 1].

Example:

from transformers import AutoTokenizer, TFBertLMHeadModel import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = TFBertLMHeadModel.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf") outputs = model(inputs) logits = outputs.logits

TFBertForMaskedLM

class transformers.TFBertForMaskedLM

< source >

( config: BertConfig *inputs **kwargs )

Parameters

Bert Model with a language modeling head on top.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

Note that when creating models and layers withsubclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFMaskedLMOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFMaskedLMOutput or a tuple of tf.Tensor (ifreturn_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The TFBertForMaskedLM forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFBertForMaskedLM import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = TFBertForMaskedLM.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="tf") logits = model(**inputs).logits

mask_token_index = tf.where((inputs.input_ids == tokenizer.mask_token_id)[0]) selected_logits = tf.gather_nd(logits[0], indices=mask_token_index)

predicted_token_id = tf.math.argmax(selected_logits, axis=-1) tokenizer.decode(predicted_token_id) 'paris'

labels = tokenizer("The capital of France is Paris.", return_tensors="tf")["input_ids"]

labels = tf.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)

outputs = model(**inputs, labels=labels) round(float(outputs.loss), 2) 0.88

TFBertForNextSentencePrediction

class transformers.TFBertForNextSentencePrediction

< source >

( config: BertConfig *inputs **kwargs )

Parameters

Bert Model with a next sentence prediction (classification) head on top.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

Note that when creating models and layers withsubclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None next_sentence_label: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFNextSentencePredictorOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFNextSentencePredictorOutput or a tuple of tf.Tensor (ifreturn_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The TFBertForNextSentencePrediction forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Examples:

import tensorflow as tf from transformers import AutoTokenizer, TFBertForNextSentencePrediction

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = TFBertForNextSentencePrediction.from_pretrained("google-bert/bert-base-uncased")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." next_sentence = "The sky is blue due to the shorter wavelength of blue light." encoding = tokenizer(prompt, next_sentence, return_tensors="tf")

logits = model(encoding["input_ids"], token_type_ids=encoding["token_type_ids"])[0] assert logits[0][0] < logits[0][1]

TFBertForSequenceClassification

class transformers.TFBertForSequenceClassification

< source >

( config: BertConfig *inputs **kwargs )

Parameters

Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

Note that when creating models and layers withsubclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFSequenceClassifierOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFSequenceClassifierOutput or a tuple of tf.Tensor (ifreturn_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The TFBertForSequenceClassification forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFBertForSequenceClassification import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("ydshieh/bert-base-uncased-yelp-polarity") model = TFBertForSequenceClassification.from_pretrained("ydshieh/bert-base-uncased-yelp-polarity")

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")

logits = model(**inputs).logits

predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0]) model.config.id2label[predicted_class_id] 'LABEL_1'

num_labels = len(model.config.id2label) model = TFBertForSequenceClassification.from_pretrained("ydshieh/bert-base-uncased-yelp-polarity", num_labels=num_labels)

labels = tf.constant(1) loss = model(**inputs, labels=labels).loss round(float(loss), 2) 0.01

TFBertForMultipleChoice

class transformers.TFBertForMultipleChoice

< source >

( config: BertConfig *inputs **kwargs )

Parameters

Bert Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

Note that when creating models and layers withsubclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput or a tuple of tf.Tensor (ifreturn_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The TFBertForMultipleChoice forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFBertForMultipleChoice import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = TFBertForMultipleChoice.from_pretrained("google-bert/bert-base-uncased")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." choice0 = "It is eaten with a fork and a knife." choice1 = "It is eaten while held in the hand."

encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="tf", padding=True) inputs = {k: tf.expand_dims(v, 0) for k, v in encoding.items()} outputs = model(inputs)

logits = outputs.logits

TFBertForTokenClassification

class transformers.TFBertForTokenClassification

< source >

( config: BertConfig *inputs **kwargs )

Parameters

Bert Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

Note that when creating models and layers withsubclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFTokenClassifierOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFTokenClassifierOutput or a tuple of tf.Tensor (ifreturn_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The TFBertForTokenClassification forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFBertForTokenClassification import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english") model = TFBertForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")

inputs = tokenizer( ... "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="tf" ... )

logits = model(**inputs).logits predicted_token_class_ids = tf.math.argmax(logits, axis=-1)

predicted_tokens_classes = [model.config.id2label[t] for t in predicted_token_class_ids[0].numpy().tolist()] predicted_tokens_classes ['O', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'O', 'I-LOC', 'O', 'I-LOC', 'I-LOC']

labels = predicted_token_class_ids loss = tf.math.reduce_mean(model(**inputs, labels=labels).loss) round(float(loss), 2) 0.01

TFBertForQuestionAnswering

class transformers.TFBertForQuestionAnswering

< source >

( config: BertConfig *inputs **kwargs )

Parameters

Bert Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input:

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should “just work” for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three possibilities you can use to gather all the input Tensors in the first positional argument:

Note that when creating models and layers withsubclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None start_positions: np.ndarray | tf.Tensor | None = None end_positions: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput or a tuple of tf.Tensor (ifreturn_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The TFBertForQuestionAnswering forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, TFBertForQuestionAnswering import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("ydshieh/bert-base-cased-squad2") model = TFBertForQuestionAnswering.from_pretrained("ydshieh/bert-base-cased-squad2")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="tf") outputs = model(**inputs)

answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0]) answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1] tokenizer.decode(predict_answer_tokens) 'a nice puppet'

target_start_index = tf.constant([14]) target_end_index = tf.constant([15])

outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index) loss = tf.math.reduce_mean(outputs.loss) round(float(loss), 2) 7.41

FlaxBertModel

class transformers.FlaxBertModel

< source >

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

The bare Bert Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also aflax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: <function PRNGKey at 0x7f8a6cc836d0> = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPooling or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPooling or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxBertModel

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = FlaxBertModel.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("Hello, my dog is cute", return_tensors="jax") outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state

FlaxBertForPreTraining

class transformers.FlaxBertForPreTraining

< source >

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

Bert Model with two heads on top as done during the pretraining: a masked language modeling head and a next sentence prediction (classification) head.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also aflax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: <function PRNGKey at 0x7f8a6cc836d0> = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.models.bert.modeling_flax_bert.FlaxBertForPreTrainingOutput or tuple(torch.FloatTensor)

Parameters

A transformers.models.bert.modeling_flax_bert.FlaxBertForPreTrainingOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxBertForPreTraining

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = FlaxBertForPreTraining.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("Hello, my dog is cute", return_tensors="np") outputs = model(**inputs)

prediction_logits = outputs.prediction_logits seq_relationship_logits = outputs.seq_relationship_logits

FlaxBertForCausalLM

class transformers.FlaxBertForCausalLM

< source >

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

Bert Model with a language modeling head on top (a linear layer on top of the hidden-states output) e.g for autoregressive tasks.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also aflax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: <function PRNGKey at 0x7f8a6cc836d0> = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxBertForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = FlaxBertForCausalLM.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("Hello, my dog is cute", return_tensors="np") outputs = model(**inputs)

next_token_logits = outputs.logits[:, -1]

FlaxBertForMaskedLM

class transformers.FlaxBertForMaskedLM

< source >

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

Bert Model with a language modeling head on top.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also aflax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: <function PRNGKey at 0x7f8a6cc836d0> = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxMaskedLMOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxMaskedLMOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxBertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = FlaxBertForMaskedLM.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="jax")

outputs = model(**inputs) logits = outputs.logits

FlaxBertForNextSentencePrediction

class transformers.FlaxBertForNextSentencePrediction

< source >

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

Bert Model with a next sentence prediction (classification) head on top.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also aflax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: <function PRNGKey at 0x7f8a6cc836d0> = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxNextSentencePredictorOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxNextSentencePredictorOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxBertForNextSentencePrediction

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = FlaxBertForNextSentencePrediction.from_pretrained("google-bert/bert-base-uncased")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." next_sentence = "The sky is blue due to the shorter wavelength of blue light." encoding = tokenizer(prompt, next_sentence, return_tensors="jax")

outputs = model(**encoding) logits = outputs.logits assert logits[0, 0] < logits[0, 1]

FlaxBertForSequenceClassification

class transformers.FlaxBertForSequenceClassification

< source >

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also aflax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: <function PRNGKey at 0x7f8a6cc836d0> = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxSequenceClassifierOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxSequenceClassifierOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxBertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = FlaxBertForSequenceClassification.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("Hello, my dog is cute", return_tensors="jax")

outputs = model(**inputs) logits = outputs.logits

FlaxBertForMultipleChoice

class transformers.FlaxBertForMultipleChoice

< source >

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

Bert Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also aflax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: <function PRNGKey at 0x7f8a6cc836d0> = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxMultipleChoiceModelOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxMultipleChoiceModelOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxBertForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = FlaxBertForMultipleChoice.from_pretrained("google-bert/bert-base-uncased")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." choice0 = "It is eaten with a fork and a knife." choice1 = "It is eaten while held in the hand."

encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="jax", padding=True) outputs = model(**{k: v[None, :] for k, v in encoding.items()})

logits = outputs.logits

FlaxBertForTokenClassification

class transformers.FlaxBertForTokenClassification

< source >

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

Bert Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also aflax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: <function PRNGKey at 0x7f8a6cc836d0> = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxTokenClassifierOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxTokenClassifierOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxBertForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = FlaxBertForTokenClassification.from_pretrained("google-bert/bert-base-uncased")

inputs = tokenizer("Hello, my dog is cute", return_tensors="jax")

outputs = model(**inputs) logits = outputs.logits

FlaxBertForQuestionAnswering

class transformers.FlaxBertForQuestionAnswering

< source >

( config: BertConfig input_shape: typing.Tuple = (1, 1) seed: int = 0 dtype: dtype = <class 'jax.numpy.float32'> _do_init: bool = True gradient_checkpointing: bool = False **kwargs )

Parameters

Bert Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layers on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading, saving and converting weights from PyTorch models)

This model is also aflax.linen.Module subclass. Use it as a regular Flax linen Module and refer to the Flax documentation for all matter related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

__call__

< source >

( input_ids attention_mask = None token_type_ids = None position_ids = None head_mask = None encoder_hidden_states = None encoder_attention_mask = None params: typing.Optional[dict] = None dropout_rng: <function PRNGKey at 0x7f8a6cc836d0> = None train: bool = False output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None past_key_values: typing.Optional[dict] = None ) → transformers.modeling_flax_outputs.FlaxQuestionAnsweringModelOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_flax_outputs.FlaxQuestionAnsweringModelOutput or a tuple oftorch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.

The FlaxBertPreTrainedModel forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Moduleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example:

from transformers import AutoTokenizer, FlaxBertForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased") model = FlaxBertForQuestionAnswering.from_pretrained("google-bert/bert-base-uncased")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet" inputs = tokenizer(question, text, return_tensors="jax")

outputs = model(**inputs) start_scores = outputs.start_logits end_scores = outputs.end_logits

Bert specific outputs

class transformers.models.bert.modeling_bert.BertForPreTrainingOutput

< source >

( loss: typing.Optional[torch.FloatTensor] = None prediction_logits: typing.Optional[torch.FloatTensor] = None seq_relationship_logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[typing.Tuple[torch.FloatTensor]] = None attentions: typing.Optional[typing.Tuple[torch.FloatTensor]] = None )

Parameters

Output type of BertForPreTraining.

class transformers.models.bert.modeling_tf_bert.TFBertForPreTrainingOutput

< source >

( loss: tf.Tensor | None = None prediction_logits: Optional[tf.Tensor] = None seq_relationship_logits: Optional[tf.Tensor] = None hidden_states: Optional[Union[Tuple[tf.Tensor], tf.Tensor]] = None attentions: Optional[Union[Tuple[tf.Tensor], tf.Tensor]] = None )

Parameters

Output type of TFBertForPreTraining.

class transformers.models.bert.modeling_flax_bert.FlaxBertForPreTrainingOutput

< source >

( prediction_logits: Array = None seq_relationship_logits: Array = None hidden_states: typing.Optional[typing.Tuple[jax.Array]] = None attentions: typing.Optional[typing.Tuple[jax.Array]] = None )

Parameters

Output type of BertForPreTraining.

“Returns a new object replacing the specified fields with new values.

< > Update on GitHub