MVP

PyTorch

Overview

The MVP model was proposed in MVP: Multi-task Supervised Pre-training for Natural Language Generation by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.

According to the abstract,

- MVP follows a standard Transformer encoder-decoder architecture.
- MVP is supervised pre-trained using labeled datasets.
- MVP also has task-specific soft prompts to stimulate the model's capacity in performing a certain task.
- MVP is specially designed for natural language generation and can be adapted to a wide range of generation tasks, including but not limited to summarization, data-to-text generation, open-ended dialogue systems, story generation, question answering, question generation, task-oriented dialogue systems, commonsense generation, paraphrase generation, text style transfer, and text simplification. The model can also be adapted to natural language understanding tasks such as sequence classification and (extractive) question answering.

This model was contributed by Tianyi Tang. The detailed information and instructions can be found here.

Usage tips

- A series of models has been released, including MVP, MVP with task-specific prompts, and multi-task pre-trained variants.
- If you want to use a model without prompts (the standard Transformer), you can load it through MvpForConditionalGeneration.from_pretrained('RUCAIBox/mvp').
- If you want to use a model with task-specific prompts, for example for summarization, you can load it through MvpForConditionalGeneration.from_pretrained('RUCAIBox/mvp-summarization').
- The model supports lightweight prompt tuning, following Prefix-tuning, with the method set_lightweight_tuning().

Usage examples

For summarization, below is an example of how to use MVP and MVP with summarization-specific prompts.

```python
>>> from transformers import MvpTokenizer, MvpForConditionalGeneration

>>> tokenizer = MvpTokenizer.from_pretrained("RUCAIBox/mvp")
>>> model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp")
>>> model_with_prompt = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp-summarization")

>>> inputs = tokenizer(
...     "Summarize: You may want to stick it to your boss and leave your job, but don't do it if these are your reasons.",
...     return_tensors="pt",
... )
>>> generated_ids = model.generate(**inputs)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
["Why You Shouldn't Quit Your Job"]

>>> generated_ids = model_with_prompt.generate(**inputs)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
["Don't do it if these are your reasons"]
```

For data-to-text generation, below is an example of how to use MVP and its multi-task pre-trained variants.

```python
>>> from transformers import MvpTokenizerFast, MvpForConditionalGeneration

>>> tokenizer = MvpTokenizerFast.from_pretrained("RUCAIBox/mvp")
>>> model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp")
>>> model_with_mtl = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mtl-data-to-text")

>>> inputs = tokenizer(
...     "Describe the following data: Iron Man | instance of | Superhero [SEP] Stan Lee | creator | Iron Man",
...     return_tensors="pt",
... )
>>> generated_ids = model.generate(**inputs)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
['Stan Lee created the character of Iron Man, a fictional superhero appearing in American comic']

>>> generated_ids = model_with_mtl.generate(**inputs)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
['Iron Man is a fictional superhero appearing in American comic books published by Marvel Comics.']
```

For lightweight tuning, i.e., fixing the model and only tuning prompts, you can load MVP with randomly initialized prompts or with task-specific prompts. Our code also supports Prefix-tuning with BART following the original paper.

```python
>>> from transformers import MvpForConditionalGeneration

>>> # lightweight tuning with randomly initialized prompts
>>> model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp", use_prompt=True)
>>> # the number of trainable parameters (full tuning)
>>> sum(p.numel() for p in model.parameters() if p.requires_grad)
468116832

>>> model.set_lightweight_tuning()
>>> # the number of trainable parameters (lightweight tuning)
>>> sum(p.numel() for p in model.parameters() if p.requires_grad)
61823328

>>> # lightweight tuning with task-specific prompts
>>> model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mtl-data-to-text")
>>> model.set_lightweight_tuning()

>>> # original lightweight Prefix-tuning
>>> model = MvpForConditionalGeneration.from_pretrained("facebook/bart-large", use_prompt=True)
>>> model.set_lightweight_tuning()
```

Resources

MvpConfig

class transformers.MvpConfig


( vocab_size = 50267 max_position_embeddings = 1024 encoder_layers = 12 encoder_ffn_dim = 4096 encoder_attention_heads = 16 decoder_layers = 12 decoder_ffn_dim = 4096 decoder_attention_heads = 16 encoder_layerdrop = 0.0 decoder_layerdrop = 0.0 activation_function = 'gelu' d_model = 1024 dropout = 0.1 attention_dropout = 0.0 activation_dropout = 0.0 init_std = 0.02 classifier_dropout = 0.0 scale_embedding = False use_cache = True pad_token_id = 1 bos_token_id = 0 eos_token_id = 2 is_encoder_decoder = True decoder_start_token_id = 2 forced_eos_token_id = 2 use_prompt = False prompt_length = 100 prompt_mid_dim = 800 **kwargs )

Parameters

This is the configuration class to store the configuration of a MvpModel. It is used to instantiate an MVP model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the MVP RUCAIBox/mvp architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example:

```python
>>> from transformers import MvpConfig, MvpModel

>>> # Initializing a MVP RUCAIBox/mvp style configuration
>>> configuration = MvpConfig()

>>> # Initializing a model (with random weights) from the RUCAIBox/mvp style configuration
>>> model = MvpModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```

MvpTokenizer

class transformers.MvpTokenizer


( vocab_file merges_file errors = 'replace' bos_token = '&lt;s&gt;' eos_token = '&lt;/s&gt;' sep_token = '&lt;/s&gt;' cls_token = '&lt;s&gt;' unk_token = '&lt;unk&gt;' pad_token = '&lt;pad&gt;' mask_token = '&lt;mask&gt;' add_prefix_space = False **kwargs )

Parameters

Constructs an MVP tokenizer, which is similar to the RoBERTa tokenizer, using byte-level Byte-Pair-Encoding.

This tokenizer has been trained to treat spaces like parts of the tokens (a bit like sentencepiece) so a word will be encoded differently whether it is at the beginning of the sentence (without space) or not:

```python
>>> from transformers import MvpTokenizer

>>> tokenizer = MvpTokenizer.from_pretrained("RUCAIBox/mvp")
>>> tokenizer("Hello world")["input_ids"]
[0, 31414, 232, 2]

>>> tokenizer(" Hello world")["input_ids"]
[0, 20920, 232, 2]
```

You can get around that behavior by passing add_prefix_space=True when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance.
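For instance, a minimal sketch of this workaround, reusing the ids from the example above:

```python
>>> from transformers import MvpTokenizer

>>> # with add_prefix_space=True, "Hello" is encoded as if preceded by a space,
>>> # so it gets the same id regardless of its position in the sentence
>>> tokenizer = MvpTokenizer.from_pretrained("RUCAIBox/mvp", add_prefix_space=True)
>>> tokenizer("Hello world")["input_ids"]
[0, 20920, 232, 2]
```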

When used with is_split_into_words=True, this tokenizer will add a space before each word (even the first one).

This tokenizer inherits from PreTrainedTokenizer which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

build_inputs_with_special_tokens


( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

Returns

List of input IDs with the appropriate special tokens.

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. An MVP sequence has the following format:

- single sequence: &lt;s&gt; X &lt;/s&gt;
- pair of sequences: &lt;s&gt; A &lt;/s&gt;&lt;/s&gt; B &lt;/s&gt;
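As an illustrative sketch of the single-sequence case (the ids are taken from the encoding examples above):

```python
>>> from transformers import MvpTokenizer

>>> tokenizer = MvpTokenizer.from_pretrained("RUCAIBox/mvp")
>>> # a single sequence is wrapped as <s> X </s>, i.e. bos (0) ... eos (2)
>>> tokenizer.build_inputs_with_special_tokens([31414, 232])
[0, 31414, 232, 2]
```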

convert_tokens_to_string

Converts a sequence of tokens (string) into a single string.

create_token_type_ids_from_sequences


( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

Returns

List of zeros.

Create a mask from the two sequences passed to be used in a sequence-pair classification task. MVP does not make use of token type ids, therefore a list of zeros is returned.
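A minimal sketch of the expected behavior:

```python
>>> from transformers import MvpTokenizer

>>> tokenizer = MvpTokenizer.from_pretrained("RUCAIBox/mvp")
>>> # all zeros; the length covers the sequence plus the added <s> and </s>
>>> tokenizer.create_token_type_ids_from_sequences([31414, 232])
[0, 0, 0, 0]
```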

get_special_tokens_mask


( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None already_has_special_tokens: bool = False ) → List[int]

Returns

A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.

Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.
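A minimal sketch of the expected behavior for a sequence without special tokens:

```python
>>> from transformers import MvpTokenizer

>>> tokenizer = MvpTokenizer.from_pretrained("RUCAIBox/mvp")
>>> # 1 marks the positions where <s> and </s> would be inserted
>>> tokenizer.get_special_tokens_mask([31414, 232])
[1, 0, 0, 1]
```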

MvpTokenizerFast

class transformers.MvpTokenizerFast


( vocab_file = None merges_file = None tokenizer_file = None errors = 'replace' bos_token = '&lt;s&gt;' eos_token = '&lt;/s&gt;' sep_token = '&lt;/s&gt;' cls_token = '&lt;s&gt;' unk_token = '&lt;unk&gt;' pad_token = '&lt;pad&gt;' mask_token = '&lt;mask&gt;' add_prefix_space = False trim_offsets = True **kwargs )

Parameters

Construct a “fast” MVP tokenizer (backed by HuggingFace’s tokenizers library), derived from the GPT-2 tokenizer, using byte-level Byte-Pair-Encoding.

This tokenizer has been trained to treat spaces like parts of the tokens (a bit like sentencepiece) so a word will be encoded differently whether it is at the beginning of the sentence (without space) or not:

```python
>>> from transformers import MvpTokenizerFast

>>> tokenizer = MvpTokenizerFast.from_pretrained("RUCAIBox/mvp")
>>> tokenizer("Hello world")["input_ids"]
[0, 31414, 232, 2]

>>> tokenizer(" Hello world")["input_ids"]
[0, 20920, 232, 2]
```

You can get around that behavior by passing add_prefix_space=True when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance.

When used with is_split_into_words=True, this tokenizer needs to be instantiated with add_prefix_space=True.
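For instance, a minimal sketch of encoding pre-split words (ids taken from the examples above):

```python
>>> from transformers import MvpTokenizerFast

>>> # pre-split inputs require add_prefix_space=True at instantiation time
>>> tokenizer = MvpTokenizerFast.from_pretrained("RUCAIBox/mvp", add_prefix_space=True)
>>> tokenizer(["Hello", "world"], is_split_into_words=True)["input_ids"]
[0, 20920, 232, 2]
```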

This tokenizer inherits from PreTrainedTokenizerFast which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

create_token_type_ids_from_sequences


( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

Returns

List of zeros.

Create a mask from the two sequences passed to be used in a sequence-pair classification task. MVP does not make use of token type ids, therefore a list of zeros is returned.

MvpModel

class transformers.MvpModel


( config: MvpConfig )

Parameters

The bare Mvp Model outputting raw hidden-states without any specific head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward


( input_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.Tensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None decoder_attention_mask: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.Tensor] = None decoder_head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None encoder_outputs: typing.Optional[typing.List[torch.FloatTensor]] = None past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None decoder_inputs_embeds: typing.Optional[torch.FloatTensor] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.Seq2SeqModelOutput or tuple(torch.FloatTensor)

Returns

A transformers.modeling_outputs.Seq2SeqModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (MvpConfig) and inputs.

The MvpModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
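A minimal sketch of a forward pass; when no decoder inputs are given, they are created by shifting input_ids to the right, as in BART:

```python
>>> from transformers import AutoTokenizer, MvpModel

>>> tokenizer = AutoTokenizer.from_pretrained("RUCAIBox/mvp")
>>> model = MvpModel.from_pretrained("RUCAIBox/mvp")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> # decoder hidden states of shape (batch_size, sequence_length, d_model)
>>> last_hidden_states = outputs.last_hidden_state
```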

MvpForConditionalGeneration

class transformers.MvpForConditionalGeneration


( config: MvpConfig )

Parameters

The MVP Model with a language modeling head. Can be used for various text generation tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward


( input_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.Tensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None decoder_attention_mask: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.Tensor] = None decoder_head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None encoder_outputs: typing.Optional[typing.List[torch.FloatTensor]] = None past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None decoder_inputs_embeds: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.Seq2SeqLMOutput or tuple(torch.FloatTensor)

Returns

A transformers.modeling_outputs.Seq2SeqLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (MvpConfig) and inputs.

The MvpForConditionalGeneration forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of summarization:

```python
>>> # Fine-tuning a model
>>> import torch
>>> from transformers import AutoTokenizer, MvpForConditionalGeneration

>>> tokenizer = AutoTokenizer.from_pretrained("RUCAIBox/mvp")
>>> model = MvpForConditionalGeneration.from_pretrained("RUCAIBox/mvp")

>>> inputs = tokenizer(
...     "Summarize: You may want to stick it to your boss and leave your job, but don't do it if these are your reasons.",
...     return_tensors="pt",
... )
>>> labels = tokenizer("Bad Reasons To Quit Your Job", return_tensors="pt")["input_ids"]

>>> loss = model(**inputs, labels=labels).loss
>>> loss.backward()

>>> # Inference after the model fine-tuned
>>> with torch.no_grad():
...     generated_ids = model.generate(**inputs)

>>> generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```

MvpForSequenceClassification

class transformers.MvpForSequenceClassification


( config: MvpConfig **kwargs )

Parameters

MVP model with a sequence classification head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward


( input_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.Tensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None decoder_attention_mask: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.Tensor] = None decoder_head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None encoder_outputs: typing.Optional[typing.List[torch.FloatTensor]] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None decoder_inputs_embeds: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput or tuple(torch.FloatTensor)

Returns

A transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (MvpConfig) and inputs.

The MvpForSequenceClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of single-label classification:

```python
>>> # Fine-tuning a model on num_labels classes
>>> import torch
>>> from transformers import AutoTokenizer, MvpForSequenceClassification

>>> num_labels = 2
>>> tokenizer = AutoTokenizer.from_pretrained("RUCAIBox/mvp")
>>> model = MvpForSequenceClassification.from_pretrained("RUCAIBox/mvp", num_labels=num_labels)

>>> inputs = tokenizer("Classify: Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor(1)

>>> loss = model(**inputs, labels=labels).loss
>>> loss.backward()

>>> # Inference after the model fine-tuned
>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_id = logits.argmax()
```

MvpForQuestionAnswering

class transformers.MvpForQuestionAnswering


( config )

Parameters

The Mvp transformer with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states output to compute span start logits and span end logits).

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.)

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward


( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None decoder_attention_mask: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.Tensor] = None decoder_head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None encoder_outputs: typing.Optional[typing.List[torch.FloatTensor]] = None start_positions: typing.Optional[torch.LongTensor] = None end_positions: typing.Optional[torch.LongTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None decoder_inputs_embeds: typing.Optional[torch.FloatTensor] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput or tuple(torch.FloatTensor)

Returns

A transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (MvpConfig) and inputs.

The MvpForQuestionAnswering forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

```python
>>> # Fine-tuning a model for extractive question answering; the model also
>>> # supports generative question answering using MvpForConditionalGeneration
>>> import torch
>>> from transformers import AutoTokenizer, MvpForQuestionAnswering

>>> tokenizer = AutoTokenizer.from_pretrained("RUCAIBox/mvp")
>>> model = MvpForQuestionAnswering.from_pretrained("RUCAIBox/mvp")

>>> inputs = tokenizer(
...     "Answer the following question: Who was Jim Henson? [SEP] Jim Henson was a nice puppet",
...     return_tensors="pt",
... )
>>> target_start_index = torch.tensor([18])
>>> target_end_index = torch.tensor([19])

>>> loss = model(**inputs, start_positions=target_start_index, end_positions=target_end_index).loss
>>> loss.backward()

>>> # Inference after the model fine-tuned
>>> with torch.no_grad():
...     outputs = model(**inputs)

>>> answer_start_index = outputs.start_logits.argmax()
>>> answer_end_index = outputs.end_logits.argmax()

>>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
>>> predict_answer = tokenizer.decode(predict_answer_tokens)
```

MvpForCausalLM

class transformers.MvpForCausalLM


( config )

forward


( input_ids: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.Tensor] = None encoder_hidden_states: typing.Optional[torch.FloatTensor] = None encoder_attention_mask: typing.Optional[torch.FloatTensor] = None head_mask: typing.Optional[torch.Tensor] = None cross_attn_head_mask: typing.Optional[torch.Tensor] = None past_key_values: typing.Optional[typing.List[torch.FloatTensor]] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)

Returns

A transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (MvpConfig) and inputs.

The MvpForCausalLM forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

```python
>>> from transformers import AutoTokenizer, MvpForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("RUCAIBox/mvp")
>>> model = MvpForCausalLM.from_pretrained("RUCAIBox/mvp", add_cross_attention=False)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> logits = outputs.logits
>>> list(logits.shape)
[1, 8, 50267]
```
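Since MvpForCausalLM is a standalone decoder with a language modeling head, it can also be used with generate(); a minimal sketch, reusing the model and inputs above (the generated text depends on the checkpoint):

```python
>>> generated_ids = model.generate(**inputs, max_new_tokens=10)
>>> generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```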
