utils/generation

Classes, functions, and utilities for generation.

Todo

Describe how to create a custom GenerationConfig.
utils/generation
- static
  * .LogitsProcessorList ⇐ Callable
    - new LogitsProcessorList()
    - .push(item)
    - .extend(items)
    - ._call(input_ids, batchedLogits)
  * .LogitsProcessor ⇐ Callable
    - ._call(input_ids, logits)
  * .ForceTokensLogitsProcessor ⇐ LogitsProcessor
    - new ForceTokensLogitsProcessor(forced_decoder_ids)
    - ._call(input_ids, logits) ⇒ Tensor
  * .ForcedBOSTokenLogitsProcessor ⇐ LogitsProcessor
    - new ForcedBOSTokenLogitsProcessor(bos_token_id)
    - ._call(input_ids, logits) ⇒ Object
  * .ForcedEOSTokenLogitsProcessor ⇐ LogitsProcessor
    - new ForcedEOSTokenLogitsProcessor(max_length, forced_eos_token_id)
    - ._call(input_ids, logits)
  * .SuppressTokensAtBeginLogitsProcessor ⇐ LogitsProcessor
    - new SuppressTokensAtBeginLogitsProcessor(begin_suppress_tokens, begin_index)
    - ._call(input_ids, logits) ⇒ Object
  * .WhisperTimeStampLogitsProcessor ⇐ LogitsProcessor
    - new WhisperTimeStampLogitsProcessor(generate_config)
    - ._call(input_ids, logits) ⇒ Tensor
  * .NoRepeatNGramLogitsProcessor ⇐ LogitsProcessor
    - new NoRepeatNGramLogitsProcessor(no_repeat_ngram_size)
    - .getNgrams(prevInputIds) ⇒ Map.<string, Array<number>>
    - .getGeneratedNgrams(bannedNgrams, prevInputIds) ⇒ Array.<number>
    - .calcBannedNgramTokens(prevInputIds) ⇒ Array.<number>
    - ._call(input_ids, logits) ⇒ Object
  * .RepetitionPenaltyLogitsProcessor ⇐ LogitsProcessor
    - new RepetitionPenaltyLogitsProcessor(penalty)
    - ._call(input_ids, logits) ⇒ Object
  * .MinLengthLogitsProcessor ⇐ LogitsProcessor
    - new MinLengthLogitsProcessor(min_length, eos_token_id)
    - ._call(input_ids, logits) ⇒ Object
  * .MinNewTokensLengthLogitsProcessor ⇐ LogitsProcessor
    - new MinNewTokensLengthLogitsProcessor(prompt_length_to_skip, min_new_tokens, eos_token_id)
    - ._call(input_ids, logits) ⇒ Object
  * .NoBadWordsLogitsProcessor
    - new NoBadWordsLogitsProcessor(bad_words_ids, eos_token_id)
    - ._call(input_ids, logits) ⇒ Object
  * .Sampler
    - new Sampler(generation_config)
    - instance
      * ._call(logits, index) ⇒ void
      * .sample(logits, index)
      * .getLogits(logits, index) ⇒ Float32Array
      * .randomSelect(probabilities) ⇒ number
    - static
      * .getSampler(generation_config) ⇒ Sampler
  * .GenerationConfig : *
- inner
  * ~GenerationConfig
    - new GenerationConfig(kwargs)
  * ~GreedySampler ⇐ Sampler
    - .sample(logits, [index]) ⇒ Array
  * ~MultinomialSampler ⇐ Sampler
    - .sample(logits, index) ⇒ Array
  * ~BeamSearchSampler ⇐ Sampler
    - .sample(logits, index) ⇒ Array
  * ~GenerationConfigType : Object

utils/generation.LogitsProcessorList ⇐ `Callable`

A class representing a list of logits processors. A logits processor is a function that modifies the logits output of a language model. This class provides methods for adding new processors and applying all processors to a batch of logits.

Kind: static class of utils/generation
Extends: Callable

.LogitsProcessorList ⇐ Callable

new LogitsProcessorList()

Constructs a new instance of LogitsProcessorList.

logitsProcessorList.push(item)

Adds a new logits processor to the list.

Kind: instance method of LogitsProcessorList

Param	Type	Description
item	LogitsProcessor	The logits processor function to add.

logitsProcessorList.extend(items)

Adds multiple logits processors to the list.

Kind: instance method of LogitsProcessorList

Param	Type	Description
items	Array.<LogitsProcessor>	The logits processor functions to add.

logitsProcessorList._call(input_ids, batchedLogits)

Applies all logits processors in the list to a batch of logits, modifying them in-place.

Kind: instance method of LogitsProcessorList

Param	Type	Description
input_ids	Array.<number>	The input IDs for the language model.
batchedLogits	Array.<Array.<number>>	A 2D array of logits, where each row corresponds to a single input sequence in the batch.
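
As a rough illustration of the pattern described above, the following sketch chains processors over a batch of logits. It is not the library's implementation: `SimpleProcessorList`, `addBias`, and `maskToken` are hypothetical names, processors are plain functions, and plain arrays stand in for tensors.

```javascript
// Hypothetical stand-in for a processor list; plain arrays replace tensors.
class SimpleProcessorList {
  constructor() {
    this.processors = [];
  }
  push(item) {
    this.processors.push(item);
  }
  extend(items) {
    this.processors.push(...items);
  }
  // Apply every processor, in insertion order, to each row of the batch.
  call(input_ids, batchedLogits) {
    for (const logits of batchedLogits) {
      for (const processor of this.processors) {
        processor(input_ids, logits); // each processor mutates `logits` in place
      }
    }
    return batchedLogits;
  }
}

// Two toy processors: one adds a bias to a token, one masks a token.
const addBias = (tokenId, bias) => (input_ids, logits) => { logits[tokenId] += bias; };
const maskToken = (tokenId) => (input_ids, logits) => { logits[tokenId] = -Infinity; };

const list = new SimpleProcessorList();
list.push(addBias(0, 1.5));
list.extend([maskToken(2)]);

const batch = [[0.5, 1.0, 2.0], [1.0, 1.0, 1.0]];
list.call([7, 8], batch);
console.log(batch);
```

Because the rows are modified in place, the caller's `batch` array reflects the changes without using the return value.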

utils/generation.LogitsProcessor ⇐ `Callable`

Base class for processing logits.

Kind: static class of utils/generation
Extends: Callable

logitsProcessor._call(input_ids, logits)

Apply the processor to the input logits.

Kind: instance abstract method of LogitsProcessor
Throws:

Error Throws an error if `_call` is not implemented in the subclass.

Param	Type	Description
input_ids	Array	The input ids.
logits	Tensor	The logits to process.
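
The abstract-method contract can be sketched as follows. `BaseProcessor` and `BanTokenProcessor` are hypothetical names used only for illustration, and logits are a plain array rather than a Tensor.

```javascript
// Hypothetical base class mirroring the abstract-method pattern described above.
class BaseProcessor {
  _call(input_ids, logits) {
    throw new Error("`_call` should be implemented in a subclass");
  }
}

// A subclass that forbids one token by setting its logit to -Infinity.
class BanTokenProcessor extends BaseProcessor {
  constructor(tokenId) {
    super();
    this.tokenId = tokenId;
  }
  _call(input_ids, logits) {
    logits[this.tokenId] = -Infinity;
    return logits;
  }
}

let baseError = null;
try {
  new BaseProcessor()._call([], [0, 0]);
} catch (err) {
  baseError = err; // the base class refuses to run
}

const banned = new BanTokenProcessor(1)._call([5], [0.2, 0.9, 0.1]);
console.log(baseError instanceof Error, banned);
```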

utils/generation.ForceTokensLogitsProcessor ⇐ `LogitsProcessor`

A logits processor that forces a specific token to be generated by the decoder.

Kind: static class of utils/generation
Extends: LogitsProcessor

.ForceTokensLogitsProcessor ⇐ LogitsProcessor
- new ForceTokensLogitsProcessor(forced_decoder_ids)
- ._call(input_ids, logits) ⇒ Tensor

new ForceTokensLogitsProcessor(forced_decoder_ids)

Constructs a new instance of ForceTokensLogitsProcessor.

Param	Type	Description
forced_decoder_ids	Array	The ids of tokens that should be forced.

forceTokensLogitsProcessor._call(input_ids, logits) ⇒ `Tensor`

Apply the processor to the input logits.

Kind: instance method of ForceTokensLogitsProcessor
Returns: Tensor - The processed logits.

Param	Type	Description
input_ids	Array	The input ids.
logits	Tensor	The logits to process.

utils/generation.ForcedBOSTokenLogitsProcessor ⇐ `LogitsProcessor`

A LogitsProcessor that forces a BOS token at the beginning of the generated sequence.

Kind: static class of utils/generation
Extends: LogitsProcessor

.ForcedBOSTokenLogitsProcessor ⇐ LogitsProcessor
- new ForcedBOSTokenLogitsProcessor(bos_token_id)
- ._call(input_ids, logits) ⇒ Object

new ForcedBOSTokenLogitsProcessor(bos_token_id)

Create a ForcedBOSTokenLogitsProcessor.

Param	Type	Description
bos_token_id	number	The ID of the beginning-of-sequence token to be forced.

forcedBOSTokenLogitsProcessor._call(input_ids, logits) ⇒ `Object`

Apply the BOS token forcing to the logits.

Kind: instance method of ForcedBOSTokenLogitsProcessor
Returns: Object - The logits with BOS token forcing.

Param	Type	Description
input_ids	Array	The input IDs.
logits	Object	The logits.
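
A minimal sketch of BOS forcing, assuming the processor acts only when a single token (the decoder start token) is present and that logits are a plain array. The function name `forceBosToken` is hypothetical.

```javascript
// Sketch: when only the start token has been emitted, allow nothing but BOS.
function forceBosToken(input_ids, logits, bos_token_id) {
  if (input_ids.length === 1) {
    logits.fill(-Infinity);   // rule out every token...
    logits[bos_token_id] = 0; // ...except the BOS token
  }
  return logits;
}

console.log(forceBosToken([0], [1, 2, 3], 2));
console.log(forceBosToken([0, 5], [1, 2, 3], 2));
```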

utils/generation.ForcedEOSTokenLogitsProcessor ⇐ `LogitsProcessor`

A logits processor that forces end-of-sequence token probability to 1.

Kind: static class of utils/generation
Extends: LogitsProcessor

.ForcedEOSTokenLogitsProcessor ⇐ LogitsProcessor
- new ForcedEOSTokenLogitsProcessor(max_length, forced_eos_token_id)
- ._call(input_ids, logits)

new ForcedEOSTokenLogitsProcessor(max_length, forced_eos_token_id)

Create a ForcedEOSTokenLogitsProcessor.

Param	Type	Description
max_length	number	Max length of the sequence.
forced_eos_token_id	number \| Array	The ID of the end-of-sequence token to be forced.

forcedEOSTokenLogitsProcessor._call(input_ids, logits)

Apply the processor to input_ids and logits.

Kind: instance method of ForcedEOSTokenLogitsProcessor

Param	Type	Description
input_ids	Array.<number>	The input ids.
logits	Tensor	The logits tensor.

utils/generation.SuppressTokensAtBeginLogitsProcessor ⇐ `LogitsProcessor`

A LogitsProcessor that suppresses a list of tokens as soon as the generate function starts generating, using begin_index to mark where generation begins. This ensures that the tokens defined by begin_suppress_tokens are not sampled at the beginning of the generation.

Kind: static class of utils/generation
Extends: LogitsProcessor

.SuppressTokensAtBeginLogitsProcessor ⇐ LogitsProcessor
- new SuppressTokensAtBeginLogitsProcessor(begin_suppress_tokens, begin_index)
- ._call(input_ids, logits) ⇒ Object

new SuppressTokensAtBeginLogitsProcessor(begin_suppress_tokens, begin_index)

Create a SuppressTokensAtBeginLogitsProcessor.

Param	Type	Description
begin_suppress_tokens	Array.<number>	The IDs of the tokens to suppress.
begin_index	number	The number of tokens to generate before suppressing tokens.

suppressTokensAtBeginLogitsProcessor._call(input_ids, logits) ⇒ `Object`

Apply the token suppression to the logits.

Kind: instance method of SuppressTokensAtBeginLogitsProcessor
Returns: Object - The logits with the begin_suppress_tokens suppressed.

Param	Type	Description
input_ids	Array	The input IDs.
logits	Object	The logits.
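
A minimal sketch of suppression at the first generated position, assuming the check is "the sequence currently holds exactly begin_index tokens" and that logits are a plain array. `suppressAtBegin` is a hypothetical helper name.

```javascript
// Sketch: suppress the listed tokens only at the step where generation begins.
function suppressAtBegin(input_ids, logits, begin_suppress_tokens, begin_index) {
  if (input_ids.length === begin_index) {
    for (const tokenId of begin_suppress_tokens) {
      logits[tokenId] = -Infinity;
    }
  }
  return logits;
}

console.log(suppressAtBegin([9, 9], [0.1, 0.2, 0.3], [0, 2], 2));
console.log(suppressAtBegin([9, 9, 1], [0.1, 0.2, 0.3], [0, 2], 2));
```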

utils/generation.WhisperTimeStampLogitsProcessor ⇐ `LogitsProcessor`

A LogitsProcessor that handles adding timestamps to generated text.

Kind: static class of utils/generation
Extends: LogitsProcessor

.WhisperTimeStampLogitsProcessor ⇐ LogitsProcessor
- new WhisperTimeStampLogitsProcessor(generate_config)
- ._call(input_ids, logits) ⇒ Tensor

new WhisperTimeStampLogitsProcessor(generate_config)

Constructs a new WhisperTimeStampLogitsProcessor.

Param	Type	Description
generate_config	Object	The config object passed to the generate() method of a transformer model.
generate_config.eos_token_id	number	The ID of the end-of-sequence token.
generate_config.no_timestamps_token_id	number	The ID of the token used to indicate that a token should not have a timestamp.
[generate_config.forced_decoder_ids]	Array.<Array>	An array of two-element arrays representing decoder IDs that are forced to appear in the output. The second element of each array indicates whether the token is a timestamp.
[generate_config.max_initial_timestamp_index]	number	The maximum index at which an initial timestamp can appear.

whisperTimeStampLogitsProcessor._call(input_ids, logits) ⇒ `Tensor`

Modify the logits to handle timestamp tokens.

Kind: instance method of WhisperTimeStampLogitsProcessor
Returns: Tensor - The modified logits.

Param	Type	Description
input_ids	Array	The input sequence of tokens.
logits	Tensor	The logits output by the model.

utils/generation.NoRepeatNGramLogitsProcessor ⇐ `LogitsProcessor`

A logits processor that disallows ngrams of a certain size to be repeated.

Kind: static class of utils/generation
Extends: LogitsProcessor

.NoRepeatNGramLogitsProcessor ⇐ LogitsProcessor
- new NoRepeatNGramLogitsProcessor(no_repeat_ngram_size)
- .getNgrams(prevInputIds) ⇒ Map.<string, Array<number>>
- .getGeneratedNgrams(bannedNgrams, prevInputIds) ⇒ Array.<number>
- .calcBannedNgramTokens(prevInputIds) ⇒ Array.<number>
- ._call(input_ids, logits) ⇒ Object

new NoRepeatNGramLogitsProcessor(no_repeat_ngram_size)

Create a NoRepeatNGramLogitsProcessor.

Param	Type	Description
no_repeat_ngram_size	number	The no-repeat-ngram size. All ngrams of this size can only occur once.

noRepeatNGramLogitsProcessor.getNgrams(prevInputIds) ⇒ `Map. < string, Array < number > >`

Generate n-grams from a sequence of token ids.

Kind: instance method of NoRepeatNGramLogitsProcessor
Returns: Map.<string, Array<number>> - Map of generated n-grams

Param	Type	Description
prevInputIds	Array.<number>	List of previous input ids

noRepeatNGramLogitsProcessor.getGeneratedNgrams(bannedNgrams, prevInputIds) ⇒ `Array. < number >`

Look up the tokens that are banned after the most recent n-gram prefix.

Kind: instance method of NoRepeatNGramLogitsProcessor
Returns: Array.<number> - List of banned token ids

Param	Type	Description
bannedNgrams	Map.<string, Array>	Map of banned n-grams
prevInputIds	Array.<number>	List of previous input ids

noRepeatNGramLogitsProcessor.calcBannedNgramTokens(prevInputIds) ⇒ `Array. < number >`

Calculate banned n-gram tokens.

Kind: instance method of NoRepeatNGramLogitsProcessor
Returns: Array.<number> - List of banned token ids

Param	Type	Description
prevInputIds	Array.<number>	List of previous input ids

noRepeatNGramLogitsProcessor._call(input_ids, logits) ⇒ `Object`

Apply the no-repeat-ngram processor to the logits.

Kind: instance method of NoRepeatNGramLogitsProcessor
Returns: Object - The logits with no-repeat-ngram processing.

Param	Type	Description
input_ids	Array	The input IDs.
logits	Object	The logits.
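
The n-gram blocking idea can be sketched on plain arrays as follows. This is an illustrative reconstruction, not the library's code: `collectNgrams`, `bannedNextTokens`, and `applyNoRepeat` are hypothetical names, and n-gram prefixes are keyed by their JSON string.

```javascript
// Map each (n-1)-token prefix seen so far to the tokens that followed it.
function collectNgrams(prevInputIds, n) {
  const seen = new Map();
  for (let i = 0; i + n <= prevInputIds.length; ++i) {
    const ngram = prevInputIds.slice(i, i + n);
    const prefix = JSON.stringify(ngram.slice(0, n - 1));
    const followers = seen.get(prefix) ?? [];
    followers.push(ngram[n - 1]);
    seen.set(prefix, followers);
  }
  return seen;
}

// Tokens that would complete an already-seen n-gram given the latest prefix.
function bannedNextTokens(prevInputIds, n) {
  if (prevInputIds.length + 1 < n) return []; // too short to form an n-gram
  const seen = collectNgrams(prevInputIds, n);
  const currentPrefix = prevInputIds.slice(prevInputIds.length + 1 - n);
  return seen.get(JSON.stringify(currentPrefix)) ?? [];
}

// Set the logit of every banned token to -Infinity.
function applyNoRepeat(input_ids, logits, n) {
  for (const tokenId of bannedNextTokens(input_ids, n)) {
    logits[tokenId] = -Infinity;
  }
  return logits;
}

// The sequence ends in [1, 2], and [1, 2, 3] has already occurred,
// so token 3 is banned for a trigram size of 3.
console.log(bannedNextTokens([1, 2, 3, 1, 2], 3));
```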

utils/generation.RepetitionPenaltyLogitsProcessor ⇐ `LogitsProcessor`

A logits processor that penalises repeated output tokens.

Kind: static class of utils/generation
Extends: LogitsProcessor

.RepetitionPenaltyLogitsProcessor ⇐ LogitsProcessor
- new RepetitionPenaltyLogitsProcessor(penalty)
- ._call(input_ids, logits) ⇒ Object

new RepetitionPenaltyLogitsProcessor(penalty)

Create a RepetitionPenaltyLogitsProcessor.

Param	Type	Description
penalty	number	The penalty to apply for repeated tokens.

repetitionPenaltyLogitsProcessor._call(input_ids, logits) ⇒ `Object`

Apply the repetition penalty to the logits.

Kind: instance method of RepetitionPenaltyLogitsProcessor
Returns: Object - The logits with repetition penalty processing.

Param	Type	Description
input_ids	Array	The input IDs.
logits	Object	The logits.
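
One common formulation of a repetition penalty divides positive logits and multiplies negative logits by the penalty, so a penalty above 1.0 always lowers the score. The sketch below assumes that formulation and applies it once per distinct token, which is a choice of this sketch and may differ from the library. `applyRepetitionPenalty` is a hypothetical name.

```javascript
// Sketch: penalise every token that already appears in the sequence.
function applyRepetitionPenalty(input_ids, logits, penalty) {
  for (const tokenId of new Set(input_ids)) {
    logits[tokenId] = logits[tokenId] < 0
      ? logits[tokenId] * penalty  // negative scores become more negative
      : logits[tokenId] / penalty; // positive scores shrink toward zero
  }
  return logits;
}

console.log(applyRepetitionPenalty([0, 1, 1], [2, -2, 4], 2));
```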

utils/generation.MinLengthLogitsProcessor ⇐ `LogitsProcessor`

A logits processor that enforces a minimum number of tokens.

Kind: static class of utils/generation
Extends: LogitsProcessor

.MinLengthLogitsProcessor ⇐ LogitsProcessor
- new MinLengthLogitsProcessor(min_length, eos_token_id)
- ._call(input_ids, logits) ⇒ Object

new MinLengthLogitsProcessor(min_length, eos_token_id)

Create a MinLengthLogitsProcessor.

Param	Type	Description
min_length	number	The minimum length below which the score of eos_token_id is set to negative infinity.
eos_token_id	number \| Array	The ID/IDs of the end-of-sequence token.

minLengthLogitsProcessor._call(input_ids, logits) ⇒ `Object`

Apply logit processor.

Kind: instance method of MinLengthLogitsProcessor
Returns: Object - The processed logits.

Param	Type	Description
input_ids	Array	The input IDs.
logits	Object	The logits.
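
A minimal sketch of the minimum-length rule, assuming length is measured as the number of ids in input_ids and logits are a plain array. `enforceMinLength` is a hypothetical name.

```javascript
// Sketch: while the sequence is shorter than min_length, forbid every EOS token.
function enforceMinLength(input_ids, logits, min_length, eos_token_id) {
  const eosIds = Array.isArray(eos_token_id) ? eos_token_id : [eos_token_id];
  if (input_ids.length < min_length) {
    for (const eosId of eosIds) {
      logits[eosId] = -Infinity;
    }
  }
  return logits;
}

console.log(enforceMinLength([4, 5], [0.3, 0.4, 0.5], 3, 2));
console.log(enforceMinLength([4, 5, 6], [0.3, 0.4, 0.5], 3, [1, 2]));
```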

utils/generation.MinNewTokensLengthLogitsProcessor ⇐ `LogitsProcessor`

A logits processor that enforces a minimum number of new tokens.

Kind: static class of utils/generation
Extends: LogitsProcessor

.MinNewTokensLengthLogitsProcessor ⇐ LogitsProcessor
- new MinNewTokensLengthLogitsProcessor(prompt_length_to_skip, min_new_tokens, eos_token_id)
- ._call(input_ids, logits) ⇒ Object

new MinNewTokensLengthLogitsProcessor(prompt_length_to_skip, min_new_tokens, eos_token_id)

Create a MinNewTokensLengthLogitsProcessor.

Param	Type	Description
prompt_length_to_skip	number	The input tokens length.
min_new_tokens	number	The minimum new tokens length below which the score of eos_token_id is set to negative infinity.
eos_token_id	number \| Array	The ID/IDs of the end-of-sequence token.

minNewTokensLengthLogitsProcessor._call(input_ids, logits) ⇒ `Object`

Apply logit processor.

Kind: instance method of MinNewTokensLengthLogitsProcessor
Returns: Object - The processed logits.

Param	Type	Description
input_ids	Array	The input IDs.
logits	Object	The logits.
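
The difference from MinLengthLogitsProcessor is that only tokens after the prompt are counted. A sketch under that assumption, with the hypothetical name `enforceMinNewTokens` and plain-array logits:

```javascript
// Sketch: count only tokens generated after the prompt when blocking EOS.
function enforceMinNewTokens(input_ids, logits, prompt_length_to_skip, min_new_tokens, eos_token_id) {
  const eosIds = Array.isArray(eos_token_id) ? eos_token_id : [eos_token_id];
  const newTokens = input_ids.length - prompt_length_to_skip;
  if (newTokens < min_new_tokens) {
    for (const eosId of eosIds) {
      logits[eosId] = -Infinity;
    }
  }
  return logits;
}

// Prompt of 3 tokens plus 1 generated token: EOS (id 2) is still blocked.
console.log(enforceMinNewTokens([1, 2, 3, 4], [0, 0, 0], 3, 2, 2));
```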

utils/generation.NoBadWordsLogitsProcessor

Kind: static class of utils/generation

.NoBadWordsLogitsProcessor
- new NoBadWordsLogitsProcessor(bad_words_ids, eos_token_id)
- ._call(input_ids, logits) ⇒ Object

new NoBadWordsLogitsProcessor(bad_words_ids, eos_token_id)

Create a NoBadWordsLogitsProcessor.

Param	Type	Description
bad_words_ids	Array.<Array>	List of lists of token ids that are not allowed to be generated.
eos_token_id	number \| Array	The id of the end-of-sequence token. Optionally, use a list to set multiple end-of-sequence tokens.

noBadWordsLogitsProcessor._call(input_ids, logits) ⇒ `Object`

Apply logit processor.

Kind: instance method of NoBadWordsLogitsProcessor
Returns: Object - The processed logits.

Param	Type	Description
input_ids	Array	The input IDs.
logits	Object	The logits.
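
One way to ban multi-token sequences is to block the final token of a bad word whenever the sequence currently ends with the rest of that word. The sketch below assumes this approach and ignores eos_token_id. `banBadWords` is a hypothetical name and logits are a plain array.

```javascript
// Sketch: block the last token of each bad word whose prefix was just generated.
function banBadWords(input_ids, logits, bad_words_ids) {
  for (const badWord of bad_words_ids) {
    if (badWord.length === 0) continue;
    const prefix = badWord.slice(0, -1);
    const lastToken = badWord[badWord.length - 1];
    if (prefix.length > input_ids.length) continue;
    // slice(-0) would return the whole array, so handle the empty prefix explicitly.
    const tail = prefix.length === 0 ? [] : input_ids.slice(-prefix.length);
    const matches = prefix.every((tokenId, i) => tokenId === tail[i]);
    if (matches) {
      logits[lastToken] = -Infinity;
    }
  }
  return logits;
}

// The sequence ends in 5, so [5, 6] bans token 6; the single-token word [3] is always banned.
console.log(banBadWords([4, 5], new Array(7).fill(0), [[5, 6], [3]]));
```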

utils/generation.Sampler

Sampler is a base class for all sampling methods used for text generation.

Kind: static class of utils/generation

.Sampler
- new Sampler(generation_config)
- instance
  * ._call(logits, index) ⇒ void
  * .sample(logits, index)
  * .getLogits(logits, index) ⇒ Float32Array
  * .randomSelect(probabilities) ⇒ number
- static
  * .getSampler(generation_config) ⇒ Sampler

new Sampler(generation_config)

Creates a new Sampler object with the specified generation config.

Param	Type	Description
generation_config	GenerationConfigType	The generation config.

sampler._call(logits, index) ⇒ `void`

Executes the sampler, using the specified logits.

Kind: instance method of Sampler

Param	Type
logits	Tensor
index	number

sampler.sample(logits, index)

Abstract method for sampling the logits.

Kind: instance method of Sampler
Throws:

Error

Param	Type
logits	Tensor
index	number

sampler.getLogits(logits, index) ⇒ `Float32Array`

Returns the specified logits as an array, with temperature applied.

Kind: instance method of Sampler

Param	Type
logits	Tensor
index	number
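
Temperature scaling divides each logit by the temperature before probabilities are computed. The sketch below assumes the logits are given as a 2D array of rows and that a negative index counts from the end. `logitsWithTemperature` is a hypothetical name, not the library's method.

```javascript
// Sketch: pick one row of logits and scale it by the temperature.
function logitsWithTemperature(rows, index, temperature) {
  const row = rows[index < 0 ? rows.length + index : index];
  const scaled = Float32Array.from(row);
  if (temperature > 0) {
    for (let i = 0; i < scaled.length; ++i) {
      scaled[i] /= temperature;
    }
  }
  return scaled;
}

console.log(logitsWithTemperature([[1, 2], [2, 4]], -1, 2));
```

Temperatures above 1 flatten the distribution, and temperatures below 1 sharpen it.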

sampler.randomSelect(probabilities) ⇒ `number`

Selects an item randomly based on the specified probabilities.

Kind: instance method of Sampler
Returns: number - The index of the selected item.

Param	Type	Description
probabilities	Array	An array of probabilities to use for selection.
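
Weighted random selection can be sketched by walking the cumulative sum until it passes a random threshold. In this sketch the `random` parameter is a hypothetical addition that makes the function deterministic for testing, and the fallback to the last index is a choice of this sketch.

```javascript
// Sketch: choose an index with probability proportional to its weight.
function randomSelect(probabilities, random = Math.random) {
  const total = probabilities.reduce((sum, p) => sum + p, 0);
  let r = random() * total;
  for (let i = 0; i < probabilities.length; ++i) {
    r -= probabilities[i];
    if (r <= 0) {
      return i;
    }
  }
  return probabilities.length - 1; // guard against floating-point drift
}

console.log(randomSelect([0.25, 0.25, 0.5]));
```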

Sampler.getSampler(generation_config) ⇒ `Sampler`

Returns a Sampler object based on the specified options.

Kind: static method of Sampler
Returns: Sampler - A Sampler object.

Param	Type	Description
generation_config	GenerationConfigType	An object containing options for the sampler.
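
How a sampler might be chosen from the config can be sketched as below. The class names come from this page, but the precedence between do_sample and num_beams is an assumption of this sketch, and `pickSamplerName` is a hypothetical helper that returns a name rather than an instance.

```javascript
// Sketch: map generation options to the name of a sampler class.
function pickSamplerName(generation_config) {
  if (generation_config.do_sample) {
    return "MultinomialSampler";
  }
  if (generation_config.num_beams > 1) {
    return "BeamSearchSampler";
  }
  return "GreedySampler";
}

console.log(pickSamplerName({ do_sample: false, num_beams: 1 }));
```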

utils/generation.GenerationConfig : `*`

Class that holds a configuration for a generation task.

Kind: static constant of utils/generation

utils/generation~GenerationConfig

Kind: inner class of utils/generation

new GenerationConfig(kwargs)

Create a new GenerationConfig object.

Param	Type
kwargs	GenerationConfigType

utils/generation~GreedySampler ⇐ `Sampler`

Class representing a Greedy Sampler.

Kind: inner class of utils/generation
Extends: Sampler

greedySampler.sample(logits, [index]) ⇒ `Array`

Sample the maximum probability of a given logits tensor.

Kind: instance method of GreedySampler
Returns: Array - An array with a single tuple, containing the index of the maximum value and a meaningless score (since this is a greedy search).

Param	Type	Default
logits	Tensor
[index]	number	-1
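
Greedy selection is an argmax over the logits. The sketch below returns the result in the single-tuple shape described above, with 0 as the placeholder score. `greedySample` is a hypothetical name and logits are a plain array.

```javascript
// Sketch: return the index of the largest logit as [[tokenId, score]].
function greedySample(logits) {
  let best = 0;
  for (let i = 1; i < logits.length; ++i) {
    if (logits[i] > logits[best]) {
      best = i;
    }
  }
  return [[best, 0]]; // the score is a placeholder for greedy search
}

console.log(greedySample([0.1, 3, 2]));
```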

utils/generation~MultinomialSampler ⇐ `Sampler`

Class representing a MultinomialSampler.

Kind: inner class of utils/generation
Extends: Sampler

multinomialSampler.sample(logits, index) ⇒ `Array`

Sample from the logits.

Kind: instance method of MultinomialSampler

Param	Type
logits	Tensor
index	number
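
A common way to sample from logits is to keep the top_k highest-scoring tokens, convert them to probabilities with a softmax, and draw one at random. The sketch below assumes that scheme and a `[tokenId, logProbability]` tuple as the return shape. The names `softmax` and `topKSample` and the injectable `random` parameter are hypothetical.

```javascript
// Numerically stable softmax over a plain array.
function softmax(values) {
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Sketch: sample one token from the top_k most likely candidates.
function topKSample(logits, top_k, random = Math.random) {
  const k = top_k > 0 ? Math.min(top_k, logits.length) : logits.length;
  const ranked = Array.from(logits, (value, tokenId) => ({ tokenId, value }))
    .sort((a, b) => b.value - a.value)
    .slice(0, k);
  const probs = softmax(ranked.map((item) => item.value));
  let r = random();
  for (let i = 0; i < ranked.length; ++i) {
    r -= probs[i];
    if (r <= 0) {
      return [[ranked[i].tokenId, Math.log(probs[i])]];
    }
  }
  const last = ranked.length - 1; // guard against floating-point drift
  return [[ranked[last].tokenId, Math.log(probs[last])]];
}

console.log(topKSample([1, 5, 3], 2));
```

With top_k set to 1 the candidate list holds a single token, so the draw reduces to greedy selection.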

utils/generation~BeamSearchSampler ⇐ `Sampler`

Class representing a BeamSearchSampler.

Kind: inner class of utils/generation
Extends: Sampler

beamSearchSampler.sample(logits, index) ⇒ `Array`

Sample from the logits.

Kind: instance method of BeamSearchSampler

Param	Type
logits	Tensor
index	number

utils/generation~GenerationConfigType : `Object`

The default configuration parameters.

Kind: inner typedef of utils/generation
Properties

Name	Type	Default	Description
[max_length]	number	20	The maximum length the generated tokens can have. Corresponds to the length of the input prompt + max_new_tokens. Its effect is overridden by max_new_tokens, if also set.
[max_new_tokens]	number		The maximum number of tokens to generate, ignoring the number of tokens in the prompt.
[min_length]	number	0	The minimum length of the sequence to be generated. Corresponds to the length of the input prompt + min_new_tokens. Its effect is overridden by min_new_tokens, if also set.
[min_new_tokens]	number		The minimum number of tokens to generate, ignoring the number of tokens in the prompt.
[early_stopping]	boolean \| "never"	false	Controls the stopping condition for beam-based methods, like beam-search. It accepts the following values: true, where the generation stops as soon as there are num_beams complete candidates; false, where a heuristic is applied and the generation stops when it is very unlikely to find better candidates; "never", where the beam search procedure only stops when there cannot be better candidates (canonical beam search algorithm).
[max_time]	number		The maximum amount of time you allow the computation to run for in seconds. Generation will still finish the current pass after allocated time has been passed.
[do_sample]	boolean	false	Whether or not to use sampling; use greedy decoding otherwise.
[num_beams]	number	1	Number of beams for beam search. 1 means no beam search.
[num_beam_groups]	number	1	Number of groups to divide num_beams into in order to ensure diversity among different groups of beams. See this paper for more details.
[penalty_alpha]	number		The value balances the model confidence and the degeneration penalty in contrastive search decoding.
[use_cache]	boolean	true	Whether or not the model should use the past last key/values attentions (if applicable to the model) to speed up decoding.
[temperature]	number	1.0	The value used to modulate the next token probabilities.
[top_k]	number	50	The number of highest probability vocabulary tokens to keep for top-k-filtering.
[top_p]	number	1.0	If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
[typical_p]	number	1.0	Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher are kept for generation. See this paper for more details.
[epsilon_cutoff]	number	0.0	If set to float strictly between 0 and 1, only tokens with a conditional probability greater than epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
[eta_cutoff]	number	0.0	Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to float strictly between 0 and 1, a token is only considered if it is greater than either eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
[diversity_penalty]	number	0.0	This value is subtracted from a beam's score if it generates the same token as any beam from another group at a particular time. Note that diversity_penalty is only effective if group beam search is enabled.
[repetition_penalty]	number	1.0	The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
[encoder_repetition_penalty]	number	1.0	The parameter for encoder_repetition_penalty. An exponential penalty on sequences that are not in the original input. 1.0 means no penalty.
[length_penalty]	number	1.0	Exponential penalty to the length that is used with beam-based generation. It is applied as an exponent to the sequence length, which in turn is used to divide the score of the sequence. Since the score is the log likelihood of the sequence (i.e. negative), length_penalty > 0.0 promotes longer sequences, while length_penalty < 0.0 encourages shorter sequences.
[no_repeat_ngram_size]	number	0	If set to int > 0, all ngrams of that size can only occur once.
[bad_words_ids]	Array.<Array>		List of token ids that are not allowed to be generated. In order to get the token ids of the words that should not appear in the generated text, use (await tokenizer(bad_words, {add_prefix_space: true, add_special_tokens: false})).input_ids.
[force_words_ids]	Array<Array> \| Array<Array<Array>>		List of token ids that must be generated. If given a number[][], this is treated as a simple list of words that must be included, the opposite to bad_words_ids. If given number[][][], this triggers a disjunctive constraint, where one can allow different forms of each word.
[renormalize_logits]	boolean	false	Whether to renormalize the logits after applying all the logits processors or warpers (including the custom ones). It's highly recommended to set this flag to true as the search algorithms suppose the score logits are normalized but some logit processors or warpers break the normalization.
[constraints]	Array.		Custom constraints that can be added to the generation to ensure that the output will contain the use of certain tokens as defined by Constraint objects, in the most sensible way possible.
[forced_bos_token_id]	number		The id of the token to force as the first generated token after the decoder_start_token_id. Useful for multilingual models like mBART where the first generated token needs to be the target language token.
[forced_eos_token_id]	number \| Array		The id of the token to force as the last generated token when max_length is reached. Optionally, use a list to set multiple end-of-sequence tokens.
[remove_invalid_values]	boolean	false	Whether to remove possible nan and inf outputs of the model to prevent the generation method from crashing. Note that using remove_invalid_values can slow down generation.
[exponential_decay_length_penalty]	Array.<number>		This tuple adds an exponentially increasing length penalty after a certain number of tokens have been generated. The tuple consists of (start_index, decay_factor), where start_index indicates where the penalty starts and decay_factor represents the factor of exponential decay.
[suppress_tokens]	Array.<number>		A list of tokens that will be suppressed at generation. The SuppressTokens logit processor will set their log probs to -inf so that they are not sampled.
[begin_suppress_tokens]	Array.<number>		A list of tokens that will be suppressed at the beginning of the generation. The SuppressBeginTokens logit processor will set their log probs to -inf so that they are not sampled.
[forced_decoder_ids]	Array.<Array>		A list of pairs of integers which indicates a mapping from generation indices to token indices that will be forced before sampling. For example, [[1, 123]] means the second generated token will always be a token of index 123.
[num_return_sequences]	number	1	The number of independently computed returned sequences for each element in the batch.
[output_attentions]	boolean	false	Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more details.
[output_hidden_states]	boolean	false	Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more details.
[output_scores]	boolean	false	Whether or not to return the prediction scores. See scores under returned tensors for more details.
[return_dict_in_generate]	boolean	false	Whether or not to return a ModelOutput instead of a plain tuple.
[pad_token_id]	number		The id of the padding token.
[bos_token_id]	number		The id of the beginning-of-sequence token.
[eos_token_id]	number \| Array		The id of the end-of-sequence token. Optionally, use a list to set multiple end-of-sequence tokens.
[encoder_no_repeat_ngram_size]	number	0	If set to int > 0, all ngrams of that size that occur in the encoder_input_ids cannot occur in the decoder_input_ids.
[decoder_start_token_id]	number		If an encoder-decoder model starts decoding with a different token than bos, the id of that token.
[generation_kwargs]	Object	{}	Additional generation kwargs will be forwarded to the generate function of the model. Kwargs that are not present in generate's signature will be used in the model forward pass.
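
A custom configuration can be thought of as the defaults above with a few overrides. The sketch below shows that merge on a plain object. It does not call the library's GenerationConfig constructor: `DEFAULT_GENERATION_OPTIONS` and `makeGenerationOptions` are hypothetical names, and only a subset of the documented defaults is copied.

```javascript
// A subset of the defaults listed in the table above.
const DEFAULT_GENERATION_OPTIONS = Object.freeze({
  max_length: 20,
  min_length: 0,
  do_sample: false,
  num_beams: 1,
  temperature: 1.0,
  top_k: 50,
  top_p: 1.0,
  repetition_penalty: 1.0,
  no_repeat_ngram_size: 0,
});

// Sketch: later keys win, so overrides replace the defaults they name.
function makeGenerationOptions(overrides = {}) {
  return { ...DEFAULT_GENERATION_OPTIONS, ...overrides };
}

const samplingOptions = makeGenerationOptions({
  do_sample: true,
  top_k: 10,
  temperature: 0.7,
  max_new_tokens: 64,
});
console.log(samplingOptions);
```

An object of this shape matches the GenerationConfigType typedef, which is the parameter type listed for `new GenerationConfig(kwargs)`.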

