Influence response generation with inference parameters

When running model inference, you can adjust inference parameters to influence the model response. Inference parameters can change the pool of possible outputs that the model considers during generation, or they can limit the final response.

Inference parameter default values and ranges depend on the model. To learn about inference parameters for different models, see Inference request parameters and response fields for foundation models.

The following categories of parameters are commonly found across different models:

Randomness and diversity

For any given sequence, a model determines a probability distribution of options for the next token in the sequence. To generate each token in an output, the model samples from this distribution. Randomness and diversity refer to the amount of variation in a model's response. You can control these factors by limiting or adjusting the distribution. Foundation models typically support the following parameters to control randomness and diversity in the response.

The following table summarizes the effects of these parameters.

| Parameter | Effect of lower value | Effect of higher value |
| --- | --- | --- |
| Temperature | Increase likelihood of higher-probability tokens; decrease likelihood of lower-probability tokens | Increase likelihood of lower-probability tokens; decrease likelihood of higher-probability tokens |
| Top K | Remove lower-probability tokens | Allow lower-probability tokens |
| Top P | Remove lower-probability tokens | Allow lower-probability tokens |

To understand these parameters, consider the example prompt "I hear the hoof beats of". Suppose the model determines the following three words to be candidates for the next token, and assigns each word a probability.

{
    "horses": 0.7,
    "zebras": 0.2,
    "unicorns": 0.1
}

Length

Foundation models typically support parameters that limit the length of the response, such as a maximum token count or stop sequences that end generation early.
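A toy decoding loop illustrating how a length limit and a stop sequence each cut generation short (hypothetical parameter names; actual names and behavior vary by model):

```python
def generate(token_stream, max_tokens, stop_sequences=()):
    # Emit tokens until the maximum count is reached or a stop
    # sequence is encountered, whichever comes first. The stop
    # sequence itself is not included in the output.
    out = []
    for tok in token_stream:
        if tok in stop_sequences:
            break
        out.append(tok)
        if len(out) >= max_tokens:
            break
    return out

stream = ["I", "hear", "the", "hoof", "beats", "of", "horses", "."]
print(generate(stream, max_tokens=5))               # truncated after 5 tokens
print(generate(stream, 100, stop_sequences=(".",))) # stops at the "." token
```

Note that a maximum-token limit truncates the final response but does not change which tokens the model considers, unlike the randomness and diversity parameters above.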