Guidance (original) (raw)

What is Guidance?

Guidance is a feature that allows users to constrain the generation of a large language model with a specified grammar. This feature is particularly useful when you want to generate text that follows a specific structure or uses a specific set of words or produce output in a specific format. A prominent example is JSON grammar, where the model is forced to output valid JSON.

How is it used?

Guidance can be implemented in many ways and the community is always finding new ways to use it. Here are some examples of how you can use guidance:

Technically, guidance can be used to generate:

However these use cases can span a wide range of applications, such as:

How it works?

Diving into the details, guidance is enabled by including a grammar with a generation request that is compiled, and used to modify the chosen tokens.

This process can be broken down into the following steps:

  1. A request is sent to the backend, it is processed and placed in batch. Processing includes compiling the grammar into a finite state machine and a grammar state.

  1. The model does a forward pass over the batch. This returns probabilities for each token in the vocabulary for each request in the batch.
  2. The process of choosing one of those tokens is called sampling. The model samples from the distribution of probabilities to choose the next token. In TGI all of the steps before sampling are called processor. Grammars are applied as a processor that masks out tokens that are not allowed by the grammar.

  1. The grammar mask is applied and the model samples from the remaining tokens. Once a token is chosen, we update the grammar state with the new token, to prepare it for the next pass.

How to use Guidance?

There are two main ways to use guidance; you can either use the /generate endpoint with a grammar or use the /chat/completion endpoint with tools.

Under the hood tools are a special case of grammars that allows the model to choose one or none of the provided tools.

Please refer to using guidance for more examples and details on how to use guidance in Python, JavaScript, and cURL.

Getting the most out of guidance

Depending on how you are using guidance, you may want to make use of different features. Here are some tips to get the most out of guidance:

< > Update on GitHub