Perform chat completion inference

| Elasticsearch API documentation (original) (raw)

Dismiss highlight Show more

Query parameters

Specifies the amount of time to wait for the inference request to complete.

application/json

Body Required

A list of objects representing the conversation. Requests should generally only add new messages from the user (role user). The other message roles (assistant, system, or tool) should generally only be copied from the response to a previous completion request, such that the messages array is built up throughout a conversation.
Hide messages attributes Show messages attributes object
- content string | array[object]
Hide attributes Show attributes object
* The text content.
* The type of content.
- The role of the message author.
- The tool calls generated by the model.
  Hide tool_calls attributes Show tool_calls attributes object
  * Hide function attributes Show function attributes object
  * The arguments to call the function with in JSON format.
  * The name of the function to call.
  * The type of the tool call.
The ID of the model to use.
The upper bound limit for the number of tokens that can be generated for a completion request.
A sequence of strings to control when the model should stop generating additional tokens.
The sampling temperature to use.
A list of tools that the model can call.
Hide tools attributes Show tools attributes object
- The type of tool.
- Hide function attributes Show function attributes object
  * A description of what the function does. This is used by the model to choose when and how to call the function.
  * The name of the function.
  * The parameters the functional accepts. This should be formatted as a JSON object.
  * Whether to enable schema adherence when generating the function call.
Nucleus sampling, an alternative to sampling with temperature.

Perform chat completion inference

Query parameters

Body Required

content string | array[object]