Perform chat completion inference
| Elasticsearch API documentation (original) (raw)
Dismiss highlight Show more
Query parameters
- Specifies the amount of time to wait for the inference request to complete.
application/json
Body Required
- A list of objects representing the conversation. Requests should generally only add new messages from the user (role
user
). The other message roles (assistant
,system
, ortool
) should generally only be copied from the response to a previous completion request, such that the messages array is built up throughout a conversation.
Hide messages attributes Show messages attributes objectcontent string | array[object]
Hide attributes Show attributes object
* The text content.
* The type of content.- The role of the message author.
- The tool calls generated by the model.
Hide tool_calls attributes Show tool_calls attributes object
* Hide function attributes Show function attributes object
* The arguments to call the function with in JSON format.
* The name of the function to call.
* The type of the tool call.
- The ID of the model to use.
- The upper bound limit for the number of tokens that can be generated for a completion request.
- A sequence of strings to control when the model should stop generating additional tokens.
- The sampling temperature to use.
- A list of tools that the model can call.
Hide tools attributes Show tools attributes object- The type of tool.
- Hide function attributes Show function attributes object
* A description of what the function does. This is used by the model to choose when and how to call the function.
* The name of the function.
* The parameters the functional accepts. This should be formatted as a JSON object.
* Whether to enable schema adherence when generating the function call.
- Nucleus sampling, an alternative to sampling with temperature.