Create an Amazon Bedrock inference endpoint | Elasticsearch API documentation
Path parameters
- `task_type`: The type of the inference task that the model will perform. Values are `completion` or `text_embedding`.
- `amazonbedrock_inference_id`: The unique identifier of the inference endpoint.
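
The two path parameters can be combined into a request path. The following is a minimal sketch, assuming the endpoint is created with `PUT /_inference/{task_type}/{amazonbedrock_inference_id}`; the function name `inference_endpoint_path` and the example identifier are illustrative and not part of the API.

```python
from urllib.parse import quote

# Task types accepted for this service, per the path parameter description.
ALLOWED_TASK_TYPES = {"completion", "text_embedding"}


def inference_endpoint_path(task_type: str, inference_id: str) -> str:
    """Return the request path for creating the endpoint (sent with PUT)."""
    if task_type not in ALLOWED_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type!r}")
    if not inference_id:
        raise ValueError("inference_id must not be empty")
    # Percent-encode the identifier so it is safe as a single path segment.
    return f"/_inference/{task_type}/{quote(inference_id, safe='')}"


# Hypothetical endpoint identifier, used only for illustration.
path = inference_endpoint_path("text_embedding", "amazon_bedrock_embeddings")
print(path)
```

Validating the task type before sending the request surfaces a typo locally instead of as a server-side error.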
Body (application/json)
- `chunking_settings` (object)
  * `max_chunk_size`: The maximum size of a chunk in words. This value cannot be higher than `300` or lower than `20` (for the `sentence` strategy) or `10` (for the `word` strategy).
  * `overlap`: The number of overlapping words for chunks. It applies only to the `word` chunking strategy. This value cannot be higher than half the `max_chunk_size` value.
  * `sentence_overlap`: The number of overlapping sentences for chunks. It applies only to the `sentence` chunking strategy. It can be either `1` or `0`.
  * `strategy`: The chunking strategy: `sentence` or `word`.
- `service_settings` (object)
  * `access_key`: A valid AWS access key that has permissions to use Amazon Bedrock and access to models for inference requests.
  * `model`: The base model ID or an ARN to a custom model based on a foundational model. The base model IDs can be found in the Amazon Bedrock documentation. The model ID must be available for the chosen provider, and your IAM user must have access to the model.
  * `provider`: The model provider for your deployment. Some providers support only certain task types. Supported providers:
    - `amazontitan`: available for `text_embedding` and `completion` task types
    - `anthropic`: available for the `completion` task type only
    - `ai21labs`: available for the `completion` task type only
    - `cohere`: available for `text_embedding` and `completion` task types
    - `meta`: available for the `completion` task type only
    - `mistral`: available for the `completion` task type only
  * `region`: The region that your model or ARN is deployed in. The list of available regions per model can be found in the Amazon Bedrock documentation.
  * `rate_limit` (object)
    - `requests_per_minute`: The number of requests allowed per minute.
  * `secret_key`: A valid AWS secret key that is paired with the `access_key`. For information about creating and managing access and secret keys, refer to the AWS documentation.
- `task_settings` (object)
  * `max_new_tokens`: For a `completion` task, sets the maximum number of output tokens to be generated.
  * `temperature`: For a `completion` task, a number between 0.0 and 1.0 that controls the apparent creativity of the results. At temperature 0.0 the model is most deterministic; at temperature 1.0 it is most random. It should not be used if `top_p` or `top_k` is specified.
  * `top_k`: For a `completion` task, limits samples to the top-K most likely words, balancing coherence and variability. It is only available for the anthropic, cohere, and mistral providers. It is an alternative to `temperature`; it should not be used if `temperature` is specified.
  * `top_p`: For a `completion` task, a number in the range of 0.0 to 1.0 used to eliminate low-probability tokens. Top-p uses nucleus sampling to select top tokens whose sum of likelihoods does not exceed a certain value, ensuring both variety and coherence. It is an alternative to `temperature`; it should not be used if `temperature` is specified.
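
The constraints on the request body can be checked on the client before the request is sent. The sketch below is one possible way to do that, not the server's own validation. It assumes the body uses the field names `provider`, `strategy`, `max_chunk_size`, `overlap`, `sentence_overlap`, `temperature`, `top_p`, and `top_k`; the helper `check_request`, the region value, and the credential and model placeholders are hypothetical.

```python
# Provider -> supported task types, copied from the provider list above.
PROVIDER_TASKS = {
    "amazontitan": {"text_embedding", "completion"},
    "anthropic": {"completion"},
    "ai21labs": {"completion"},
    "cohere": {"text_embedding", "completion"},
    "meta": {"completion"},
    "mistral": {"completion"},
}
TOP_K_PROVIDERS = {"anthropic", "cohere", "mistral"}
MIN_CHUNK_SIZE = {"sentence": 20, "word": 10}  # the upper bound is 300 for both


def check_request(task_type: str, body: dict) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    errors = []
    provider = body.get("service_settings", {}).get("provider")
    if task_type not in PROVIDER_TASKS.get(provider, set()):
        errors.append(f"provider {provider!r} does not support {task_type!r}")

    chunking = body.get("chunking_settings", {})
    strategy = chunking.get("strategy")
    size = chunking.get("max_chunk_size")
    if size is not None and strategy in MIN_CHUNK_SIZE:
        if not MIN_CHUNK_SIZE[strategy] <= size <= 300:
            errors.append("max_chunk_size is out of range")
    overlap = chunking.get("overlap")
    if overlap is not None and size is not None and overlap > size / 2:
        errors.append("overlap exceeds half of max_chunk_size")
    if chunking.get("sentence_overlap", 0) not in (0, 1):
        errors.append("sentence_overlap must be 0 or 1")

    task = body.get("task_settings", {})
    if "temperature" in task and ("top_p" in task or "top_k" in task):
        errors.append("temperature should not be combined with top_p or top_k")
    for name in ("temperature", "top_p"):
        if name in task and not 0.0 <= task[name] <= 1.0:
            errors.append(f"{name} must be between 0.0 and 1.0")
    if "top_k" in task and provider not in TOP_K_PROVIDERS:
        errors.append(f"top_k is not available for provider {provider!r}")
    return errors


# Example body with placeholder credentials and model ID (all hypothetical).
body = {
    "service_settings": {
        "access_key": "<aws-access-key>",
        "secret_key": "<aws-secret-key>",
        "region": "us-east-1",
        "provider": "anthropic",
        "model": "<model-id>",
    },
    "task_settings": {"temperature": 0.2, "top_k": 40},
}
print(check_request("completion", body))
```

The example body sets both `temperature` and `top_k`, so the check reports that combination; removing either field yields an empty list.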
Responses
- 200 application/json
  - `chunking_settings` (object)
    * `max_chunk_size`: The maximum size of a chunk in words. This value cannot be higher than `300` or lower than `20` (for the `sentence` strategy) or `10` (for the `word` strategy).
    * `overlap`: The number of overlapping words for chunks. It applies only to the `word` chunking strategy. This value cannot be higher than half the `max_chunk_size` value.
    * `sentence_overlap`: The number of overlapping sentences for chunks. It applies only to the `sentence` chunking strategy. It can be either `1` or `0`.
    * `strategy`: The chunking strategy: `sentence` or `word`.
  - `service`: The service type.
  - `inference_id`: The inference ID.
  - `task_type`: Values are `sparse_embedding`, `text_embedding`, `rerank`, `completion`, or `chat_completion`.
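
A client might read the response fields as follows. This is a sketch only: the JSON below is a hypothetical response with placeholder values (including the assumed service name `amazonbedrock`), the field names `inference_id`, `task_type`, `service`, and `chunking_settings` are assumed, and `summarize` is an illustrative helper.

```python
import json

# Hypothetical response body; the values are placeholders, not real output.
raw = """
{
  "inference_id": "amazon_bedrock_embeddings",
  "task_type": "text_embedding",
  "service": "amazonbedrock",
  "chunking_settings": {"strategy": "sentence", "max_chunk_size": 250, "sentence_overlap": 1}
}
"""

# Task type values listed for the response.
RESPONSE_TASK_TYPES = {
    "sparse_embedding", "text_embedding", "rerank", "completion", "chat_completion",
}


def summarize(response_text: str) -> str:
    """Return a one-line summary of the created inference endpoint."""
    info = json.loads(response_text)
    if info["task_type"] not in RESPONSE_TASK_TYPES:
        raise ValueError(f"unexpected task type: {info['task_type']!r}")
    # chunking_settings may be absent, so fall back to a placeholder.
    strategy = info.get("chunking_settings", {}).get("strategy", "n/a")
    return f"{info['inference_id']} ({info['service']}, {info['task_type']}, chunking: {strategy})"


print(summarize(raw))
```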