Create an Amazon Bedrock inference endpoint | Elasticsearch API documentation

Path parameters

- The type of the inference task that the model will perform. Values are completion or text_embedding.
- The unique identifier of the inference endpoint.

application/json

Body

chunking_settings (object)
- The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).
- The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.
- The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.
- The chunking strategy: sentence or word.
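The chunking limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Elasticsearch API: the function name is hypothetical, and the key names `overlap`, `sentence_overlap`, and `strategy` are assumptions (only `max_chunk_size` is named in the text above). Because no default values are stated here, the sketch requires `strategy` to be set explicitly and skips the overlap check when `max_chunk_size` is absent.

```python
def validate_chunking_settings(settings: dict) -> list[str]:
    """Return a list of problems found; an empty list means the settings pass.

    Key names other than max_chunk_size are assumed for illustration.
    """
    problems = []
    strategy = settings.get("strategy")
    if strategy not in ("sentence", "word"):
        return [f"strategy must be 'sentence' or 'word', got {strategy!r}"]

    # Upper bound is 300; lower bound is 20 (sentence) or 10 (word).
    max_size = settings.get("max_chunk_size")
    if max_size is not None:
        lower = 20 if strategy == "sentence" else 10
        if not lower <= max_size <= 300:
            problems.append(
                f"max_chunk_size must be between {lower} and 300 "
                f"for the {strategy} strategy"
            )

    # Word overlap: word strategy only, at most half of max_chunk_size.
    overlap = settings.get("overlap")
    if overlap is not None:
        if strategy != "word":
            problems.append("overlap applies only to the word strategy")
        elif max_size is not None and overlap * 2 > max_size:
            problems.append("overlap cannot exceed half of max_chunk_size")

    # Sentence overlap: sentence strategy only, either 0 or 1.
    sentence_overlap = settings.get("sentence_overlap")
    if sentence_overlap is not None:
        if strategy != "sentence":
            problems.append("sentence_overlap applies only to the sentence strategy")
        elif sentence_overlap not in (0, 1):
            problems.append("sentence_overlap must be 0 or 1")

    return problems
```

For example, `validate_chunking_settings({"strategy": "word", "max_chunk_size": 100, "overlap": 50})` returns an empty list, while raising `overlap` to 51 yields one problem.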
service_settings (object)
- A valid AWS access key that has permissions to use Amazon Bedrock and access to models for inference requests.
- The base model ID or an ARN to a custom model based on a foundational model. The base model IDs can be found in the Amazon Bedrock documentation. Note that the model ID must be available for the provider chosen and your IAM user must have access to the model.
- The model provider for your deployment. Note that some providers may support only certain task types. Supported providers include:
  * amazontitan - available for text_embedding and completion task types
  * anthropic - available for completion task type only
  * ai21labs - available for completion task type only
  * cohere - available for text_embedding and completion task types
  * meta - available for completion task type only
  * mistral - available for completion task type only
- The region that your model or ARN is deployed in. The list of available regions per model can be found in the Amazon Bedrock documentation.
- rate_limit (object)
  * The number of requests allowed per minute.
- A valid AWS secret key that is paired with the access_key. For information about creating and managing access and secret keys, refer to the AWS documentation.
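The provider list above pairs each provider with the task types it supports. As an illustration only, that pairing can be encoded as a lookup table and checked before an endpoint is created. The table contents come from the list above; the names `PROVIDER_TASK_TYPES` and `provider_supports` are hypothetical and not part of any Elasticsearch client.

```python
# Task types available per provider, transcribed from the list above.
PROVIDER_TASK_TYPES = {
    "amazontitan": {"text_embedding", "completion"},
    "anthropic": {"completion"},
    "ai21labs": {"completion"},
    "cohere": {"text_embedding", "completion"},
    "meta": {"completion"},
    "mistral": {"completion"},
}


def provider_supports(provider: str, task_type: str) -> bool:
    """Return True if the provider is listed as supporting the task type."""
    # Unknown providers map to an empty set, so they support nothing.
    return task_type in PROVIDER_TASK_TYPES.get(provider, set())
```

With this table, `provider_supports("cohere", "text_embedding")` is true and `provider_supports("anthropic", "text_embedding")` is false. The check says nothing about whether a specific model ID is available in your region or to your IAM user; that still has to be confirmed in the Amazon Bedrock documentation.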
task_settings (object)
- For a completion task, it sets the maximum number of output tokens to generate.
- For a completion task, it is a number between 0.0 and 1.0 that controls the apparent creativity of the results. At temperature 0.0 the model is most deterministic; at temperature 1.0 it is most random. It should not be used if top_p or top_k is specified.
- For a completion task, it limits samples to the top-K most likely words, balancing coherence and variability. It is only available for anthropic, cohere, and mistral providers. It is an alternative to temperature; it should not be used if temperature is specified.
- For a completion task, it is a number in the range of 0.0 to 1.0, to eliminate low-probability tokens. Top-p uses nucleus sampling to select top tokens whose sum of likelihoods does not exceed a certain value, ensuring both variety and coherence. It is an alternative to temperature; it should not be used if temperature is specified.
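The sampling rules above (temperature is an alternative to top_p and top_k, both temperature and top_p are limited to 0.0 through 1.0, and top_k is available only for some providers) can be expressed as a small pre-flight check. This is a sketch under stated assumptions: `check_task_settings` is a hypothetical helper, and it treats the "should not be used" guidance as a reportable problem rather than claiming the API rejects such requests.

```python
# Providers for which top_k is described as available.
TOP_K_PROVIDERS = {"anthropic", "cohere", "mistral"}


def check_task_settings(task_settings: dict, provider: str) -> list[str]:
    """Return a list of problems found; an empty list means no conflicts."""
    problems = []

    # temperature should not be combined with top_p or top_k.
    if "temperature" in task_settings:
        for name in ("top_p", "top_k"):
            if name in task_settings:
                problems.append(f"{name} should not be combined with temperature")

    # temperature and top_p must both fall within 0.0 to 1.0.
    for name in ("temperature", "top_p"):
        value = task_settings.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be between 0.0 and 1.0")

    # top_k is limited to a subset of providers.
    if "top_k" in task_settings and provider not in TOP_K_PROVIDERS:
        problems.append(f"top_k is not available for provider {provider!r}")

    return problems
```

For instance, `check_task_settings({"temperature": 0.2, "top_p": 0.9}, "anthropic")` reports one conflict, and `check_task_settings({"top_k": 40}, "meta")` reports that top_k is unavailable for that provider.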

Responses

200 application/json
Response attributes (object)
- chunking_settings (object)
  * The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).
  * The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.
  * The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.
  * The chunking strategy: sentence or word.
- The service type.
- The inference ID.
- The task type. Values are sparse_embedding, text_embedding, rerank, completion, or chat_completion.