Query a knowledge base and retrieve data
To query a knowledge base and only return relevant text from data sources, send a Retrieve request with an Agents for Amazon Bedrock runtime endpoint.
The following fields are required:
| Field | Basic description |
|---|---|
| knowledgeBaseId | To specify the knowledge base to query. |
| retrievalQuery | Contains a text field to specify the query. |
The following fields are optional:
| Field | Basic description |
|---|---|
| guardrailsConfiguration | Include guardrailsConfiguration fields such as guardrailsId and guardrailsVersion to use your guardrail with the request. |
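As a rough illustration, the following Python sketch builds a text query that sets knowledgeBaseId and retrievalQuery and passes it to a client. The helper names `build_retrieve_request` and `retrieve` are hypothetical, and the client is assumed to expose a boto3-style `retrieve()` method; this is one possible arrangement, not the only way to call the API.

```python
from typing import Any


def build_retrieve_request(knowledge_base_id: str, query_text: str) -> dict[str, Any]:
    """Build keyword arguments for a Retrieve call (hypothetical helper)."""
    if not knowledge_base_id:
        raise ValueError("knowledge_base_id must be a non-empty string")
    if not query_text:
        raise ValueError("query_text must be a non-empty string")
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query_text},
    }


def retrieve(client: Any, knowledge_base_id: str, query_text: str) -> dict[str, Any]:
    """Send the request through any client with a boto3-style retrieve() method."""
    return client.retrieve(**build_retrieve_request(knowledge_base_id, query_text))
```

With boto3 installed and credentials configured (both assumptions here), the client would typically be created with `boto3.client("bedrock-agent-runtime")` and passed as the first argument to `retrieve`.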
You can use a reranking model instead of the default Amazon Bedrock Knowledge Bases ranking model by including the rerankingConfiguration field in the KnowledgeBaseVectorSearchConfiguration. The rerankingConfiguration field maps to a VectorSearchRerankingConfiguration object, in which you can specify the reranking model to use, any additional request fields to include, metadata attributes to filter out documents during reranking, and the number of results to return after reranking. For more information, see VectorSearchRerankingConfiguration.
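The nesting described above might look like the following in Python. This is a sketch only: the helper name is hypothetical, and the inner field names (`type`, `bedrockRerankingConfiguration`, `modelConfiguration`, `modelArn`) reflect my reading of the VectorSearchRerankingConfiguration reference and should be verified against it before use.

```python
from typing import Any


def build_reranking_retrieval_config(
    reranker_model_arn: str,
    number_of_results: int,
    number_of_reranked_results: int,
) -> dict[str, Any]:
    """Build a retrievalConfiguration value that enables reranking (hypothetical helper)."""
    return {
        "vectorSearchConfiguration": {
            # Number of chunks fetched by the vector search itself.
            "numberOfResults": number_of_results,
            "rerankingConfiguration": {
                # Assumed discriminator value for a Bedrock-hosted reranker.
                "type": "BEDROCK_RERANKING_MODEL",
                "bedrockRerankingConfiguration": {
                    "modelConfiguration": {"modelArn": reranker_model_arn},
                    # Number of chunks to keep after reranking.
                    "numberOfRerankedResults": number_of_reranked_results,
                },
            },
        }
    }
```

The returned dictionary would be passed as the `retrievalConfiguration` argument of the Retrieve request, alongside knowledgeBaseId and retrievalQuery.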
Note
If the numberOfRerankedResults value that you specify is greater than the numberOfResults value in the KnowledgeBaseVectorSearchConfiguration, the maximum number of results that will be returned is the value for numberOfResults. An exception is if you use query decomposition (for more information, see the Query modifications section in Configure and customize queries and response generation). If you use query decomposition, the numberOfRerankedResults can be up to five times the numberOfResults.
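The rule in this note can be worked through as simple arithmetic. The function below is a sketch of one reading of the note, namely that the ceiling is numberOfResults normally and five times numberOfResults under query decomposition; the function name and the `query_decomposition` flag are hypothetical.

```python
def max_returned_results(
    number_of_results: int,
    number_of_reranked_results: int,
    query_decomposition: bool = False,
) -> int:
    """Estimate the upper bound on returned results under the note's rule."""
    # Assumed ceiling: 1x numberOfResults, or 5x when query decomposition is used.
    ceiling = number_of_results * 5 if query_decomposition else number_of_results
    return min(number_of_reranked_results, ceiling)
```

For example, with numberOfResults set to 5 and numberOfRerankedResults set to 10, this reading gives a bound of 5 without query decomposition and 10 with it.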
The response returns the source chunks from the data source as an array of KnowledgeBaseRetrievalResult objects in the retrievalResults field. Each KnowledgeBaseRetrievalResult contains the following fields:
| Field | Description |
|---|---|
| content | Contains a text source chunk in the text field or an image source chunk in the byteContent field. If the content is an image, the data URI of the base64-encoded content is returned in the following format: data:image/jpeg;base64,${base64-encoded string}. |
| metadata | Contains each metadata attribute as a key and the metadata value as a JSON value that the key maps to. |
| location | Contains the URI or URL of the document that the source chunk belongs to. |
| score | The relevancy score of the document. You can use this score to analyze the ranking of results. |
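To make the response shape concrete, here is a small Python sketch that flattens each result into a summary and decodes an image data URI of the form described in the table. It assumes the response has already been parsed into a dictionary; both helper names are hypothetical.

```python
import base64
from typing import Any


def summarize_results(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten each KnowledgeBaseRetrievalResult into a small summary dict."""
    summaries = []
    for result in response.get("retrievalResults", []):
        content = result.get("content", {})
        summaries.append(
            {
                "score": result.get("score"),
                "text": content.get("text"),
                # True when the chunk carries image data in byteContent.
                "has_image": "byteContent" in content,
                "location": result.get("location"),
                "metadata": result.get("metadata", {}),
            }
        )
    return summaries


def decode_image_data_uri(data_uri: str) -> tuple[str, bytes]:
    """Split 'data:<media type>;base64,<payload>' into media type and raw bytes."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)
```

The decoded bytes can then be written to a file or handed to an image library, depending on what the application needs.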
If the number of source chunks exceeds what can fit in the response, a value is returned in the nextToken field. Use that value in another request to return the next batch of results.
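The pagination loop described here could be sketched as follows in Python. The generator name is hypothetical, and the client is assumed to expose a boto3-style `retrieve()` method that returns a dictionary with retrievalResults and, when more results remain, nextToken.

```python
from collections.abc import Iterator
from typing import Any


def iter_all_results(
    client: Any, knowledge_base_id: str, query_text: str
) -> Iterator[dict[str, Any]]:
    """Yield every retrieval result, following nextToken until it is absent."""
    request: dict[str, Any] = {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query_text},
    }
    while True:
        response = client.retrieve(**request)
        yield from response.get("retrievalResults", [])
        next_token = response.get("nextToken")
        if not next_token:
            break
        # Repeat the same query, asking for the next batch.
        request["nextToken"] = next_token
```

Because it is a generator, callers can stop iterating early and avoid requesting batches they do not need.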
If the retrieved data contains images, the response also returns the following response headers, which contain metadata for source chunks returned in the response:
- x-amz-bedrock-kb-byte-content-source – Contains the Amazon S3 URI of the image.
- x-amz-bedrock-kb-description – Contains the base64-encoded string for the image.
Multimodal queries
For knowledge bases using multimodal embedding models, you can query with either text or images. To query with an image, use the multimodalInputList field of the retrievalQuery field:
{
"knowledgeBaseId": "EXAMPLE123",
"retrievalQuery": {
"multimodalInputList": [
{
"content": {
"byteContent": "base64-encoded-image-data"
},
"modality": "IMAGE"
}
]
}
}
Or you can query with text only by using the text field:
{
"knowledgeBaseId": "EXAMPLE123",
"retrievalQuery": {
"text": "Find similar shoes"
}
}
Common multimodal query patterns
Following are some common query patterns:
Image-to-image search
Upload an image to find visually similar images. Example: Upload a photo of a red Nike shoe to find similar shoes in your product catalog.
Text-based search
Use text queries to find relevant content. Example: "Find similar shoes" to search your product catalog using text descriptions.
Visual document search
Search for charts, diagrams, or visual elements within documents. Example: Upload a chart image to find similar charts in your document collection.
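For the image-based patterns, a request body shaped like the multimodalInputList JSON sample earlier in this section could be assembled from raw image bytes as sketched below. The helper name is hypothetical. Note also that the sketch base64-encodes the bytes itself to match the JSON body; an SDK may expect raw bytes and perform the encoding on its own, so check the behavior of the client you use.

```python
import base64
import json
from typing import Any


def build_image_query(knowledge_base_id: str, image_bytes: bytes) -> dict[str, Any]:
    """Build a Retrieve request body that queries with a single image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {
            "multimodalInputList": [
                {"content": {"byteContent": encoded}, "modality": "IMAGE"}
            ]
        },
    }


def to_json_body(request: dict[str, Any]) -> str:
    """Serialize the request for inspection or for a raw HTTP call."""
    return json.dumps(request, indent=2)
```

In practice the bytes would come from reading an image file in binary mode, for example the photo of a shoe used for an image-to-image search.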
Choosing between Nova and BDA for multimodal content
When working with multimodal content, choose your approach based on your content type and query patterns:
Nova vs BDA Decision Matrix
| Content Type | Use Nova Multimodal Embeddings | Use Bedrock Data Automation (BDA) Parser |
|---|---|---|
| Video Content | Visual storytelling focus (sports, ads, demonstrations), queries on visual elements, minimal speech content | Important speech/narration (presentations, meetings, tutorials), queries on spoken content, need transcripts |
| Audio Content | Music or sound effects identification, non-speech audio analysis | Podcasts, interviews, meetings, any content with speech requiring transcription |
| Image Content | Visual similarity searches, image-to-image retrieval, visual content analysis | Text extraction from images, document processing, OCR requirements |
Note
Nova multimodal embeddings cannot process speech content directly. If your audio or video files contain important spoken information, use the BDA parser to convert speech to text first, or choose a text embedding model instead.
Multimodal query limitations
Following are some limitations with multimodal queries:
- Maximum of one image per query in the current release
- Image queries are only supported with multimodal embedding models (Titan G1 or Cohere Embed v3)
- RetrieveAndGenerate API is not supported for knowledge bases with multimodal embedding models and S3 content buckets
- If you provide an image query to a knowledge base using text-only embedding models, a 4xx error will be returned
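Some of these limitations can be checked on the client before a request is sent, which avoids a round trip that would end in a 4xx error. The following Python sketch is one way to do that; the function name and the `embedding_is_multimodal` flag are hypothetical, and the caller is assumed to know which kind of embedding model the knowledge base uses.

```python
from typing import Any


def validate_multimodal_query(
    retrieval_query: dict[str, Any], embedding_is_multimodal: bool
) -> None:
    """Raise ValueError if the query breaks the limitations listed above."""
    inputs = retrieval_query.get("multimodalInputList", [])
    images = [item for item in inputs if item.get("modality") == "IMAGE"]
    if len(images) > 1:
        raise ValueError("at most one image per query is supported")
    if images and not embedding_is_multimodal:
        raise ValueError("image queries require a multimodal embedding model")
```

A text-only query passes this check regardless of the embedding model, which matches the text-based search pattern described earlier.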
Multimodal API response structure
Retrieval responses for multimodal content include additional metadata:
- Source URI: Points to your original S3 bucket location
- Supplemental URI: Points to the copy in your multimodal storage bucket
- Timestamp metadata: Included for video and audio chunks to enable precise playback positioning
Note
When using the API or SDK, you'll need to handle file retrieval and timestamp navigation in your application. The console handles this automatically with enhanced video playback and automatic timestamp navigation.