Query a knowledge base and retrieve data

To query a knowledge base and only return relevant text from data sources, send a Retrieve request with an Agents for Amazon Bedrock runtime endpoint.

The following fields are required:

Field	Basic description
knowledgeBaseId	To specify the knowledge base to query.
retrievalQuery	Contains a text field to specify the query.
guardrailsConfiguration	Include guardrailsConfiguration fields such as guardrailsId and guardrailsVersion to use your guardrail with the request.
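
As a minimal sketch of the two core fields in the table above, the following Python helper assembles a Retrieve request and passes it to a client. The function names are illustrative. The `client` argument is assumed to expose a `retrieve(**kwargs)` method, as the boto3 `bedrock-agent-runtime` client does; boto3 is not part of the original text and is not imported here.

```python
from typing import Any, Dict


def build_retrieve_request(knowledge_base_id: str, query_text: str) -> Dict[str, Any]:
    """Assemble the keyword arguments for a Retrieve call (illustrative helper)."""
    return {
        "knowledgeBaseId": knowledge_base_id,    # which knowledge base to query
        "retrievalQuery": {"text": query_text},  # the query text
    }


def retrieve(client: Any, knowledge_base_id: str, query_text: str) -> Dict[str, Any]:
    """Send the request through any client that exposes retrieve(**kwargs).

    Assumption: `client` could be created with
    boto3.client("bedrock-agent-runtime"); any object with the same
    method works, which keeps this sketch testable without AWS access.
    """
    return client.retrieve(**build_retrieve_request(knowledge_base_id, query_text))
```

Passing the client in as an argument keeps the request-building logic separate from credentials and Region configuration.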

The following fields are optional:

You can use a reranking model instead of the default Amazon Bedrock Knowledge Bases ranking model by including the rerankingConfiguration field in the KnowledgeBaseVectorSearchConfiguration. The rerankingConfiguration field maps to a VectorSearchRerankingConfiguration object, in which you can specify the reranking model to use, any additional request fields to include, metadata attributes to filter out documents during reranking, and the number of results to return after reranking. For more information, see VectorSearchRerankingConfiguration.

Note

If the numberOfRerankedResults value that you specify is greater than the numberOfResults value in the KnowledgeBaseVectorSearchConfiguration, the maximum number of results that will be returned is the value for numberOfResults. An exception is if you use query decomposition (for more information, see the Query modifications section in Configure and customize queries and response generation). If you use query decomposition, the numberOfRerankedResults can be up to five times the numberOfResults.
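
The cap described in this note can be expressed as a short calculation. The function below is a sketch of one reading of the note, not service code; its name and its `query_decomposition` flag are illustrative assumptions.

```python
def max_returned_results(
    number_of_results: int,
    number_of_reranked_results: int,
    query_decomposition: bool = False,
) -> int:
    """Upper bound on returned results, as read from the note above."""
    if number_of_results < 1 or number_of_reranked_results < 1:
        raise ValueError("both values must be positive integers")
    # Without query decomposition the cap is numberOfResults;
    # with it, the cap is five times numberOfResults.
    cap = number_of_results * 5 if query_decomposition else number_of_results
    return min(number_of_reranked_results, cap)
```

For example, with numberOfResults set to 5 and numberOfRerankedResults set to 10, this gives 5 without query decomposition and 10 with it.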

The response returns the source chunks from the data source as an array of KnowledgeBaseRetrievalResult objects in the retrievalResults field. Each KnowledgeBaseRetrievalResult contains the following fields:

Field	Description
content	Contains a text source chunk in the text field or an image source chunk in the byteContent field. If the content is an image, the data URI of the base64-encoded content is returned in the following format: data:image/jpeg;base64,${base64-encoded string}.
metadata	Contains each metadata attribute as a key and the metadata value as a JSON value that the key maps to.
location	Contains the URI or URL of the document that the source chunk belongs to.
score	The relevancy score of the document. You can use this score to analyze the ranking of results.
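
The following sketch shows one way to read a single result using the fields in the table above, including decoding an image returned as a data URI. The helper names and the shape of the returned summary are illustrative assumptions; only the field names content, text, byteContent, metadata, location, and score come from the table.

```python
import base64
from typing import Any, Dict, Tuple


def split_data_uri(data_uri: str) -> Tuple[str, bytes]:
    """Split 'data:<mime>;base64,<payload>' into (MIME type, decoded bytes)."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def summarize_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one retrieval result into a small summary dictionary."""
    content = result.get("content", {})
    summary: Dict[str, Any] = {
        "score": result.get("score"),
        "location": result.get("location"),
        "metadata": result.get("metadata", {}),
        "text": content.get("text"),  # None for image chunks
        "image": None,                # filled in below for image chunks
    }
    if content.get("byteContent"):
        mime_type, data = split_data_uri(content["byteContent"])
        summary["image"] = {"mime_type": mime_type, "size_bytes": len(data)}
    return summary
```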

If the number of source chunks exceeds what can fit in the response, a value is returned in the nextToken field. Use that value in another request to return the next batch of results.
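
A pagination loop over nextToken might look like the following sketch. The generator name is illustrative, and `client` is assumed to be any object with a `retrieve(**kwargs)` method that returns a dictionary containing retrievalResults and, optionally, nextToken.

```python
from typing import Any, Dict, Iterator


def iter_retrieval_results(client: Any, request: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every result across all pages of a Retrieve query."""
    params = dict(request)  # copy so the caller's request is not mutated
    while True:
        response = client.retrieve(**params)
        yield from response.get("retrievalResults", [])
        token = response.get("nextToken")
        if not token:
            break  # no further batches
        params["nextToken"] = token  # ask for the next batch
```

Because this is a generator, a caller can stop iterating early without requesting the remaining batches.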

If the retrieved data contains images, the response also returns the following response headers, which contain metadata for source chunks returned in the response:

x-amz-bedrock-kb-byte-content-source – Contains the Amazon S3 URI of the image.
x-amz-bedrock-kb-description – Contains the base64-encoded string for the image.

Multimodal queries

For knowledge bases using multimodal embedding models, you can query with either text or images. The retrievalQuery field supports a multimodalInputList field for image queries.

You can query with images by using the multimodalInputList field:

{
    "knowledgeBaseId": "EXAMPLE123", 
    "retrievalQuery": {
        "multimodalInputList": [
            {
                "content": {
                    "byteContent": "base64-encoded-image-data"
                },
                "modality": "IMAGE"
            }
        ]
    }
}

Or you can query with text only by using the text field:

{
    "knowledgeBaseId": "EXAMPLE123",
    "retrievalQuery": {
        "text": "Find similar shoes"
    }
}
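
To produce the image request shown earlier from raw image bytes, the bytes need to be base64-encoded first. The helper below is a minimal sketch that mirrors that JSON shape; its name is illustrative. It builds a JSON-style body only. Whether a given SDK expects raw bytes or an already encoded string for byteContent is not stated here, so check before passing this dictionary to an SDK call.

```python
import base64
import json
from typing import Any, Dict


def build_image_query(knowledge_base_id: str, image_bytes: bytes) -> Dict[str, Any]:
    """Build an image query body in the shape of the JSON example above."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {
            "multimodalInputList": [
                {"content": {"byteContent": encoded}, "modality": "IMAGE"}
            ]
        },
    }


def to_json(body: Dict[str, Any]) -> str:
    """Serialize the body for logging or for a raw HTTP request."""
    return json.dumps(body, indent=4)
```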

Common multimodal query patterns

Following are some common query patterns:

Image-to-image search

Upload an image to find visually similar images. Example: Upload a photo of a red Nike shoe to find similar shoes in your product catalog.

Text-based search

Use text queries to find relevant content. Example: "Find similar shoes" to search your product catalog using text descriptions.

Visual document search

Search for charts, diagrams, or visual elements within documents. Example: Upload a chart image to find similar charts in your document collection.

Choosing between Nova and BDA for multimodal content

When working with multimodal content, choose your approach based on your content type and query patterns:

Nova vs BDA Decision Matrix

Content Type	Use Nova Multimodal Embeddings	Use Bedrock Data Automation (BDA) Parser
Video Content	Visual storytelling focus (sports, ads, demonstrations), queries on visual elements, minimal speech content	Important speech/narration (presentations, meetings, tutorials), queries on spoken content, need transcripts
Audio Content	Music or sound effects identification, non-speech audio analysis	Podcasts, interviews, meetings, any content with speech requiring transcription
Image Content	Visual similarity searches, image-to-image retrieval, visual content analysis	Text extraction from images, document processing, OCR requirements

Note

Nova multimodal embeddings cannot process speech content directly. If your audio or video files contain important spoken information, use the BDA parser to convert speech to text first, or choose a text embedding model instead.

Multimodal query limitations

Following are some limitations with multimodal queries:

Maximum of one image per query in the current release
Image queries are only supported with multimodal embedding models (Titan G1 or Cohere Embed v3)
RetrieveAndGenerate API is not supported for knowledge bases with multimodal embedding models and S3 content buckets
If you provide an image query to a knowledge base using text-only embedding models, a 4xx error will be returned
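
Two of these limitations can be checked on the client side before sending a request. The following sketch is illustrative only: the function name and the `supports_images` flag, which the caller sets based on the knowledge base's embedding model, are assumptions, and the service remains the source of truth for validation.

```python
from typing import Any, Dict


def check_multimodal_query(retrieval_query: Dict[str, Any], supports_images: bool) -> None:
    """Raise ValueError if the query breaks a limitation listed above."""
    inputs = retrieval_query.get("multimodalInputList", [])
    images = [item for item in inputs if item.get("modality") == "IMAGE"]
    if len(images) > 1:
        raise ValueError("at most one image per query is supported")
    if images and not supports_images:
        # The service would answer with a 4xx error in this case.
        raise ValueError("image queries require a multimodal embedding model")
```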

Multimodal API response structure

Retrieval responses for multimodal content include additional metadata:

Source URI: Points to your original S3 bucket location
Supplemental URI: Points to the copy in your multimodal storage bucket
Timestamp metadata: Included for video and audio chunks to enable precise playback positioning

Note

When using the API or SDK, you'll need to handle file retrieval and timestamp navigation in your application. The console handles this automatically with enhanced video playback and automatic timestamp navigation.