Azure API Management policy reference - azure-openai-semantic-cache-lookup

APPLIES TO: All API Management tiers

Use the azure-openai-semantic-cache-lookup policy to perform cache lookup of responses to Azure OpenAI Chat Completion API requests from a configured external cache, based on vector proximity of the prompt to previous requests and a specified similarity score threshold. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
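The lookup can be pictured as: embed the incoming prompt, measure its vector distance to previously cached prompts, and return a cached response only when the distance falls within the configured threshold. The following Python sketch is a conceptual model of that flow, not APIM internals; the function names and the use of cosine distance are illustrative assumptions.

```python
# Conceptual sketch (not APIM internals): return a cached response when
# the prompt's embedding is within the similarity-score threshold of a
# previously cached prompt. Smaller distance = more semantically similar.
import math

def cosine_distance(a, b):
    """1 - cosine similarity; smaller values mean greater similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def semantic_lookup(prompt_embedding, cache, score_threshold):
    """Return the closest cached response within the threshold, else None."""
    best = None
    for cached_embedding, response in cache:
        d = cosine_distance(prompt_embedding, cached_embedding)
        if d <= score_threshold and (best is None or d < best[0]):
            best = (d, response)
    return best[1] if best else None  # None -> cache miss, call the backend
```

On a miss (`None`), the request proceeds to the Azure OpenAI backend, and the corresponding `azure-openai-semantic-cache-store` policy can cache the new response on the way out.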

Note

Supported Azure OpenAI Service models

The policy is used with APIs added to API Management from the Azure OpenAI Service of the following types:

API type	Supported models
Chat completion	gpt-3.5, gpt-4, gpt-4o, gpt-4o-mini, o1, o3
Embeddings	text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002
Responses (preview)	gpt-4o (versions: 2024-11-20, 2024-08-06, 2024-05-13), gpt-4o-mini (version: 2024-07-18), gpt-4.1 (version: 2025-04-14), gpt-4.1-nano (version: 2025-04-14), gpt-4.1-mini (version: 2025-04-14), gpt-image-1 (version: 2025-04-15), o3 (version: 2025-04-16), o4-mini (version: 2025-04-16)

Note

The traditional Completions API is available only with legacy model versions, and support is limited.

For current information about the models and their capabilities, see Azure OpenAI Service models.

Policy statement

<azure-openai-semantic-cache-lookup
    score-threshold="similarity score threshold"
    embeddings-backend-id="backend entity ID for embeddings API"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true | false"
    max-message-count="count">
    <vary-by>"expression to partition caching"</vary-by>
</azure-openai-semantic-cache-lookup>

Attributes

Attribute Description Required Default
score-threshold Similarity score threshold used to determine whether to return a cached response to a prompt. Value is a decimal between 0.0 and 1.0. Smaller values represent greater semantic similarity. Learn more. Yes N/A
embeddings-backend-id Backend ID for OpenAI embeddings API call. Yes N/A
embeddings-backend-auth Authentication used for Azure OpenAI embeddings API backend. Yes. Must be set to system-assigned. N/A
ignore-system-messages Boolean. When set to true (recommended), removes system messages from a GPT chat completion prompt before assessing cache similarity. No false
max-message-count If specified, number of remaining dialog messages after which caching is skipped. No N/A
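The `ignore-system-messages` and `max-message-count` attributes can be thought of as pre-checks applied before the embedding comparison. This sketch is a hedged approximation of that behavior; the message shape follows the Chat Completions request format, and the helper names are illustrative, not APIM internals.

```python
# Illustrative pre-checks (not APIM internals) for the two optional
# attributes described in the table above.

def should_attempt_lookup(messages, max_message_count=None):
    """Approximation of max-message-count: skip caching once the dialog
    grows beyond the configured number of messages."""
    return max_message_count is None or len(messages) <= max_message_count

def prompt_for_similarity(messages, ignore_system_messages=False):
    """Approximation of ignore-system-messages: optionally drop system
    messages before the prompt text is embedded for comparison."""
    if ignore_system_messages:
        messages = [m for m in messages if m["role"] != "system"]
    return "\n".join(m["content"] for m in messages)
```

Dropping system messages is recommended because boilerplate instructions shared by every request would otherwise inflate similarity scores between unrelated prompts.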

Elements

Name Description Required
vary-by A custom expression determined at runtime whose value partitions caching. If multiple vary-by elements are added, values are concatenated to create a unique combination. No
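Concatenating multiple `vary-by` values yields one partition key per unique combination, so cache entries are shared only among requests that produce the same combination (for example, the same subscription). A minimal sketch of that idea, with an illustrative separator:

```python
# Illustrative sketch: multiple vary-by values combine into a single
# cache-partition key; entries are shared only within the same combination.
def partition_key(vary_by_values):
    return "|".join(vary_by_values)  # separator choice is an assumption
```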

Usage

Usage notes

Examples

Example with corresponding azure-openai-semantic-cache-store policy

<policies>
    <inbound>
        <base />
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="azure-openai-backend"
            embeddings-backend-auth="system-assigned">
            <vary-by>@(context.Subscription.Id)</vary-by>
        </azure-openai-semantic-cache-lookup>
    </inbound>
    <outbound>
        <azure-openai-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>

For more information about working with policies, see: