Improve search and RAG quality with ranking API

As part of your Retrieval Augmented Generation (RAG) experience in Agent Search, you can rank a set of documents based on a query.

The ranking API takes a list of documents and reranks those documents based on how relevant the documents are to a query. Compared to embeddings, which look only at the semantic similarity of a document and a query, the ranking API can give you precise scores for how well a document answers a given query. The ranking API can be used to improve the quality of search results after retrieving an initial set of candidate documents.

The ranking API is stateless so there's no need to index documents before calling the API. All you need to do is pass in the query and documents. This makes the API well suited for reranking documents from Vector Search and other search solutions.

This page describes how to use the ranking API to rank a set of documents based on a query.

Use cases

The primary use case of the ranking API is to improve the quality of search results.

However, the ranking API can be valuable for any scenario where you need to find which pieces of content are most relevant to a user's query.

The following flow outlines how you might use the ranking API to improve the quality of results for chunked documents:

  1. Use Document AI Layout Parser API to split a set of documents into chunks.
  2. Use an embeddings API to create embeddings for each of the chunks.
  3. Load the embeddings into Vector Search or another search solution.
  4. Query your search index and retrieve the most relevant chunks.
  5. Rerank the relevant chunks using the ranking API.
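
Steps 4 and 5 of this flow can be sketched as a small two-stage function. This is an illustrative sketch only: `retrieve` and `rerank` are hypothetical callables standing in for your search index and for a wrapper around the ranking API, and the `candidates` and `top_n` defaults are arbitrary assumptions.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def retrieve_then_rerank(
    query: str,
    retrieve: Callable[[str, int], List[Record]],          # hypothetical: search index lookup
    rerank: Callable[[str, List[Record]], List[Dict]],     # hypothetical: ranking API wrapper
    candidates: int = 50,
    top_n: int = 5,
) -> List[Dict]:
    """Fetch a broad candidate set, then keep the top_n reranked records."""
    # Step 4: query the search index for an initial set of candidate chunks.
    hits = retrieve(query, candidates)
    if not hits:
        return []
    # Step 5: rerank the candidates; each result is assumed to carry a "score".
    ranked = rerank(query, hits)
    return sorted(ranked, key=lambda r: r["score"], reverse=True)[:top_n]
```

Retrieving more candidates than you finally need gives the reranker room to promote chunks that the first stage placed low.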

Input data

The ranking API requires the following inputs:

{
    "query": "Why is the sky blue?",
    "records": [
        {
            "id": "1",
            "title": "The Color of the Sky: A Poem",
            "content": "A canvas stretched across the day,\nWhere sunlight learns to dance and play.\nBlue, a hue of scattered light,\nA gentle whisper, soft and bright."
        },
        {
            "id": "2",
            "title": "The Science of a Blue Sky",
            "content": "The sky appears blue due to a phenomenon called Rayleigh scattering. Sunlight is comprised of all the colors of the rainbow. Blue light has shorter wavelengths than other colors, and is thus scattered more easily."
        }
    ],
    "topN": 10,
    "ignoreRecordDetailsInResponse": true,
    "model": "semantic-ranker-default@latest"
}
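
A request body with these fields could be assembled in Python along the following lines. This is a minimal sketch, not part of any client library: `build_rank_request` is a hypothetical helper, and its validation only mirrors the rule stated on this page that each record needs an ID and a title, content, or both.

```python
from typing import Any, Dict, List, Optional


def build_rank_request(
    query: str,
    records: List[Dict[str, str]],
    top_n: Optional[int] = None,
    model: str = "semantic-ranker-default@latest",
    ignore_record_details: bool = False,
) -> Dict[str, Any]:
    """Return a dict that serializes to the JSON body shown above."""
    if not query:
        raise ValueError("query must be a non-empty string")
    cleaned = []
    for rec in records:
        if not rec.get("id"):
            raise ValueError("every record needs an id")
        if not (rec.get("title") or rec.get("content")):
            raise ValueError(f"record {rec['id']} needs a title, content, or both")
        # Keep only the fields that appear in the example request.
        cleaned.append({k: rec[k] for k in ("id", "title", "content") if rec.get(k)})
    body: Dict[str, Any] = {"model": model, "query": query, "records": cleaned}
    # Optional fields are only sent when explicitly requested.
    if top_n is not None:
        body["topN"] = top_n
    if ignore_record_details:
        body["ignoreRecordDetailsInResponse"] = True
    return body
```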

Output data

The ranking API returns a ranked list of records with the following outputs:

{
    "records": [
        {
            "id": "2",
            "score": 0.98,
            "title": "The Science of a Blue Sky",
            "content": "The sky appears blue due to a phenomenon called Rayleigh scattering. Sunlight is comprised of all the colors of the rainbow. Blue light has shorter wavelengths than other colors, and is thus scattered more easily."
        },
        {
            "id": "1",
            "score": 0.64,
            "title": "The Color of the Sky: A Poem",
            "content": "A canvas stretched across the day,\nWhere sunlight learns to dance and play.\nBlue, a hue of scattered light,\nA gentle whisper, soft and bright."
        }
    ]
}
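
A response of this shape can be consumed with the standard `json` module. The sketch below is illustrative: `parse_ranked` and its `min_score` cutoff are hypothetical additions rather than part of the API, and a suitable cutoff depends on your data.

```python
import json
from typing import List, Tuple


def parse_ranked(response_text: str, min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Return (id, score) pairs, highest score first, dropping low scores."""
    data = json.loads(response_text)
    pairs = [(rec["id"], float(rec["score"])) for rec in data.get("records", [])]
    # Sort defensively instead of relying on the order in the response.
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in pairs if pair[1] >= min_score]
```

Only `id` and `score` are read, so the same function works when the response omits the record title and content.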

Rank (or rerank) a set of records according to a query

Typically, you'll supply the ranking API with a query and a set of records that are relevant to that query and have already been ranked by some other method such as a keyword search or a vector search. Then, you use the ranking API to improve the quality of the ranking and determine a score that indicates the relevance of each record to the query.

  1. Obtain the query and resulting records. Ensure that each record has an ID and either a title, content, or both.
    The maximum number of supported tokens per record depends on the model version. Models up to version 003, such as semantic-ranker-512-003, support 512 tokens per record. Starting from version 004, this limit increases to 1024 tokens. If the combined length of the title and content exceeds the model's token limit, the extra content is truncated.
  2. Call the rankingConfigs.rank method using the following code:

REST

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/rankingConfigs/default_ranking_config:rank" \
-d '{
"model": "semantic-ranker-default@latest",
"query": "QUERY",
"records": [
    {
        "id": "RECORD_ID_1",
        "title": "TITLE_1",
        "content": "CONTENT_1"
    },
    {
        "id": "RECORD_ID_2",
        "title": "TITLE_2",
        "content": "CONTENT_2"
    },
    {
        "id": "RECORD_ID_3",
        "title": "TITLE_3",
        "content": "CONTENT_3"
    }
]
}'

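Replace the following:

    PROJECT_ID: the ID of your Google Cloud project.
    QUERY: the query against which the records are ranked and scored.
    RECORD_ID_n: a unique string that identifies the record.
    TITLE_n: the title of the record.
    CONTENT_n: the content of the record.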

For general information about this method, see rankingConfigs.rank.

The following is an example curl command and its response.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: my-project-123" \
"https://discoveryengine.googleapis.com/v1/projects/my-project-123/locations/global/rankingConfigs/default_ranking_config:rank" \
-d '{
    "model": "semantic-ranker-default@latest",
    "query": "what is Google gemini?",
    "records": [
        {
            "id": "1",
            "title": "Gemini",
            "content": "The Gemini zodiac symbol often depicts two figures standing side-by-side."
        },
        {
            "id": "2",
            "title": "Gemini",
            "content": "Gemini is a cutting edge large language model created by Google."
        },
        {
            "id": "3",
            "title": "Gemini Constellation",
            "content": "Gemini is a constellation that can be seen in the night sky."
        }
    ]
}'

{
    "records": [
        {
            "id": "2",
            "title": "Gemini",
            "content": "Gemini is a cutting edge large language model created by Google.",
            "score": 0.97
        },
        {
            "id": "3",
            "title": "Gemini Constellation",
            "content": "Gemini is a constellation that can be seen in the night sky.",
            "score": 0.18
        },
        {
            "id": "1",
            "title": "Gemini",
            "content": "The Gemini zodiac symbol often depicts two figures standing side-by-side.",
            "score": 0.05
        }
    ]
}

Python

For more information, see the Agent Search Python API reference documentation.

To authenticate to Agent Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
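
As a minimal sketch, the REST call shown earlier can be reproduced from Python using only the standard library. This is not the client library sample. The helper names `rank_url`, `build_http_request`, and `rank` are hypothetical, and the sketch assumes the `gcloud` CLI is installed and authenticated so that it can obtain an access token the same way the curl example does.

```python
import json
import subprocess
import urllib.request

API_ROOT = "https://discoveryengine.googleapis.com/v1"


def rank_url(project_id: str, location: str = "global") -> str:
    """Build the same endpoint URL used in the curl example."""
    return (
        f"{API_ROOT}/projects/{project_id}/locations/{location}"
        "/rankingConfigs/default_ranking_config:rank"
    )


def build_http_request(project_id: str, access_token: str, body: dict) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        rank_url(project_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "X-Goog-User-Project": project_id,
        },
        method="POST",
    )


def rank(project_id: str, body: dict) -> dict:
    """Send the request and return the decoded JSON response."""
    # Assumption: the gcloud CLI is available, as in the curl example.
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    with urllib.request.urlopen(build_http_request(project_id, token, body)) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the URL and headers easy to inspect without network access.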

Supported models

The following models are available.

Model name | Latest default model (semantic-ranker-default@latest) | Latest fast model (semantic-ranker-fast@latest) | Input | Context window (tokens) | Release date | Discontinuation date
semantic-ranker-default-004 | Yes | No | Text (25 languages) | 1024 | April 9, 2025 | To be determined
semantic-ranker-fast-004 | No | Yes | Text (25 languages) | 1024 | April 9, 2025 | To be determined
semantic-ranker-default-003 | No | No | Text (25 languages) | 512 | September 10, 2024 | To be determined
semantic-ranker-default-002 | No | No | Text (en only) | 512 | June 3, 2024 | To be determined
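
The per-record token limit can be derived from a versioned model name using the rule stated on this page (512 tokens up to version 003, 1024 from version 004). The helper below is a hypothetical sketch. It deliberately rejects `@latest` aliases, because the version they resolve to changes over time.

```python
import re


def max_tokens_per_record(model: str) -> int:
    """Return the per-record token limit for a versioned model name."""
    # Versioned names end in a three-digit suffix such as "-004".
    match = re.search(r"-(\d{3})$", model)
    if match is None:
        raise ValueError(
            "expected a versioned model name, for example semantic-ranker-default-004"
        )
    return 512 if int(match.group(1)) <= 3 else 1024
```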

What's next

Learn how to use the ranking method with other RAG APIs to generate grounded answers from unstructured data.


Last updated 2026-06-15 UTC.