Text Embeddings

Voyage currently provides the following text embedding models:

| Model | Context Length (tokens) | Embedding Dimension | Description |
| --- | --- | --- | --- |
| voyage-4-large | 32,000 | 1024 (default), 256, 512, 2048 | The best general-purpose and multilingual retrieval quality. All embeddings created with the 4 series are compatible with each other. See blog post for details. |
| voyage-4 | 32,000 | 1024 (default), 256, 512, 2048 | Optimized for general-purpose and multilingual retrieval quality. All embeddings created with the 4 series are compatible with each other. See blog post for details. |
| voyage-4-lite | 32,000 | 1024 (default), 256, 512, 2048 | Optimized for latency and cost. All embeddings created with the 4 series are compatible with each other. See blog post for details. |
| voyage-code-3 | 32,000 | 1024 (default), 256, 512, 2048 | Optimized for code retrieval. See blog post for details. |
| voyage-finance-2 | 32,000 | 1024 | Optimized for finance retrieval and RAG. See blog post for details. |
| voyage-law-2 | 16,000 | 1024 | Optimized for legal retrieval and RAG. Also improved performance across all domains. See blog post for details. |
| voyage-code-2 | 16,000 | 1536 | Optimized for code retrieval (17% better than alternatives) / Previous generation of code embeddings. See blog post for details. |
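The dimensions listed above can be checked before a request is sent. The following is a minimal sketch, not part of the voyageai package: `MODEL_SPECS` and `resolve_dimension` are hypothetical names, and the values are copied from the table above.

```python
from typing import Optional

# Values copied from the table above. The first entry of "dims" is the default.
# This dict and the helper below are illustrative, not part of the voyageai package.
MODEL_SPECS = {
    "voyage-4-large":   {"context": 32_000, "dims": (1024, 256, 512, 2048)},
    "voyage-4":         {"context": 32_000, "dims": (1024, 256, 512, 2048)},
    "voyage-4-lite":    {"context": 32_000, "dims": (1024, 256, 512, 2048)},
    "voyage-code-3":    {"context": 32_000, "dims": (1024, 256, 512, 2048)},
    "voyage-finance-2": {"context": 32_000, "dims": (1024,)},
    "voyage-law-2":     {"context": 16_000, "dims": (1024,)},
    "voyage-code-2":    {"context": 16_000, "dims": (1536,)},
}


def resolve_dimension(model: str, output_dimension: Optional[int] = None) -> int:
    """Return the dimension a request would use, or raise if the table does not list it."""
    spec = MODEL_SPECS[model]  # KeyError for models that are not in the table
    if output_dimension is None:
        return spec["dims"][0]
    if output_dimension not in spec["dims"]:
        raise ValueError(
            f"{model} lists dimensions {spec['dims']}, got {output_dimension}"
        )
    return output_dimension
```

For example, `resolve_dimension("voyage-code-3", 256)` returns 256, while `resolve_dimension("voyage-law-2", 512)` raises `ValueError` because the table lists only 1024 for that model.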

Need help deciding which text embedding model to use? Check out our FAQ.

Older models

The following earlier models are still accessible from our API. We recommend using the new models above for better quality and efficiency. The latest models listed in the table above are strictly better than the legacy models in all aspects, including quality, context length, latency, and throughput.

| Model | Context Length (tokens) | Embedding Dimension | Description |
| --- | --- | --- | --- |
| voyage-3-large | 32,000 | 1024 (default), 256, 512, 2048 | Previous generation of text embeddings for general-purpose and multilingual retrieval quality. See blog post for details. |
| voyage-3.5 | 32,000 | 1024 (default), 256, 512, 2048 | Previous generation of text embeddings optimized for general-purpose and multilingual retrieval quality. See blog post for details. |
| voyage-3.5-lite | 32,000 | 1024 (default), 256, 512, 2048 | Previous generation of text embeddings optimized for latency and cost. See blog post for details. |
| voyage-3 | 32,000 | 1024 | Optimized for general-purpose and multilingual retrieval quality. See blog post for details. |
| voyage-3-lite | 32,000 | 512 | Optimized for latency and cost. See blog post for details. |
| voyage-multilingual-2 | 32,000 | 1024 | Optimized for multilingual retrieval and RAG. See blog post for details. |
| voyage-large-2-instruct | 16,000 | 1024 | Top of MTEB leaderboard. Instruction-tuned general-purpose embedding model optimized for clustering, classification, and retrieval. For retrieval, please use the input_type parameter to specify whether the text is a query or document. For classification and clustering, please use the instructions here. See blog post for details. We recommend that existing voyage-large-2-instruct users transition to voyage-3. |
| voyage-large-2 | 16,000 | 1536 | General-purpose embedding model that is optimized for retrieval quality (e.g., better than OpenAI V3 Large). Please transition to voyage-3. |
| voyage-2 | 4,000 | 1024 | General-purpose embedding model optimized for a balance between cost, latency, and retrieval quality. Please transition to voyage-3-lite. |
| voyage-lite-02-instruct | 4,000 | 1024 | [_Deprecated_] Instruction-tuned for classification, clustering, and sentence textual similarity tasks, which are the only recommended use cases. Please transition to voyage-3. |
| voyage-02 | 4,000 | 1024 | [_Deprecated_] This is our pilot-version v2 embedding model. Please transition to voyage-3. |
| voyage-01 | 4,000 | 1024 | [_Deprecated_] This is our v1 embedding model. Please transition to voyage-3. |
| voyage-lite-01 | 4,000 | 1024 | [_Deprecated_] This is our v1 embedding model. Please transition to voyage-3. |
| voyage-lite-01-instruct | 4,000 | 1024 | [_Deprecated_] Tweaked on top of voyage-lite-01 for classification and clustering tasks. Please transition to voyage-3. |
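The transition recommendations in the table above can be collected into a lookup that flags legacy model names in existing code. This is a sketch only: `RECOMMENDED_REPLACEMENT`, `DEPRECATED`, and `migration_hint` are hypothetical names, and the mappings are copied from the descriptions above.

```python
from typing import Optional

# Replacement recommendations copied from the "Older models" table above.
RECOMMENDED_REPLACEMENT = {
    "voyage-large-2-instruct": "voyage-3",
    "voyage-large-2": "voyage-3",
    "voyage-2": "voyage-3-lite",
    "voyage-lite-02-instruct": "voyage-3",
    "voyage-02": "voyage-3",
    "voyage-01": "voyage-3",
    "voyage-lite-01": "voyage-3",
    "voyage-lite-01-instruct": "voyage-3",
}

# Models the table marks as deprecated.
DEPRECATED = {
    "voyage-lite-02-instruct",
    "voyage-02",
    "voyage-01",
    "voyage-lite-01",
    "voyage-lite-01-instruct",
}


def migration_hint(model: str) -> Optional[str]:
    """Return a short migration message, or None if the table gives no recommendation."""
    target = RECOMMENDED_REPLACEMENT.get(model)
    if target is None:
        return None
    status = "deprecated" if model in DEPRECATED else "legacy"
    return f"{model} is {status}; the table recommends transitioning to {target}."
```

Because compatibility between models is stated here only for the 4 series, a model change should be assumed to require re-embedding any stored documents. The embedding dimension can also differ, for example 1536 for voyage-large-2 versus 1024 for voyage-3.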

Voyage also provides the following open-weight embedding models:

| Model | Context Length (tokens) | Embedding Dimension | Description |
| --- | --- | --- | --- |
| voyage-4-nano | 32,000 | 1024 (default), 256, 512, 2048 | Open-weight model available on Hugging Face. All embeddings created with the 4 series are compatible with each other. See blog post for details. |
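Since embeddings from the 4 series are described as compatible with each other, documents and queries could in principle be embedded with different 4-series models. The sketch below illustrates that idea under stated assumptions: `FOUR_SERIES`, `share_embedding_space`, `cosine`, `rank_documents`, and `mixed_model_search` are hypothetical helpers, cosine similarity is our choice of score, and both calls keep the default output dimension so the vectors have the same length.

```python
import math
from typing import List, Sequence

# The models the tables above describe as mutually compatible.
FOUR_SERIES = {"voyage-4-large", "voyage-4", "voyage-4-lite", "voyage-4-nano"}


def share_embedding_space(model_a: str, model_b: str) -> bool:
    """True when both names are the same model or both belong to the 4 series."""
    return model_a == model_b or {model_a, model_b} <= FOUR_SERIES


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def rank_documents(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Indices of doc_vecs ordered from most to least similar to query_vec."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def mixed_model_search(query: str, documents: List[str]) -> List[int]:
    """Embed documents with voyage-4-large and the query with voyage-4-lite."""
    import voyageai  # imported lazily so the helpers above work without the package

    doc_model, query_model = "voyage-4-large", "voyage-4-lite"
    if not share_embedding_space(doc_model, query_model):
        raise ValueError("models are not documented as compatible")
    vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment
    doc_vecs = vo.embed(documents, model=doc_model, input_type="document").embeddings
    query_vec = vo.embed([query], model=query_model, input_type="query").embeddings[0]
    return rank_documents(query_vec, doc_vecs)
```

Only `mixed_model_search` calls the API. The other helpers are pure functions and can be tried with toy vectors.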

Voyage text embeddings are accessible in Python through the voyageai package. Please install the voyageai package, set up the API key, and use the voyageai.Client.embed() function to vectorize your inputs.

voyageai.Client.embed(texts: List[str], model: str, input_type: Optional[str] = None, truncation: Optional[bool] = None, output_dimension: Optional[int] = None, output_dtype: Optional[str] = "float")

Example

import voyageai

vo = voyageai.Client()
# This will automatically use the environment variable VOYAGE_API_KEY.
# Alternatively, you can use vo = voyageai.Client(api_key="<your secret key>")

texts = [
    "The Mediterranean diet emphasizes fish, olive oil, and vegetables, believed to reduce chronic diseases.",
    "Photosynthesis in plants converts light energy into glucose and produces essential oxygen.",
    "20th-century innovations, from radios to smartphones, centered on electronic advancements.",
    "Rivers provide water, irrigation, and habitat for aquatic species, vital for ecosystems.",
    "Apple’s conference call to discuss fourth fiscal quarter results and business updates is scheduled for Thursday, November 2, 2023 at 2:00 p.m. PT / 5:00 p.m. ET.",
    "Shakespeare's works, like 'Hamlet' and 'A Midsummer Night's Dream,' endure in literature."
]

# Embed the documents
result = vo.embed(texts, model="voyage-4-large", input_type="document")
print(result.embeddings)
[
    [-0.01225314, 0.00206815, 0.030598236, ...],
    [0.008976975,-0.002981404,0.015463986, ...],
    [-0.062971942,-0.047771815,-0.101127356, ...],
    [0.047902156,-0.004106794,-0.007774695, ...],
    [-0.008545358,-0.063952357,-0.007111943, ...],
    [-0.00172591,-0.004946421,-0.04327229, ...]
]
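The signature above also accepts `input_type` and `output_dimension`. As a hedged sketch of how they might be used together, the hypothetical helper `embed_checked` below wraps `embed()` and verifies that one vector comes back per input and that each vector has the requested length. The helper and `main` are illustrative names, and the sample strings are placeholders.

```python
from typing import List, Optional


def embed_checked(
    client,
    texts: List[str],
    model: str,
    input_type: Optional[str] = None,
    output_dimension: Optional[int] = None,
) -> List[List[float]]:
    """Call client.embed and verify the shape of the returned embeddings."""
    result = client.embed(
        texts, model=model, input_type=input_type, output_dimension=output_dimension
    )
    vectors = result.embeddings
    if len(vectors) != len(texts):
        raise RuntimeError(f"expected {len(texts)} vectors, got {len(vectors)}")
    if output_dimension is not None:
        for v in vectors:
            if len(v) != output_dimension:
                raise RuntimeError(f"expected length {output_dimension}, got {len(v)}")
    return vectors


def main() -> None:
    """Requires the voyageai package and VOYAGE_API_KEY; not called automatically."""
    import voyageai

    vo = voyageai.Client()
    # Documents and queries are embedded with different input_type values,
    # but with the same model and output_dimension so they remain comparable.
    docs = embed_checked(
        vo, ["Sample text 1", "Sample text 2"], "voyage-4-large",
        input_type="document", output_dimension=256,
    )
    query = embed_checked(
        vo, ["Sample query"], "voyage-4-large",
        input_type="query", output_dimension=256,
    )[0]
    print(len(docs), len(docs[0]), len(query))
```

Calling `main()` sends two requests. `embed_checked` itself works with any object that exposes a compatible `embed()` method, which makes it easy to exercise with a stub.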

The following functions are deprecated and will be removed in the future.

get_embedding(text, model="voyage-01", input_type=None)

get_embeddings(list_of_text, model="voyage-01", input_type=None)


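Code that still calls the deprecated functions could be moved to `voyageai.Client.embed()` with a thin wrapper. The sketch below is an assumption-laden illustration: `get_embedding_compat` and `get_embeddings_compat` are hypothetical names, and we assume the deprecated functions returned a single vector and a list of vectors respectively, which this page does not state. The model argument is made required because the old default, voyage-01, is deprecated.

```python
from typing import List, Optional


def get_embedding_compat(
    client, text: str, model: str, input_type: Optional[str] = None
) -> List[float]:
    """Embed one text through client.embed and return its single vector."""
    return client.embed([text], model=model, input_type=input_type).embeddings[0]


def get_embeddings_compat(
    client, list_of_text: List[str], model: str, input_type: Optional[str] = None
) -> List[List[float]]:
    """Embed several texts through client.embed and return one vector per text."""
    return client.embed(list_of_text, model=model, input_type=input_type).embeddings
```

With `vo = voyageai.Client()`, a call such as `get_embedding_compat(vo, "Sample text", model="voyage-4-large", input_type="document")` would replace the older `get_embedding` call.
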

Voyage text embeddings can be accessed by calling the endpoint POST https://api.voyageai.com/v1/embeddings. Please refer to the Text Embeddings API Reference for the specification.

Example

curl https://api.voyageai.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $VOYAGE_API_KEY" \
  -d '{
    "input": "Sample text",
    "model": "voyage-4-large",
    "input_type": "document"
  }'

curl https://api.voyageai.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $VOYAGE_API_KEY" \
  -d '{
    "input": ["Sample text 1", "Sample text 2"],
    "model": "voyage-4-large",
    "input_type": "document"
  }'
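The same requests can be issued from Python without the voyageai package. The following sketch mirrors the curl examples above using only the standard library. `build_request` and `post_embeddings` are hypothetical helper names, and the response is returned as parsed JSON without assuming its fields; see the Text Embeddings API Reference for the response format.

```python
import json
import os
import urllib.request
from typing import List, Optional, Union

API_URL = "https://api.voyageai.com/v1/embeddings"


def build_request(
    inputs: Union[str, List[str]],
    model: str = "voyage-4-large",
    input_type: str = "document",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build the POST request with the same headers and body as the curl examples."""
    key = api_key or os.environ["VOYAGE_API_KEY"]
    body = json.dumps(
        {"input": inputs, "model": model, "input_type": input_type}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
    )


def post_embeddings(inputs: Union[str, List[str]], **kwargs) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(inputs, **kwargs)) as resp:
        return json.load(resp)
```

As in the curl examples, `inputs` may be a single string or a list of strings. Only `post_embeddings` performs network I/O.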

Voyage text embeddings are accessible in TypeScript through the Voyage TypeScript Library, which exposes all the functionality of our text embeddings endpoint (see Text Embeddings API Reference).