New embedding models and API updates

text-embedding-3-small is our new highly efficient embedding model and provides a significant upgrade over its predecessor, the text-embedding-ada-002 model released in December 2022.

Stronger performance. Comparing text-embedding-ada-002 to text-embedding-3-small, the average score on a commonly used benchmark for multi-language retrieval (MIRACL) has increased from 31.4% to 44.0%, while the average score on a commonly used benchmark for English tasks (MTEB) has increased from 61.0% to 62.3%.

Reduced price. text-embedding-3-small is also substantially more efficient than our previous generation text-embedding-ada-002 model. Pricing for text-embedding-3-small has therefore been reduced by 5X compared to text-embedding-ada-002, from $0.0001 to $0.00002 per 1k tokens.
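To make the 5X figure concrete, here is a minimal Python sketch that prices the same workload at both per-1k-token rates quoted above. The `embedding_cost` helper and the one-million-token workload are hypothetical, chosen only for illustration; `Decimal` is used so the very small per-token prices are not distorted by binary floating point. Actual billing may round differently.

```python
from decimal import Decimal

# Published prices in USD per 1,000 tokens, as quoted in the announcement.
PRICE_PER_1K = {
    "text-embedding-ada-002": Decimal("0.0001"),
    "text-embedding-3-small": Decimal("0.00002"),
}

def embedding_cost(model: str, tokens: int) -> Decimal:
    """Estimated USD cost of embedding `tokens` tokens with `model` (hypothetical helper)."""
    return PRICE_PER_1K[model] * tokens / 1000

if __name__ == "__main__":
    tokens = 1_000_000  # hypothetical workload: one million tokens
    old = embedding_cost("text-embedding-ada-002", tokens)
    new = embedding_cost("text-embedding-3-small", tokens)
    print(f"ada-002: ${old.normalize()}, 3-small: ${new.normalize()}, ratio: {old / new}")
```

Because cost scales linearly with token count, the ratio between the two models is the same 5X for any workload size.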

We are not deprecating text-embedding-ada-002, so while we recommend the newer model, customers are welcome to continue using the previous generation model.

A new large text embedding model: text-embedding-3-large

text-embedding-3-large is our new next-generation larger embedding model and creates embeddings with up to 3072 dimensions.
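For a rough sense of what 3072 dimensions means for storage, the sketch below estimates the raw size of stored vectors. It assumes each component is kept as a 32-bit float and ignores vector-index overhead and compression; the helper names and the one-million-vector corpus are hypothetical.

```python
from array import array

DIMENSIONS = 3072  # maximum embedding size for text-embedding-3-large, per the text above

def bytes_per_embedding(dims: int, itemsize: int = 4) -> int:
    """Raw bytes for one vector, assuming `itemsize`-byte floats (float32 by default)."""
    return dims * itemsize

def corpus_size_gib(num_vectors: int, dims: int = DIMENSIONS) -> float:
    """Approximate raw storage in GiB for `num_vectors` vectors, ignoring index overhead."""
    return num_vectors * bytes_per_embedding(dims) / 2**30

if __name__ == "__main__":
    # Cross-check against a real float32 buffer ('f' is a C float, 4 bytes on common platforms).
    vec = array("f", [0.0] * DIMENSIONS)
    print(len(vec) * vec.itemsize, bytes_per_embedding(DIMENSIONS))
    print(f"{corpus_size_gib(1_000_000):.2f} GiB for one million vectors")
```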

Stronger performance. text-embedding-3-large is our new best performing model. Comparing text-embedding-ada-002 to text-embedding-3-large: on MIRACL, the average score has increased from 31.4% to 54.9%, while on MTEB, the average score has increased from 61.0% to 64.6%.
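The score changes quoted for both new models can be restated as absolute gains over text-embedding-ada-002. This minimal Python sketch uses only the averages given in the announcement; the `point_gain` helper is hypothetical, and the differences are rounded to one decimal place to match the precision of the published figures.

```python
# Average benchmark scores (%) quoted in the announcement.
SCORES = {
    "text-embedding-ada-002": {"MIRACL": 31.4, "MTEB": 61.0},
    "text-embedding-3-small": {"MIRACL": 44.0, "MTEB": 62.3},
    "text-embedding-3-large": {"MIRACL": 54.9, "MTEB": 64.6},
}

BASELINE = "text-embedding-ada-002"

def point_gain(model: str, benchmark: str) -> float:
    """Improvement over the baseline in percentage points, rounded to 0.1."""
    return round(SCORES[model][benchmark] - SCORES[BASELINE][benchmark], 1)

if __name__ == "__main__":
    for model in ("text-embedding-3-small", "text-embedding-3-large"):
        for bench in ("MIRACL", "MTEB"):
            print(f"{model} {bench}: +{point_gain(model, bench)} points")
```

Seen this way, most of the improvement for both models shows up on multi-language retrieval (MIRACL) rather than on the English-focused MTEB average.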