Batch processing

Batch processing is a powerful approach for handling large volumes of requests efficiently. Instead of processing requests one at a time with immediate responses, batch processing allows you to submit multiple requests together for asynchronous processing. This pattern is particularly useful when you need to process large volumes of data, immediate responses are not required, and you want to optimize for cost.

The Message Batches API is our first implementation of this pattern.


Message Batches API

The Message Batches API is a powerful, cost-effective way to asynchronously process large volumes of Messages requests. This approach is well-suited to tasks that do not require immediate responses, with most batches finishing in less than 1 hour while reducing costs by 50% and increasing throughput.

You can explore the API reference directly, in addition to this guide.

How the Message Batches API works

When you send a request to the Message Batches API:

  1. The system creates a new Message Batch with the provided Messages requests.
  2. The batch is then processed asynchronously, with each request handled independently.
  3. You can poll for the status of the batch and retrieve results when processing has ended for all requests.

This is especially useful for bulk operations that don’t require immediate results, such as large-scale evaluations, content moderation, analysis of large datasets, and bulk content generation.

Batch limitations

A Message Batch is limited to either 100,000 Message requests or 256 MB in total size, whichever is reached first. A batch also expires 24 hours after creation; any requests still pending at that point are marked as expired (see Retrieving batch results below).

Supported models

The Message Batches API currently supports all of the models listed in the pricing table below.

What can be batched

Any request that you can make to the Messages API can be included in a batch. This includes requests that use vision, tool use, system messages, and multi-turn conversations.

Since each request in the batch is processed independently, you can mix different types of requests within a single batch.


Pricing

The Batches API offers significant cost savings. All usage is charged at 50% of the standard API prices.

Model               Batch input      Batch output
Claude Opus 4       $7.50 / MTok     $37.50 / MTok
Claude Sonnet 4     $1.50 / MTok     $7.50 / MTok
Claude Sonnet 3.7   $1.50 / MTok     $7.50 / MTok
Claude Sonnet 3.5   $1.50 / MTok     $7.50 / MTok
Claude Haiku 3.5    $0.40 / MTok     $2 / MTok
Claude Opus 3       $7.50 / MTok     $37.50 / MTok
Claude Haiku 3      $0.125 / MTok    $0.625 / MTok
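As an illustrative calculation: a batch of 10,000 Claude Sonnet 4 requests averaging 1,000 input and 500 output tokens each uses 10M input and 5M output tokens, costing (10 × $1.50) + (5 × $7.50) = $52.50 at batch rates, versus $105 at the standard $3 / MTok input and $15 / MTok output prices.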

How to use the Message Batches API

Prepare and create your batch

A Message Batch is composed of a list of requests to create a Message. An individual request consists of:

  1. A unique custom_id for identifying the Messages request.
  2. A params object with the standard Messages API parameters.

You can create a batch by passing this list into the requests parameter:
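Here is a minimal sketch using the Python SDK, assuming the anthropic package is installed and ANTHROPIC_API_KEY is set in your environment; the model identifier and custom_id values are illustrative:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Each entry pairs a unique custom_id with standard Messages API params.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "my-first-request",
            "params": {
                "model": "claude-sonnet-4-20250514",  # illustrative model id
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Hello, world"}],
            },
        },
        {
            "custom_id": "my-second-request",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Hi again, friend"}],
            },
        },
    ],
)

print(batch.id)                 # batch id, used to poll status and fetch results
print(batch.processing_status)  # "in_progress" immediately after creation
```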

In this example, two separate requests are batched together for asynchronous processing. Each request has a unique custom_id and contains the standard parameters you’d use for a Messages API call.

When a batch is first created, the response will have a processing status of in_progress.

Tracking your batch

The Message Batch’s processing_status field indicates the stage of processing the batch is in. It starts as in_progress, then updates to ended once all the requests in the batch have finished processing, and results are ready. You can monitor the state of your batch by visiting the Console, or using the retrieval endpoint:
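A sketch of retrieving and polling the batch, reusing the client and batch from the creation example above:

```python
import time

# batch.id comes from the create call above.
message_batch = client.messages.batches.retrieve(batch.id)

# Wait until every request in the batch has finished processing.
while message_batch.processing_status != "ended":
    time.sleep(60)  # most batches finish within an hour; poll modestly
    message_batch = client.messages.batches.retrieve(batch.id)

print(message_batch.request_counts)  # tallies of succeeded / errored / canceled / expired
```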

You can poll this endpoint to know when processing has ended.

Retrieving batch results

Once batch processing has ended, each Messages request in the batch will have a result. There are four result types:

Result type   Description
succeeded     Request was successful. Includes the message result.
errored       Request encountered an error and a message was not created. Possible errors include invalid requests and internal server errors. You will not be billed for these requests.
canceled      User canceled the batch before this request could be sent to the model. You will not be billed for these requests.
expired       Batch reached its 24-hour expiration before this request could be sent to the model. You will not be billed for these requests.

You will see an overview of your results with the batch’s request_counts, which shows how many requests reached each of these four states.

Results of the batch are available for download at the results_url property on the Message Batch and, if your organization’s permissions allow, in the Console. Because results can be large, we recommend streaming them back rather than downloading everything at once.

The results will be in .jsonl format, where each line is a valid JSON object representing the result of a single request in the Message Batch. For each streamed result, you can do something different depending on its custom_id and result type. Here is an example set of results:
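A sketch under the same assumptions as the earlier examples; the comment shows the rough shape of one result line, and the loop dispatches on custom_id and result type:

```python
# Each line of the .jsonl file is shaped roughly like:
# {"custom_id": "my-first-request", "result": {"type": "succeeded", "message": {...}}}

# The SDK streams and decodes the .jsonl results line by line.
for entry in client.messages.batches.results(batch.id):
    if entry.result.type == "succeeded":
        print(entry.custom_id, entry.result.message.content)
    elif entry.result.type == "errored":
        print(entry.custom_id, "failed:", entry.result.error)
    else:
        # "canceled" or "expired": not billed, safe to resubmit in a new batch
        print(entry.custom_id, entry.result.type)
```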

If your result has an error, its result.error will be set to our standard error shape.

Using prompt caching with Message Batches

The Message Batches API supports prompt caching, allowing you to potentially reduce costs and processing time for batch requests. The pricing discounts from prompt caching and Message Batches can stack, providing even greater cost savings when both features are used together. However, since batch requests are processed asynchronously and concurrently, cache hits are provided on a best-effort basis. Users typically experience cache hit rates ranging from 30% to 98%, depending on their traffic patterns.

To maximize the likelihood of cache hits in your batch requests:

  1. Include identical cache_control blocks in every Message request within your batch
  2. Maintain a steady stream of requests to prevent cache entries from expiring after their 5-minute lifetime
  3. Structure your requests to share as much cached content as possible

Example of implementing prompt caching in a batch:
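A sketch under the same assumptions as the earlier examples; the file path and prompts are hypothetical, and the large cached block is placed in a system prompt shared by every request:

```python
# Hypothetical local copy of the public-domain text.
with open("pride_and_prejudice.txt") as f:
    book_text = f.read()

# Identical system content, with the large block marked cacheable,
# goes into every request to maximize the chance of cache hits.
shared_system = [
    {"type": "text", "text": "You are an expert on Jane Austen's novels."},
    {
        "type": "text",
        "text": book_text,
        "cache_control": {"type": "ephemeral"},
    },
]

cached_batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "summarize-themes",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "system": shared_system,
                "messages": [{"role": "user", "content": "Summarize the major themes."}],
            },
        },
        {
            "custom_id": "analyze-elizabeth",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "system": shared_system,
                "messages": [
                    {"role": "user", "content": "Analyze Elizabeth Bennet's character arc."}
                ],
            },
        },
    ],
)
```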

In this example, both requests in the batch include identical system messages and the full text of Pride and Prejudice marked with cache_control to increase the likelihood of cache hits.

Best practices for effective batching

To get the most out of the Batches API:

  1. Monitor batch processing status regularly and implement appropriate retry logic for failed requests.
  2. Use meaningful custom_id values to easily match results with requests, since order is not guaranteed.
  3. Consider breaking very large datasets into multiple batches for better manageability.
  4. Dry run a single request shape with the Messages API to avoid validation errors.

Troubleshooting common issues

If you’re experiencing unexpected behavior:

  1. Verify that the overall batch doesn’t exceed the request-count or size limits described above.
  2. Check that you’re using supported models for all requests in the batch.
  3. Ensure each request in the batch has a unique custom_id.
  4. Remember that results may be returned in any order, so match them to requests by custom_id rather than by position.

Note that the failure of one request in a batch does not affect the processing of other requests.

