Parallel composite uploads

One strategy for uploading large files is called parallel composite uploads. In such an upload, a file is divided into up to 32 chunks, the chunks are uploaded in parallel to temporary objects, the final object is recreated using the temporary objects, and the temporary objects are deleted.
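The chunking step described above can be sketched as follows. This is a minimal illustration, not how any particular tool plans its chunks: the function name plan_chunks and the target_chunk_size parameter are hypothetical, and the only value taken from the text is the limit of 32 chunks.

```python
MAX_COMPONENTS = 32  # upper bound on chunks, as stated in the text above


def plan_chunks(file_size, target_chunk_size):
    """Return (offset, length) pairs that cover file_size bytes.

    Hypothetical helper: if target_chunk_size would yield more than
    32 chunks, the chunk size is increased so that the limit holds.
    """
    if file_size < 0 or target_chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and target_chunk_size > 0")
    if file_size == 0:
        return []
    # Smallest chunk size that keeps the count at or below the limit
    # (integer ceiling division avoids floating-point rounding).
    min_chunk_size = -(-file_size // MAX_COMPONENTS)
    chunk_size = max(target_chunk_size, min_chunk_size)
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]
```

Each (offset, length) pair identifies the byte range that one parallel worker would read and upload to its own temporary object.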

Parallel composite uploads can be significantly faster if network and disk speed are not limiting factors; however, the final object stored in your bucket is a composite object, which has only a crc32c hash and no MD5 hash. As a result, you must use crcmod to perform integrity checks when downloading the object with Python applications. You should only perform parallel composite uploads if the following apply:
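On the integrity check mentioned above: the following is a minimal, pure-Python sketch of a CRC32C (Castagnoli) checksum, included only to show what the check computes. It is not crcmod and is far too slow for large files; the function names crc32c and to_base64 are hypothetical. It assumes that the stored crc32c value is presented as the base64 encoding of the checksum's four bytes in big-endian order.

```python
import base64
import struct


def _make_table():
    # Lookup table for the reflected Castagnoli polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data, crc=0):
    """Return the CRC32C of data; pass a previous result to continue a stream."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def to_base64(checksum):
    """Encode a 32-bit checksum as base64 of its big-endian bytes (assumed format)."""
    return base64.b64encode(struct.pack(">I", checksum)).decode("ascii")
```

A downloaded file can be fed through crc32c block by block, carrying the running value forward, and the encoded result compared with the crc32c value stored on the object.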

How tools and APIs use parallel composite uploads

Depending on how you interact with Cloud Storage, parallel composite uploads might be managed automatically on your behalf. This section describes parallel composite upload behavior for different tools and provides information for how you can modify the behavior.

Console

The Google Cloud console does not perform parallel composite uploads.

Command line

You can configure how and when gcloud storage cp performs parallel composite uploads by modifying the following properties:

You can modify these properties by creating a named configuration and applying the configuration either on a per-command basis by using the --configuration project-wide flag or for all gcloud CLI commands by using the gcloud config set command.

No additional local disk space is required when using the gcloud CLI to perform parallel composite uploads. If a parallel composite upload fails prior to composition, run the gcloud CLI command again to take advantage of resumable uploads for the temporary objects that failed. Any temporary objects that uploaded successfully before the failure do not get re-uploaded when you resume the upload.

Temporary objects are named in the following fashion:

TEMPORARY_PREFIX/RANDOM_VALUE_HEX_DIGEST_COMPONENT_ID

Where:

Temporary objects are generally deleted at the end of a parallel composite upload. To avoid leaving temporary objects behind, check the exit status of the gcloud CLI command and manually delete any temporary objects that were uploaded as part of an aborted upload.
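One way to act on the exit status from a script is sketched below. The helper name run_and_check is hypothetical, and the file and bucket names in the usage comment are placeholders; the sketch only reports a non-zero status and leaves the cleanup of temporary objects to you.

```python
import subprocess
import sys


def run_and_check(argv):
    """Run a command and return True only if it exited with status 0."""
    result = subprocess.run(argv)
    if result.returncode != 0:
        # A failed upload may have left temporary objects in the bucket.
        print(
            f"command exited with status {result.returncode}; "
            "check the bucket for leftover temporary objects",
            file=sys.stderr,
        )
        return False
    return True


# Example invocation (placeholder file and bucket names):
# run_and_check(["gcloud", "storage", "cp", "large-file.bin", "gs://my-bucket/"])
```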

Client libraries

Java

For more information, see the Cloud Storage Java API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

You can perform parallel composite uploads by setting AllowParallelCompositeUpload to true. For example:

Node.js

For more information, see the Cloud Storage Node.js API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The Node.js client library does not support parallel composite uploads. Instead, use XML API multipart uploads.

Python

For more information, see the Cloud Storage Python API reference documentation.

To authenticate to Cloud Storage, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The Python client library does not support parallel composite uploads. Instead, use XML API multipart uploads.

REST APIs

JSON API

The JSON API supports uploading object chunks in parallel and recombining them into a single object using the compose operation.
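As an illustration of the recombining step, the sketch below builds the URL and JSON body for a compose request without sending it. The helper name build_compose_request is hypothetical. The endpoint path and the sourceObjects and destination fields reflect my understanding of the JSON API compose method, and the 32-source limit is assumed to match the chunk limit mentioned earlier; verify both against the API reference before relying on them.

```python
import json
from urllib.parse import quote


def build_compose_request(bucket, destination, source_names,
                          content_type="application/octet-stream"):
    """Return (url, body) for a JSON API compose call (request is not sent)."""
    if not 1 <= len(source_names) <= 32:
        raise ValueError("a compose request takes between 1 and 32 sources")
    # Object names are percent-encoded so that '/' stays inside one path segment.
    url = (
        "https://storage.googleapis.com/storage/v1/b/"
        f"{quote(bucket, safe='')}/o/{quote(destination, safe='')}/compose"
    )
    body = {
        "sourceObjects": [{"name": name} for name in source_names],
        "destination": {"contentType": content_type},
    }
    return url, json.dumps(body)
```

The returned pair would be sent as an authenticated POST; the source names are listed in the order in which their contents should be concatenated.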

Keep the following in mind when designing code for parallel composite uploads:

XML API

The XML API supports uploading object chunks in parallel and recombining them into a single object using the compose operation.
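For the XML API, the body of a compose request is an XML document that lists the source objects. The sketch below builds such a body with the standard library; the helper name build_compose_xml is hypothetical, and the ComposeRequest, Component, and Name element names reflect my understanding of the XML API and should be confirmed in its reference.

```python
import xml.etree.ElementTree as ET


def build_compose_xml(source_names):
    """Return a ComposeRequest XML body listing source objects in order."""
    root = ET.Element("ComposeRequest")
    for name in source_names:
        component = ET.SubElement(root, "Component")
        ET.SubElement(component, "Name").text = name
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")
```

The resulting body would be sent in an authenticated PUT request to the destination object with the compose query parameter.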

Keep the following in mind when designing code for parallel composite uploads: