Transcoding of gzip-compressed files

This page discusses the conversion of files to and from a _gzip_-compressed state. The page includes an overview of transcoding, best practices for working with associated metadata, and compressed file behavior in Cloud Storage.

gzip is a form of data compression: it typically reduces the size of a file, so the file can be transferred faster and stored using less space, which reduces both cost and transfer time. Transcoding, in Cloud Storage, is the automatic changing of a file's compression before it's served to a requester. When transcoding results in a file becoming gzip-compressed, it can be considered compressive, whereas when the result is a file that is no longer gzip-compressed, it can be considered decompressive. Cloud Storage supports the decompressive form of transcoding.

Cloud Storage does not support decompressive transcoding for Brotli-compressed objects.

Decompressive transcoding

Decompressive transcoding allows you to store compressed versions of files in Cloud Storage, which reduces at-rest storage costs, while still serving the file itself to the requester, without any compression. This is useful, for example, when serving files to customers.

In order for decompressive transcoding to occur, an object must meet two criteria:

  1. The file is gzip-compressed when stored in Cloud Storage.
  2. The object's metadata includes Content-Encoding: gzip.

When an object meets these two criteria, it undergoes decompressive transcoding when served, and the response containing the object does not contain a Content-Encoding or Content-Length header.
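As a rough local sanity check before uploading, the two criteria can be expressed in a few lines of Python. This is only a sketch: `is_transcoding_candidate` is a hypothetical helper, the metadata is modeled as a plain dict, and the check on the gzip magic bytes is an assumption about how one might confirm the data is really gzip-compressed. It is not a description of what Cloud Storage does internally.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_transcoding_candidate(stored_bytes: bytes, metadata: dict) -> bool:
    """Return True when both criteria listed above appear to hold.

    Hypothetical helper for local checks only.
    """
    looks_like_gzip = stored_bytes[:2] == GZIP_MAGIC
    declares_gzip = metadata.get("Content-Encoding", "").lower() == "gzip"
    return looks_like_gzip and declares_gzip


payload = gzip.compress(b"hello, transcoding\n")
print(is_transcoding_candidate(payload, {"Content-Encoding": "gzip"}))
print(is_transcoding_candidate(payload, {"Content-Type": "application/gzip"}))
```

The first call reports a candidate; the second does not, because the metadata carries no Content-Encoding: gzip entry.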

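There are two ways to prevent decompressive transcoding from occurring for an object that is otherwise eligible:

  1. If the request for the object includes an Accept-Encoding: gzip header, the object is served as-is in that specific request, along with a Content-Encoding: gzip response header.
  2. If the Cache-Control metadata field for the object is set to no-transform, the object is served as a compressed object in all subsequent requests, regardless of any Accept-Encoding request headers.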

Preventing decompressive transcoding is useful, for example, if you want to reduce outbound data transfer cost or time or if you want to validate the downloaded objects have the expected crc32c/md5 checksums.
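The checksum concern can be illustrated locally. The sketch below assumes the stored checksum is computed over the bytes as stored (that is, the gzip-compressed bytes) and presents it as a base64-encoded MD5 digest; `md5_b64` is a hypothetical helper, and `gzip.decompress` stands in for the transcoding step.

```python
import base64
import gzip
import hashlib


def md5_b64(data: bytes) -> str:
    """Base64-encoded MD5 digest of data (hypothetical helper)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


original = b"line of text\n" * 1000
stored = gzip.compress(original)      # bytes as uploaded and stored
transcoded = gzip.decompress(stored)  # bytes a transcoded response would carry

stored_md5 = md5_b64(stored)
print(stored_md5 == md5_b64(transcoded))  # False: transcoded bytes differ
print(stored_md5 == md5_b64(stored))      # True: untranscoded bytes match
```

A client that compares its download against the stored checksum therefore needs the untranscoded bytes.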

Considerations

Keep in mind the following when working with decompressive transcoding:

Content-Type vs. Content-Encoding

There are several behaviors that you should be aware of concerning how Content-Type and Content-Encoding relate to transcoding. Both are metadata stored along with an object. See Viewing and Editing Object Metadata for step-by-step instructions on how to add metadata to objects.

Content-Type should be included in all uploads and indicates the type of object being uploaded. For example:

Content-Type: text/plain

indicates that the uploaded object is a plain-text file. While there is no check to guarantee the specified Content-Type matches the true nature of an uploaded object, incorrectly specifying its type will at best cause requesters to receive something other than what they were expecting and could lead to unintended behaviors.

Content-Encoding is optional and can, if desired, be included in the upload of files that are compressed. For example:

Content-Encoding: gzip

indicates that the uploaded object is gzip-compressed. As with Content-Type, there is no check to guarantee the specified Content-Encoding is actually applied to the uploaded object, and incorrectly specifying an object's encoding could lead to unintended behavior on subsequent download requests.
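Because nothing verifies that the metadata matches the data, it can help to produce both in one place. The following is a minimal sketch, assuming a plain-text payload; `prepare_gzip_upload` is a hypothetical helper that returns the compressed body together with a metadata dict, and it performs no upload.

```python
import gzip
from typing import Dict, Tuple


def prepare_gzip_upload(text: str) -> Tuple[bytes, Dict[str, str]]:
    """Compress text and return it with matching metadata (hypothetical helper)."""
    body = gzip.compress(text.encode("utf-8"))
    metadata = {
        "Content-Type": "text/plain",  # type of the underlying, uncompressed data
        "Content-Encoding": "gzip",    # encoding applied to the stored bytes
    }
    return body, metadata


body, metadata = prepare_gzip_upload("hello\n" * 100)
print(len(body), metadata)
```

Building the body and its metadata together keeps the declared encoding from drifting away from what was actually applied.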

Good practices

Discouraged practices

Incorrect practices

Using gzip on compressed objects

Some objects, such as many video, audio, and image files, not to mention gzip files themselves, are already compressed. Using gzip on such objects offers virtually no benefit: in almost all cases, doing so makes the object larger due to gzip overhead. For this reason, using gzip on compressed content is generally discouraged and may cause undesired behaviors.
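The size effect is easy to observe locally. In this sketch, random bytes from `os.urandom` are an assumed stand-in for already-compressed media, since both are effectively incompressible; repetitive text is included for contrast.

```python
import gzip
import os

# Assumption: random bytes approximate already-compressed content.
already_compressed = os.urandom(100_000)
compressible_text = b"the same line again and again\n" * 3_000

regzipped = gzip.compress(already_compressed)
gzipped_text = gzip.compress(compressible_text)

print(len(already_compressed), "->", len(regzipped))   # grows slightly
print(len(compressible_text), "->", len(gzipped_text))  # shrinks
```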

For example, while Cloud Storage allows "doubly compressed" objects (that is, objects that are gzip-compressed but also have an underlying Content-Type that is itself compressed) to be uploaded and stored, it does not allow objects to be served in a doubly compressed state unless their Cache-Control metadata includes no-transform. Instead, it removes the outer, gzip, level of compression, drops the Content-Encoding response header, and serves the resulting object. This occurs even for requests with Accept-Encoding: gzip. The file that is received by the client thus does not have the same checksum as what was uploaded and stored in Cloud Storage, so any integrity checks fail.

Using the Range header

When transcoding occurs, if the request for the object includes a Range header, that header is silently ignored. This means that requests for partial content are not fulfilled, and the response instead serves the entire requested object. For example, if you have a 10 GB object that is eligible for transcoding, but include the header Range: bytes=0-10000 in the request, you still receive the entire 10 GB object.

This behavior arises because it is not possible to select a range from a compressed file without first decompressing the file in its entirety: each request for part of a file would require decompressing the entire, potentially large, file, which would be a poor use of resources. You should be aware of this behavior and avoid using the Range header when using transcoding, as charges are incurred for the transmission of the entire object and not just the range requested. For more information on allowed response behavior to requests with Range headers, see the specification.

If requests with Range headers are needed, you should ensure that transcoding does not occur for the requested object. You can achieve this by choosing the appropriate properties when uploading objects to begin with. For example, range requests for objects with Content-Type: application/gzip and no Content-Encoding are performed as requested.
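The difference between the two cases can be sketched with a toy model of the response body. This is not Cloud Storage's implementation: `serve` is a hypothetical function, it treats any object with Content-Encoding: gzip as transcoded (ignoring the ways transcoding can be prevented), and it only understands ranges of the form bytes=first-last.

```python
import gzip


def serve(stored: bytes, metadata: dict, request_headers: dict) -> bytes:
    """Toy model of the response body for a download request (hypothetical)."""
    # Simplifying assumption: Content-Encoding: gzip always means transcoding.
    if metadata.get("Content-Encoding") == "gzip":
        # Transcoding: the Range header is ignored and the whole
        # decompressed object is returned.
        return gzip.decompress(stored)

    range_header = request_headers.get("Range", "")
    if range_header.startswith("bytes="):
        # Only the simple "bytes=first-last" form is handled in this sketch.
        first, last = range_header[len("bytes="):].split("-")
        return stored[int(first):int(last) + 1]
    return stored


text = b"0123456789" * 1000
gz = gzip.compress(text)
request = {"Range": "bytes=0-9"}

full = serve(gz, {"Content-Type": "text/plain", "Content-Encoding": "gzip"}, request)
part = serve(gz, {"Content-Type": "application/gzip"}, request)
print(len(full))  # 10000: entire decompressed object despite the Range header
print(len(part))  # 10: the requested range of the stored gzip bytes
```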
