Data availability and durability

This page discusses concepts related to data availability and durability in Cloud Storage, including how Cloud Storage redundantly stores data, the default replication behavior for dual-regions and multi-regions, the turbo replication feature for dual-regions, and the cross-bucket replication feature.

Key concepts

Redundancy across regions

While traditional storage models often rely on an active-passive approach with "primary" and "secondary" geographic locations, Cloud Storage dual-regions and multi-regions provide an active-active architecture based on a single bucket with redundancy across regions. This simplifies the disaster recovery process by eliminating the need for users to replicate data from one bucket to another or manually fail over to a secondary bucket in the case of primary region downtime.

Cloud Storage always understands the current state of a bucket and transparently serves objects from an available region as required. As a result, dual-region and multi-region buckets are designed to have a recovery time objective (RTO) of zero, and temporary regional failures are normally invisible to users; in the case of a regional outage, dual-region and multi-region buckets automatically continue serving all data that has been replicated across regions.

However, redundancy across regions occurs asynchronously, and any data that does not finish replicating across regions prior to a region becoming unavailable is inaccessible until the downed region comes back online. Data could potentially be lost in the very unlikely case of physical destruction of the region.

Default replication in Cloud Storage is designed to provide redundancy across regions for 99.9% of newly written objects within a target of one hour and 100% of newly written objects within a target of 12 hours. Newly written objects include uploads, rewrites, copies, and compositions.
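The two design targets above can be read as a pair of thresholds on replication latency. The following is a minimal sketch of that reading, not a Cloud Storage API: the function name and the idea of having a list of per-object latencies on hand are assumptions made for illustration.

```python
from datetime import timedelta

ONE_HOUR = timedelta(hours=1)
TWELVE_HOURS = timedelta(hours=12)


def meets_default_targets(latencies):
    """Check hypothetical per-object replication latencies against the
    default replication design targets: 99.9% within one hour and
    100% within 12 hours."""
    total = len(latencies)
    if total == 0:
        return True  # nothing was written, so nothing missed a target
    within_hour = sum(1 for lat in latencies if lat <= ONE_HOUR)
    within_twelve = sum(1 for lat in latencies if lat <= TWELVE_HOURS)
    # Integer arithmetic avoids floating-point rounding at the 99.9% boundary.
    return within_hour * 1000 >= total * 999 and within_twelve == total


# 999 objects replicated in 5 minutes, one took 2 hours: both targets are met.
sample = [timedelta(minutes=5)] * 999 + [timedelta(hours=2)]
print(meets_default_targets(sample))
```

These are design targets rather than per-object guarantees, so a check like this is only useful as a mental model of what the percentages mean.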

Cloud Storage also offers a cross-bucket replication capability that can be used to replicate data between independent buckets to meet additional data replication needs that aren't met by dual-region or multi-region locations.

Turbo replication

Turbo replication provides faster redundancy across regions for data in your dual-region buckets, which reduces data loss exposure and helps support uninterrupted service following a regional outage. When enabled, turbo replication is designed to replicate 100% of newly written objects to the two regions that constitute a dual-region within a recovery point objective (RPO) of 15 minutes, regardless of object size.
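As an illustration of what enabling turbo replication can look like in code, here is a minimal sketch that assumes the `google-cloud-storage` Python client, where a bucket exposes an `rpo` property and a `patch()` method. The helper name is hypothetical, and the string constant is assumed to match `google.cloud.storage.constants.RPO_ASYNC_TURBO`.

```python
# Assumed to be the same value as google.cloud.storage.constants.RPO_ASYNC_TURBO.
RPO_ASYNC_TURBO = "ASYNC_TURBO"


def enable_turbo_replication(bucket):
    """Set the RPO of a dual-region bucket to turbo replication.

    `bucket` is expected to behave like google.cloud.storage.Bucket:
    it has a writable `rpo` attribute and a `patch()` method that
    sends the changed properties to Cloud Storage.
    """
    bucket.rpo = RPO_ASYNC_TURBO
    bucket.patch()
    return bucket.rpo


# Usage against a real bucket (requires the client library and credentials;
# the bucket name below is a placeholder):
#
#   from google.cloud import storage
#   bucket = storage.Client().get_bucket("my-dual-region-bucket")
#   enable_turbo_replication(bucket)
```

Passing the bucket in, rather than creating a client inside the helper, keeps the sketch easy to exercise with a stand-in object before pointing it at a real dual-region bucket.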

Note that even for default replication, most objects finish replication within minutes.

While redundancy across regions and turbo replication help support business continuity and disaster recovery (BCDR) efforts, administrators should plan and implement a full BCDR architecture that's appropriate for their workload.

For more information, see the Step-by-step guide to designing disaster recovery for applications in Google Cloud.

Limitations

Cross-bucket replication

In some cases, you might want to maintain a copy of your data in a second bucket. Cross-bucket replication copies new and updated objects asynchronously from a source bucket to a destination bucket.

Cross-bucket replication differs from default replication and turbo replication in that your data exists in two independent buckets, each with their own configurations such as storage location, encryption, access, and storage class. It is especially suitable for:

Cross-bucket replication uses Storage Transfer Service to replicate objects and Pub/Sub to receive notifications of changes to the source and destination buckets. You can enable cross-bucket replication on new buckets you create and on existing buckets.

For buckets where the object change rate is under 3,000 per second and objects are under 1 GiB, cross-bucket replication commonly takes minutes to tens of minutes, but no specific upper bound is supported. Buckets with higher change rates or larger objects can expect longer replication delays.

For instructions on using cross-bucket replication, see Use cross-bucket replication.

Limitations

Performance monitoring

Cloud Storage monitors the oldest unreplicated objects in dual-region and multi-region buckets using default replication or turbo replication. If an object remains unreplicated for longer than its RPO (Recovery Point Objective) time, it's considered to be out of RPO. Each minute in which one or more objects are out of RPO is counted as a "bad" minute.

For example, if one object yielded 20 bad minutes from 9:00-9:20 AM, and another object yielded 10 bad minutes from 9:15-9:25 AM, then there are two objects for the month that are out of RPO. The total number of bad minutes for the month is 25 minutes, because from 9:00 AM to 9:25 AM there was at least one object that was missing its RPO.
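The arithmetic in this example amounts to taking the union of the out-of-RPO intervals, so overlapping minutes are counted once. The sketch below reproduces it; the function name and the arbitrary date are assumptions for illustration, and it merges continuous intervals rather than modeling how Cloud Storage samples minutes internally.

```python
from datetime import datetime, timedelta


def bad_minutes(intervals):
    """Count minutes covered by at least one out-of-RPO interval.

    `intervals` is a list of (start, end) datetime pairs, one per
    stretch during which some object was out of RPO.
    """
    total = timedelta(0)
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close out the previous merged span.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the merged span instead of double counting.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return int(total.total_seconds() // 60)


day = datetime(2024, 1, 1)  # arbitrary date, used only for the example
intervals = [
    (day.replace(hour=9, minute=0), day.replace(hour=9, minute=20)),   # 20 bad minutes
    (day.replace(hour=9, minute=15), day.replace(hour=9, minute=25)),  # 10 bad minutes
]
print(bad_minutes(intervals))  # 25
```

Summing the two objects separately would give 30, which overstates the result because 9:15 to 9:20 is shared by both.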

Within the Google Cloud console, the Percent of minutes out of RPO graph lets you monitor the percentage of bad minutes during the past 30 days for your bucket when using default replication or turbo replication within dual-region or multi-region buckets. This service level indicator can be used to monitor your bucket's Monthly Replication Time Conformance. Similarly, the Percent of objects out of target tracks object replications that did not occur within the RPO. This service level indicator can be used to monitor the bucket's Monthly Replication Volume Conformance. For more information, see Cloud Storage monitoring and Cloud Storage SLA.
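To get a feel for the scale of the percentage of bad minutes, the 25 bad minutes from the earlier example can be expressed as a share of a 30-day window. This is a minimal sketch that assumes the window is exactly 30 days (43,200 minutes); the function name is hypothetical, and the exact calculation behind the console graph may differ.

```python
def percent_minutes_out_of_rpo(bad_minutes, window_days=30):
    """Return bad minutes as a percentage of all minutes in the window."""
    total_minutes = window_days * 24 * 60  # 43,200 minutes for 30 days
    return 100.0 * bad_minutes / total_minutes


print(f"{percent_minutes_out_of_rpo(25):.4f}%")  # 0.0579%
```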

What's next