Cloud Storage FUSE

This document provides an overview of Cloud Storage FUSE, a FUSE adapter that lets you mount and access Cloud Storage buckets as local file systems, so applications can read and write objects in your bucket using standard file system semantics.

This documentation always reflects the latest version of Cloud Storage FUSE. For details on the latest version, see Cloud Storage FUSE releases on GitHub.

Cloud Storage FUSE is an open source product supported by Google. Cloud Storage FUSE uses FUSE and Cloud Storage APIs to transparently expose buckets as locally mounted folders on your file system.

To use Cloud Storage FUSE, you install the Cloud Storage FUSE package on a machine with a compatible operating system such as Linux or Windows Subsystem for Linux, ensure proper Google Cloud authentication and permissions, and then execute the gcsfuse command to mount a specific Cloud Storage bucket to a local directory.

Cloud Storage FUSE is integrated with other Google Cloud services. For example, the Cloud Storage FUSE CSI driver lets you use the Google Kubernetes Engine (GKE) API to consume buckets as volumes, so you can read from and write to Cloud Storage from within your Kubernetes pods. For more information on other integrations, see Integrations.

How Cloud Storage FUSE works

Cloud Storage FUSE works by translating object storage names into a directory-like structure, interpreting the slash character (/) in object names as a directory separator. Objects with the same common prefix are treated as files in the same directory, allowing applications to interact with the mounted bucket like a file system. Objects can also be organized into a logical file system structure using hierarchical namespace, which lets you organize objects into folders.
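As a rough illustration of this prefix-based mapping, the following Python sketch lists the immediate children of a directory from a flat set of object names. It is not Cloud Storage FUSE's actual implementation; the function name `list_directory` and the sample object names are hypothetical.

```python
def list_directory(object_names, directory=""):
    """Return the immediate children of `directory` inferred from flat object names.

    `directory` is either "" (the bucket root) or a prefix ending in "/".
    Subdirectories are returned with a trailing "/".
    """
    children = set()
    for name in object_names:
        if not name.startswith(directory) or name == directory:
            continue
        remainder = name[len(directory):]
        head, slash, _ = remainder.partition("/")
        # A slash in the remainder means the object lives in a subdirectory.
        children.add(head + "/" if slash else head)
    return sorted(children)


# Hypothetical object names in a bucket.
objects = [
    "data/train/part-0.csv",
    "data/train/part-1.csv",
    "data/labels.json",
    "README.txt",
]

root_entries = list_directory(objects)
data_entries = list_directory(objects, "data/")
```

Both objects under `data/train/` collapse into a single `train/` entry when listing `data/`, which is how objects that share a common prefix appear as files in the same directory.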

Cloud Storage FUSE can be run from anywhere with connectivity to Cloud Storage, including Google Kubernetes Engine, Compute Engine VMs, or on-premises systems.

Cloud Storage FUSE for machine learning

Cloud Storage FUSE is ideal for use cases where Cloud Storage has the right performance and scalability characteristics for an application that requires file system semantics. For example, Cloud Storage FUSE is useful for machine learning (ML) projects because it provides a way to store data, models, checkpoints, and logs directly in Cloud Storage. For more information, see Cloud Storage FUSE for ML workloads.

Cloud Storage FUSE is a common choice for developers looking to store and access ML training and model data as objects in Cloud Storage. Cloud Storage FUSE provides several benefits for developing ML projects:

For more information, see Frameworks, operating systems, and architectures supported by Cloud Storage FUSE.

Frameworks, operating systems, and architectures

Cloud Storage FUSE has been validated with the following frameworks:

Cloud Storage FUSE supports the following operating systems and architectures:

Integrations with other Google Cloud products

Cloud Storage FUSE integrates with the following Google Cloud products:

Product | How Cloud Storage FUSE is integrated
--- | ---
AI Hypercomputer | Cloud Storage FUSE is recommended for AI and ML use cases because it lets you mount buckets as local file systems and scale your data storage with more cost efficiency than file system services. For more information, see Storage services.
Batch | Cloud Storage FUSE lets you mount Cloud Storage buckets as storage volumes when you create and run Batch jobs. You can specify a bucket in a job's definition, and the bucket gets automatically mounted to the VMs for the job when the job runs.
Cloud Composer | When you create an environment, Cloud Composer stores the source code for your workflows and their dependencies in specific folders in a Cloud Storage bucket. Cloud Composer uses Cloud Storage FUSE to map the folders in the bucket to the Airflow components in the Cloud Composer environment.
Cloud Run | Cloud Run lets you mount a Cloud Storage bucket as a volume and presents the bucket content as files in the container file system. To set up volume mounting, see Mount a Cloud Storage volume.
Cluster Toolkit | Cluster Toolkit lets you create or mount a Cloud Storage bucket as a file system. You can specify the bucket in a blueprint YAML file using the appropriate module. The bucket is then automatically created or mounted when the deployment runs.
Dataflow | When you use Cloud Storage FUSE to mount Cloud Storage buckets directly onto the worker file system, the Apache Beam pipeline code underlying Dataflow can access files in Cloud Storage directly using standard file system semantics. This is particularly helpful when using Dataflow for AI/ML tasks that involve large datasets and software that requires file access.
Deep Learning Containers | To mount Cloud Storage buckets for Deep Learning Containers, you can either use the Cloud Storage FUSE CSI driver (recommended) or install Cloud Storage FUSE.
Deep Learning VM Images | Cloud Storage FUSE comes pre-installed with Deep Learning VM Images.
Google Kubernetes Engine (GKE) | The Cloud Storage FUSE CSI driver manages the integration of Cloud Storage FUSE with the Kubernetes API to consume Cloud Storage buckets as volumes. You can use the Cloud Storage FUSE CSI driver to mount buckets as file systems on Google Kubernetes Engine nodes.
Vertex AI training | You can access data from a Cloud Storage bucket as a mounted file system when you perform custom training on Vertex AI. For more information, see Read and write Cloud Storage files with Cloud Storage FUSE.
Vertex AI Workbench | Vertex AI Workbench instances include a Cloud Storage integration that lets you browse buckets and work with compatible files located in Cloud Storage from within the JupyterLab interface. The Cloud Storage integration lets you access all of the Cloud Storage buckets and files that your instance has access to within the same project as your Vertex AI Workbench instance. To set up the integration, see Vertex AI Workbench instructions for how to access Cloud Storage buckets and files in JupyterLab.

For a list of Google Cloud products that are integrated with Cloud Storage generally, see Integration with Google Cloud services and tools.

Caching

Cloud Storage FUSE offers four types of caching to help increase performance and reduce cost: file caching, stat caching, type caching, and list caching. For more information about these caches, see Overview of caching.

Directory semantics

Cloud Storage offers buckets with a flat namespace and buckets with hierarchical namespace enabled. By default, Cloud Storage FUSE can infer explicitly defined directories, also known as folders, in buckets with hierarchical namespace enabled, but it can't infer implicitly defined directories in buckets with a flat namespace, including simulated folders and managed folders.

Explicitly defined directories are folders that are represented by their own objects in Cloud Storage buckets. Implicitly defined directories are directories that don't have their own corresponding objects in Cloud Storage buckets.

For example, say you mount a bucket named my-bucket, which contains an object named my-directory/my-object.txt, where my-directory/ is a simulated folder. When you run ls on the bucket mount point, by default, Cloud Storage FUSE can access neither the simulated directory my-bucket/my-directory/ nor the object my-object.txt within it. To enable Cloud Storage FUSE to infer the simulated folder and the object within it, include the --implicit-dirs gcsfuse option or the implicit-dirs configuration file field as part of your gcsfuse mount command when mounting a flat namespace bucket.
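The difference can be modeled with a small Python sketch. It is a simplified model of the behavior described above, not the gcsfuse implementation: a directory is visible by default only if a placeholder object with its name and a trailing slash exists, and the hypothetical `implicit_dirs` parameter stands in for the --implicit-dirs option by also making a directory visible when some object merely has it as a prefix.

```python
def visible_directories(object_names, implicit_dirs=False):
    """Return the set of directory paths visible in a flat namespace bucket."""
    # Explicit directories are backed by their own placeholder objects, e.g. "a/".
    visible = {name for name in object_names if name.endswith("/")}
    if implicit_dirs:
        for name in object_names:
            parts = name.split("/")[:-1]
            # Every proper prefix of an object name is an implied directory.
            for depth in range(1, len(parts) + 1):
                visible.add("/".join(parts[:depth]) + "/")
    return visible


# The bucket from the example above holds a single object and no placeholder
# object for "my-directory/".
objects = ["my-directory/my-object.txt"]

default_view = visible_directories(objects)
implicit_view = visible_directories(objects, implicit_dirs=True)
```

In this model, `default_view` is empty, which matches the case where ls shows nothing at the mount point, while `implicit_view` contains `my-directory/`.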

If you need to store and access your data using a file system, use buckets with hierarchical namespace enabled. To learn how to create such buckets, see Create buckets with hierarchical namespace enabled.

For more information about directory semantics, including how to mount buckets with implicitly defined directories, see Files and directories in the Cloud Storage FUSE GitHub documentation.

Cloud Storage FUSE retry strategies

By default, failed requests from Cloud Storage FUSE to Cloud Storage are retried with exponential backoff up to a specified maximum backoff duration, which is 30s (30 seconds) by default. Once the backoff duration exceeds the specified maximum, retries continue at the maximum duration. You can use the --max-retry-sleep option or the gcs-retries:max-retry-sleep field as part of a gcsfuse mount call to specify the maximum backoff duration.
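The capped backoff schedule can be sketched in Python as follows. Only the 30-second default maximum comes from the text above; the 1-second initial delay and the doubling multiplier are assumptions chosen for illustration, and real retry logic commonly adds random jitter, which this sketch omits.

```python
def backoff_delays(attempts, initial=1.0, multiplier=2.0, max_sleep=30.0):
    """Yield the sleep duration in seconds before each retry attempt.

    `initial` and `multiplier` are illustrative assumptions; `max_sleep`
    plays the role of the maximum backoff duration (30 seconds by default).
    """
    delay = initial
    for _ in range(attempts):
        # Once the growing delay passes the cap, every later retry uses the cap.
        yield min(delay, max_sleep)
        delay *= multiplier


delays = list(backoff_delays(7))
```

Under these assumptions the delays grow as 1, 2, 4, 8, and 16 seconds, and every retry after that waits the 30-second maximum.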

Retry strategy for stalled GET or READ requests

When you perform a GET or READ request with Cloud Storage FUSE, a timeout period is applied. If the request exceeds the timeout period, Cloud Storage FUSE cancels the request and retries using an exponential backoff algorithm.

The timeout is dynamic and is based on the 99th percentile latency of past successful or canceled GET or READ requests, with a 1.5-second minimum. This ensures that only the slowest 1% of requests, those exceeding the 99th percentile historical latency, are retried.
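A minimal Python sketch of this calculation is shown below. The text above does not say how the percentile is computed, so the nearest-rank method used here is an assumption, as are the function name `dynamic_timeout` and the sample latency histories.

```python
MIN_TIMEOUT_SECONDS = 1.5  # Minimum timeout stated above.


def dynamic_timeout(latencies):
    """Return a timeout in seconds from the 99th percentile of past latencies."""
    if not latencies:
        return MIN_TIMEOUT_SECONDS
    ordered = sorted(latencies)
    # Nearest-rank percentile (an assumption): the smallest sample with at
    # least 99% of all samples at or below it. Integer math avoids rounding error.
    rank = (99 * len(ordered) + 99) // 100
    p99 = ordered[rank - 1]
    return max(p99, MIN_TIMEOUT_SECONDS)


# Hypothetical latency histories, in seconds.
fast_history = [0.05] * 100
slow_tail_history = [0.2] * 98 + [3.0, 9.0]

fast_timeout = dynamic_timeout(fast_history)
slow_tail_timeout = dynamic_timeout(slow_tail_history)
```

When every past request was fast, the 1.5-second minimum applies. With a slow tail, the timeout rises to the 99th percentile sample, so only requests slower than that are canceled and retried.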

Retry strategy for stalled uploads

Large file writes are uploaded in chunks. To help reduce tail end write latencies, if a chunk-level write operation stalls or fails, Cloud Storage FUSE attempts a retry after 10 seconds. A maximum of four retry operations are performed for each stalled chunk.
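The chunk-level retry limit can be sketched in Python as follows. This reads "after 10 seconds" as a fixed wait before each retry, which is one possible interpretation of the text above. The `upload` callable and the function name are hypothetical, and the `sleep` parameter is injectable so the logic can be exercised without real waiting.

```python
import time

RETRY_DELAY_SECONDS = 10  # Wait before retrying a stalled chunk, per the text above.
MAX_RETRIES = 4           # Maximum retries per chunk, per the text above.


def upload_chunk_with_retries(upload, chunk, sleep=time.sleep):
    """Call `upload(chunk)` once, then retry up to MAX_RETRIES times on failure.

    `upload` is a hypothetical callable that raises an exception when the
    chunk write stalls or fails. Returns the number of retries that were needed.
    """
    for retries in range(MAX_RETRIES + 1):
        try:
            upload(chunk)
            return retries
        except Exception:
            if retries == MAX_RETRIES:
                raise  # All retries are used up; surface the last error.
            sleep(RETRY_DELAY_SECONDS)


# Example: a fake upload that fails twice before succeeding.
calls = []


def flaky_upload(chunk):
    calls.append(chunk)
    if len(calls) < 3:
        raise TimeoutError("simulated stall")


waits = []
retries_used = upload_chunk_with_retries(flaky_upload, b"data", sleep=waits.append)
```

Here the fake upload succeeds on the third call, so two retries are used and two 10-second waits are recorded. An upload that never succeeds is called five times in total (one initial attempt plus four retries) before the error is raised.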

Cloud Storage FUSE operations associated with Cloud Storage operations

When you perform an operation using Cloud Storage FUSE, you also perform the Cloud Storage operations associated with the Cloud Storage FUSE operation. The following table describes common Cloud Storage FUSE commands and their associated Cloud Storage JSON API operations. You can display information about the Cloud Storage FUSE operations by setting the --log-severity option or the logging:severity field to TRACE in your gcsfuse command.

Command | JSON API operations
--- | ---
gcsfuse --log-severity=TRACE example-bucket mp | Objects.list (to check credentials)
cd mp | n/a
ls mp | Objects.list("")
mkdir subdir | Objects.get("subdir"); Objects.get("subdir/"); Objects.insert("subdir/")
cp ~/local.txt subdir/ | Objects.get("subdir/local.txt"); Objects.get("subdir/local.txt/"); Objects.insert("subdir/local.txt"), to create an empty object; Objects.insert("subdir/local.txt"), when closing after done writing
rm -rf subdir | Objects.list("subdir"); Objects.list("subdir/"); Objects.delete("subdir/local.txt"); Objects.list("subdir/"); Objects.delete("subdir/")

Metrics

Cloud Storage offers in-depth metrics which can help you optimize Cloud Storage FUSE performance and costs. To learn more about metrics for Cloud Storage FUSE, see Cloud Storage FUSE Metrics.

Security

Cloud Storage FUSE applies Google Cloud's standard authentication through Application Default Credentials to identify users or service accounts. Access to the contents within buckets is governed by Identity and Access Management permissions. Local system permissions secure the mount point itself and any locally cached data.

Pricing for Cloud Storage FUSE

Cloud Storage FUSE is available free of charge, but the storage, metadata, and network I/O it generates to and from Cloud Storage are charged like any other Cloud Storage interface. In other words, all data transfer and operations performed by Cloud Storage FUSE map to Cloud Storage transfers and operations, and are charged accordingly. For more information on common Cloud Storage FUSE operations and how they map to Cloud Storage operations, see operations mapping.

To avoid surprises, you should estimate how your use of Cloud Storage FUSE translates to Cloud Storage charges. For example, if you are using Cloud Storage FUSE to store log files, you can incur charges quickly if logs are aggressively flushed on hundreds or thousands of machines at the same time.
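A back-of-the-envelope estimate for the log-flushing scenario can be sketched in Python. The sketch assumes that each flush results in one write operation against Cloud Storage, which may not hold for every workload. The function names and fleet numbers are hypothetical, and the price is a caller-supplied parameter that should be taken from the Cloud Storage pricing page rather than from this example.

```python
def estimate_write_operations(machines, flushes_per_minute, hours):
    """Estimate how many write operations a fleet of log writers generates.

    Assumes (for illustration) one Cloud Storage write operation per flush.
    """
    return machines * flushes_per_minute * 60 * hours


def estimate_operation_cost(operations, price_per_thousand):
    """Convert an operation count to a cost using a caller-supplied price."""
    return operations / 1000 * price_per_thousand


# Hypothetical fleet: 1,000 machines, each flushing logs 6 times a minute for a day.
daily_operations = estimate_write_operations(
    machines=1000, flushes_per_minute=6, hours=24
)
```

Under these assumptions the fleet generates 8,640,000 write operations per day, before any reads, listings, or metadata calls are counted.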

See Cloud Storage pricing for information on charges such as storage, network usage, and operations.

Limitations

While Cloud Storage FUSE has a file system interface, it is not like an NFS or CIFS file system on the backend. Additionally, Cloud Storage FUSE is not POSIX compliant. For a POSIX file system product in Google Cloud, see Filestore.

When using Cloud Storage FUSE, be aware of its limitations and semantics, which are different from those of POSIX file systems. Cloud Storage FUSE should only be used within its capabilities.

Limitations and differences from POSIX file systems

The following list describes the limitations of Cloud Storage FUSE:

Known issues

For a list of known issues in Cloud Storage FUSE, see the open Cloud Storage FUSE issues in GitHub.

Get support

You can get support, submit general questions, and request new features by using one of Google Cloud's official support channels. You can also get support by filing issues in GitHub.

For solutions to commonly encountered issues, see Troubleshooting for production issues in the Cloud Storage FUSE GitHub documentation.

What's next