Storage Buckets
Storage Buckets are a repo type on the Hugging Face Hub that provides S3-like object storage, powered by the Xet storage backend. Unlike Git-based repositories (models, datasets, Spaces), buckets are non-versioned and mutable, designed for cases where you need simple, fast storage for things like training checkpoints, logs, intermediate artifacts, or any large collection of files that doesn’t need version control.
You can interact with buckets using the Hub web interface, the hf CLI, or the Python API.
Buckets are available to all users and organizations. See hf.co/storage for pricing details.
See Access Patterns for how to reach bucket data from your tools (mount as a filesystem, hf:// paths, volume mounts in Jobs/Spaces), and Bucket Integrations for ready-to-use snippets in popular data libraries like pandas, Dask, and Spark.
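As a quick taste of those integrations, any library that can read from the hf:// filesystem can load bucket files directly. Here is a minimal pandas sketch; the bucket and file path are illustrative, and huggingface_hub must be installed and authenticated:
import pandas as pd

# Read a Parquet file straight from a bucket over the hf:// filesystem
# (illustrative path; see Bucket Integrations for Dask and Spark equivalents)
df = pd.read_parquet("hf://buckets/username/my-bucket/data/train.parquet")
print(df.head())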
Buckets vs Repositories
The Hub offers two types of storage: Git-based repositories for versioned, collaborative work and buckets for fast, mutable object storage.
| Feature | Repositories (Git-based) | Storage Buckets |
|---|---|---|
| Versioning | Full Git history | None (mutable, overwrite-in-place) |
| Types | Models, Datasets, Spaces | Standalone bucket |
| Primary use case | Publishing finished artifacts | Working storage / intermediate data |
| Operations | Hub API, Git push/pull | S3-like sync, cp, rm |
| Deduplication | Xet chunk-level | Xet chunk-level |
| Pull Requests | Yes | No |
| Model/Dataset Cards | Yes | No |
Use repositories when you want version history, collaboration features (PRs, discussions), and library integrations. Use buckets when you need fast, mutable storage for data that changes frequently — files can be overwritten or deleted in place.
Creating a Bucket
From the Hub UI
- Navigate to huggingface.co/new-bucket:

- Specify the owner of the bucket: this can be either you or any of the organizations you’re affiliated with.
- Enter a bucket name.
- Choose whether the bucket should be public or private.
- Optionally, preselect CDN pre-warming regions to cache your data closer to your compute from the start.
After creating the bucket, you will land on the bucket page.

From the CLI
hf buckets create my-bucket
hf buckets create my-bucket --private
hf buckets create my-org/shared-bucket
From Python
from huggingface_hub import create_bucket
create_bucket("my-bucket")
create_bucket("my-bucket", private=True)
create_bucket("my-org/shared-bucket")
For the full Python API reference including deleting, moving, and listing buckets, see the huggingface_hub Buckets guide.
Browsing Buckets on the Hub
Every bucket has a page on the Hub where you can browse its contents, navigate directories, and view file details. Bucket pages are available at https://huggingface.co/buckets/<owner>/<bucket-name>.

You can also list bucket contents from the CLI:
hf buckets list julien-c/my-training-bucket -h
Feb 17 14:46  art/
Feb 17 14:58  arxivqa/
Feb 17 15:02  arxivqa2/
Feb 17 15:04  arxivqa3/
Feb 17 14:47  captcha/
Feb 17 14:53  captcha2/
Feb 24 17:22  julien/
hf buckets list julien-c/my-training-bucket/art -h -R
423.6 MB  Feb 17 14:29  art/train-00000-of-00011.parquet
441.0 MB  Feb 17 14:29  art/train-00001-of-00011.parquet
521.7 MB  Feb 17 14:29  art/train-00002-of-00011.parquet
481.4 MB  Feb 17 14:29  art/train-00003-of-00011.parquet
444.6 MB  Feb 17 14:29  art/train-00004-of-00011.parquet
461.6 MB  Feb 17 14:29  art/train-00005-of-00011.parquet
466.4 MB  Feb 17 14:29  art/train-00006-of-00011.parquet
486.3 MB  Feb 17 14:29  art/train-00007-of-00011.parquet
477.0 MB  Feb 17 14:29  art/train-00008-of-00011.parquet
454.0 MB  Feb 17 14:29  art/train-00009-of-00011.parquet
483.1 MB  Feb 17 14:29  art/train-00010-of-00011.parquet
hf buckets list julien-c/my-training-bucket --tree -h -R
├── art/
423.6 MB  Feb 17 14:29  │   ├── train-00000-of-00011.parquet
441.0 MB  Feb 17 14:29  │   ├── train-00001-of-00011.parquet
521.7 MB  Feb 17 14:29  │   ├── train-00002-of-00011.parquet
481.4 MB  Feb 17 14:29  │   ├── train-00003-of-00011.parquet
444.6 MB  Feb 17 14:29  │   ├── train-00004-of-00011.parquet
461.6 MB  Feb 17 14:29  │   ├── train-00005-of-00011.parquet
466.4 MB  Feb 17 14:29  │   ├── train-00006-of-00011.parquet
486.3 MB  Feb 17 14:29  │   ├── train-00007-of-00011.parquet
477.0 MB  Feb 17 14:29  │   ├── train-00008-of-00011.parquet
454.0 MB  Feb 17 14:29  │   ├── train-00009-of-00011.parquet
483.1 MB  Feb 17 14:29  │   └── train-00010-of-00011.parquet
├── arxivqa/
495.9 MB  Feb 17 14:32  │   ├── train-00000-of-00164.parquet
518.3 MB  Feb 17 14:32  │   ├── train-00001-of-00164.parquet
495.5 MB  Feb 17 14:32  │   ├── train-00002-of-00164.parquet
486.6 MB  Feb 17 14:32  │   ├── train-00003-of-00164.parquet
490.4 MB  Feb 17 14:32  │   ├── train-00004-of-00164.parquet
...
Managing Files
You can upload and download files directly from the bucket page on the Hub, or use the CLI and Python API for programmatic access. Bucket files are referenced using hf://buckets/ paths (e.g., hf://buckets/username/my-bucket/path/to/file). The hf buckets cp command handles individual file transfers while hf buckets sync is better suited for directories. All commands work in both directions — local-to-remote and remote-to-local.
Uploading files
For quick uploads, you can drag and drop files directly on the bucket page in your browser. For programmatic use, hf buckets cp copies individual files into a bucket. The source is a local path and the destination is an hf://buckets/ path. You can also pipe data from stdin, which is handy for programmatically generated content.
CLI:
hf buckets cp ./model.safetensors hf://buckets/username/my-bucket/models/model.safetensors
cat config.json | hf buckets cp - hf://buckets/username/my-bucket/config.json
In Python, use batch_bucket_files to upload one or more files in a single call. Each entry is a tuple of (local_path, remote_path).
Python:
from huggingface_hub import batch_bucket_files
batch_bucket_files(
    "username/my-bucket",
    add=[
        ("./model.safetensors", "models/model.safetensors"),
        ("./config.json", "models/config.json"),
    ],
)
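As a sketch of the combined form, a single batch_bucket_files call can add new files and remove stale ones in one request, using the add= parameter shown above together with the delete= parameter covered under Deleting files (the checkpoint paths are illustrative):
from huggingface_hub import batch_bucket_files

# Upload the new checkpoint and drop the outdated one in a single call
# (illustrative paths; delete= is described in "Deleting files" below)
batch_bucket_files(
    "username/my-bucket",
    add=[("./checkpoint-200.safetensors", "checkpoints/checkpoint-200.safetensors")],
    delete=["checkpoints/checkpoint-100.safetensors"],
)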
For more upload options (raw bytes, combined upload+delete, etc.), see the huggingface_hub upload guide.
Downloading files
You can download individual files directly from the bucket page on the Hub by clicking on them. For programmatic access, downloading mirrors the upload syntax — swap the source and destination in hf buckets cp. You can also stream a file to stdout by using - as the destination, which lets you pipe bucket contents directly into other tools.
CLI:
hf buckets cp hf://buckets/username/my-bucket/models/model.safetensors ./model.safetensors
hf buckets cp hf://buckets/username/my-bucket/config.json - | jq .
In Python, use download_bucket_files with a list of (remote_path, local_path) tuples.
Python:
from huggingface_hub import download_bucket_files
download_bucket_files(
    "username/my-bucket",
    files=[
        ("models/model.safetensors", "./local/model.safetensors"),
        ("config.json", "./local/config.json"),
    ],
)
For faster downloads using pre-fetched metadata, see the huggingface_hub download guide.
Syncing directories
The sync command works like rsync or aws s3 sync — it compares source and destination and only transfers files that have changed. This is the most efficient way to keep a local directory and a bucket in sync. By default, sync only adds and updates files. Pass --delete to also remove files at the destination that no longer exist at the source. Use --dry-run to preview what would happen without actually transferring anything.
CLI:
hf buckets sync ./data hf://buckets/username/my-bucket/data
hf buckets sync hf://buckets/username/my-bucket/data ./data
hf buckets sync ./data hf://buckets/username/my-bucket/data --delete
hf buckets sync ./data hf://buckets/username/my-bucket/data --dry-run
hf buckets sync ./data hf://buckets/username/my-bucket/data --plan sync-plan.jsonl
hf buckets sync --apply sync-plan.jsonl
hf sync is a convenient alias for hf buckets sync.
Python:
from huggingface_hub import sync_bucket
sync_bucket("./data", "hf://buckets/username/my-bucket/data")
sync_bucket("hf://buckets/username/my-bucket/data", "./data")
The sync command supports filtering (--include, --exclude), comparison modes (--ignore-times, --existing), and a plan-and-apply workflow to review operations before executing them. For the full set of options, see the huggingface_hub sync guide.
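For example, combining filters with a dry run lets you preview exactly which Parquet files would be transferred; the --include/--exclude glob patterns below are illustrative:
hf buckets sync ./data hf://buckets/username/my-bucket/data --include "*.parquet" --exclude "tmp/*" --dry-run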
Deleting files
Since buckets are non-versioned, deletions are immediate and permanent — there is no way to recover a deleted file. Use --dry-run to double-check before removing files, especially when using --recursive.
CLI:
hf buckets rm username/my-bucket/old-model.bin
hf buckets rm username/my-bucket/logs/ --recursive
hf buckets rm username/my-bucket/checkpoints/ --recursive --dry-run
Python:
from huggingface_hub import batch_bucket_files
batch_bucket_files("username/my-bucket", delete=["old-model.bin", "logs/debug.log"])
For more deletion options (pattern-based filtering, recursive removal, etc.), see the huggingface_hub delete guide.
Copying files between repos and buckets
You can copy Xet-tracked files from any repository (model, dataset, Space) or bucket into a destination bucket without re-uploading the data. The copy is server-side: only the Xet content hashes are migrated, so even very large files are copied instantly.
Only Xet-tracked files are copied server-to-server. Small non-Xet files (e.g., config files and READMEs) are automatically downloaded and re-uploaded.
CLI:
hf buckets cp hf://datasets/HuggingFaceFW/fineweb/data hf://buckets/username/fineweb-data
Python:
from huggingface_hub import HfApi
api = HfApi()
api.copy_files(
    "hf://datasets/HuggingFaceFW/fineweb/data",
    "hf://buckets/username/fineweb-data",
)
You need read access to the source repository or bucket and write access to the destination bucket.
Note that transferring data the other way, from a bucket to a repository (model, dataset, or Space), without re-uploading is not yet available but is on the roadmap.
Pre-warming and CDN
Buckets live on the Hub’s global storage by default. For workloads where storage location directly affects throughput, you can pre-warm bucket data to bring it closer to your compute.
Pre-warming caches files at edge locations near specific cloud providers and regions, so your jobs read data locally instead of pulling it across regions. This is especially useful for:
- Training clusters that need fast access to large datasets or checkpoints
- Multi-region setups where different parts of a pipeline run in different clouds
- Distributing large artifacts to many consumers worldwide
See hf.co/storage for available regions and details on enabling pre-warming.
Use Cases
Training checkpoints and logs
When running training jobs (e.g., via Jobs), save checkpoints and logs to a bucket. Unlike a Git repo, you can overwrite the latest checkpoint without accumulating version history, and sync ensures only changed data is transferred.
hf sync ./checkpoints hf://buckets/my-org/training-run-42/checkpoints
Because buckets are built on Xet, successive checkpoints where large parts of the model are frozen benefit from chunk-level deduplication. Only the changed chunks are uploaded.
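A minimal sketch of this pattern, assuming your trainer exposes a hook that runs after each checkpoint is written to a local directory; the hook name is illustrative and the bucket path matches the example above:
from huggingface_hub import sync_bucket

def on_checkpoint_saved(local_dir: str = "./checkpoints") -> None:
    # The latest checkpoint overwrites the previous one in place: no history accumulates,
    # and thanks to chunk-level deduplication only the chunks that changed are uploaded.
    sync_bucket(local_dir, "hf://buckets/my-org/training-run-42/checkpoints")

# Register on_checkpoint_saved as your trainer's save callback, or call it manually after each save.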
Data processing pipelines
Buckets serve as staging areas for data processing workflows. Process raw data, write intermediate outputs to a bucket, then promote the final artifact to a versioned Dataset repository when the pipeline completes. This keeps your versioned repo clean while giving your pipeline fast mutable storage.
Note that transferring data from a bucket to a repository without re-uploading is not yet available but is on the roadmap.
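A sketch of this promotion step, assuming intermediate outputs are staged in a bucket and the final artifact sits in a local folder. upload_folder is the standard huggingface_hub call for pushing to a Dataset repository; the bucket, repo, and paths below are illustrative:
from huggingface_hub import sync_bucket, upload_folder

# 1. Stage intermediate outputs in the bucket while the pipeline runs (illustrative paths)
sync_bucket("./intermediate", "hf://buckets/my-org/etl-scratch/run-001")

# 2. When the pipeline completes, promote the final artifact to a versioned Dataset repo.
#    Today this is a regular upload; server-side bucket-to-repo transfer is not yet available.
upload_folder(repo_id="my-org/clean-dataset", repo_type="dataset", folder_path="./final")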
Agentic storage
AI agents need scratch storage for intermediate results, tool outputs, traces, and working memory. Buckets provide a Hub-native place for this data: fast mutable access without Git overhead, standard Hugging Face permissions, and addressable via hf://buckets/ paths across the Hub ecosystem.
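For instance, an agent might persist a tool trace to its scratch bucket between steps, reusing the batch_bucket_files call shown earlier; the bucket name and trace contents are illustrative:
import json
from huggingface_hub import batch_bucket_files

# Write the trace locally, then push it to the agent's scratch bucket (illustrative names)
with open("trace-0042.json", "w") as f:
    json.dump({"step": 42, "tool": "search", "observation": "..."}, f)

batch_bucket_files("my-org/agent-scratch", add=[("trace-0042.json", "traces/trace-0042.json")])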
Rolling backups
Buckets are well-suited for maintaining rolling backups. With a Git-based Dataset repository, deleting outdated files doesn’t free storage — Git history retains every past version, so you’d need to squash commits or rewrite history to actually reclaim space. With buckets, old files are truly gone once deleted, and you only pay for what’s currently stored.
hf sync ./daily-backup hf://buckets/my-user/backups/latest --delete
Linking models to buckets
You can create a two-way link between a model and a bucket by adding a buckets field to the model card metadata. Linked models then appear on the bucket page, and the bucket appears as a tag on the model page.
buckets:
- my-org/my-bucket
See Specifying a bucket in the model cards documentation for more details.
Pricing
Storage Buckets are billed based on the amount of data stored, with simple per-TB pricing. Enterprise plans benefit from dedup-based billing, where shared chunks across files directly reduce the billed footprint.
As with other repo types, buckets are free to create and include a free storage allowance. For usage above the free tier, see hf.co/storage. For general billing information, see the Billing documentation.