ZFS storage driver

ZFS is a next-generation filesystem that supports many advanced storage technologies, such as volume management, snapshots, checksumming, compression, deduplication, and replication.

It was created by Sun Microsystems (now Oracle Corporation) and is open sourced under the CDDL license. Due to licensing incompatibilities between the CDDL and GPL, ZFS cannot be shipped as part of the mainline Linux kernel. However, the ZFS on Linux (ZoL) project provides an out-of-tree kernel module and userspace tools which can be installed separately.

The ZFS on Linux (ZoL) port is healthy and maturing. However, the zfs Docker storage driver is not currently recommended for production use unless you have substantial experience with ZFS on Linux.

Note

There is also a FUSE implementation of ZFS on the Linux platform. This is not recommended. The native ZFS driver (ZoL) is better tested, has better performance, and is more widely used. The remainder of this document refers to the native ZoL port.

Note

There is no need to use MountFlags=slave because dockerd and containerd are in different mount namespaces.

  1. Stop Docker.
  2. Copy the contents of /var/lib/docker/ to /var/lib/docker.bk and remove the contents of /var/lib/docker/.
  3. Create a new zpool on your dedicated block device or devices, and mount it into /var/lib/docker/. Be sure you have specified the correct devices, because this is a destructive operation. This example adds two devices to the pool.
    The command creates the zpool and names it zpool-docker. The name is for display purposes only, and you can use a different name. Check that the pool was created and mounted correctly using zfs list.
  4. Configure Docker to use zfs. Edit /etc/docker/daemon.json and set the storage-driver to zfs. If the file was empty before, it should now look like this:
    Save and close the file.
  5. Start Docker. Use docker info to verify that the storage driver is zfs.
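
The steps above can be sketched as a short script. This is a hedged illustration, not the exact commands from the original page: the device names /dev/xvdf and /dev/xvdg are placeholders, systemctl assumes a systemd host, and the script only prints each command unless you set DRY_RUN=0. Writing daemon.json this way overwrites the file, which is only appropriate if it was empty before.

```shell
#!/bin/sh
# Sketch of the migration steps. Commands are only printed unless DRY_RUN=0.
set -eu

DRY_RUN="${DRY_RUN:-1}"
POOL="zpool-docker"                        # display name used in the text
DEVICES="${DEVICES:-/dev/xvdf /dev/xvdg}"  # hypothetical devices: replace with yours

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Emit a minimal daemon.json that selects the zfs storage driver.
daemon_json() {
  printf '{\n  "storage-driver": "zfs"\n}\n'
}

migrate_to_zfs() {
  run sudo systemctl stop docker                      # 1. stop Docker
  run sudo cp -au /var/lib/docker /var/lib/docker.bk  # 2. back up the data ...
  run sudo sh -c 'rm -rf /var/lib/docker/*'           #    ... then empty the directory
  # 3. create the pool mounted at /var/lib/docker (destroys data on DEVICES)
  run sudo zpool create -f -m /var/lib/docker "$POOL" $DEVICES
  run sudo zfs list
  # 4. select the zfs driver (assumes daemon.json was empty before)
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ write /etc/docker/daemon.json:"
    daemon_json
  else
    daemon_json | sudo tee /etc/docker/daemon.json >/dev/null
  fi
  run sudo systemctl start docker                     # 5. start Docker and verify
  run sudo docker info
}

migrate_to_zfs
```

Review the printed commands, in particular the device list, before running with DRY_RUN=0.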

Increase capacity on a running device

To increase the size of the zpool, you need to add a dedicated block device to the Docker host, and then add it to the zpool using the zpool add command:
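
A minimal sketch of that step, assuming the pool is named zpool-docker and the new device is /dev/xvdh (a placeholder, not taken from the original page). It first asks zpool add for a preview with -n, then adds the device. Commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
set -eu

DRY_RUN="${DRY_RUN:-1}"
POOL="${POOL:-zpool-docker}"
NEW_DEVICE="${NEW_DEVICE:-/dev/xvdh}"   # hypothetical device: replace with yours

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

grow_pool() {
  run sudo zpool add -n "$POOL" "$NEW_DEVICE"  # -n: show the resulting layout, change nothing
  run sudo zpool add "$POOL" "$NEW_DEVICE"     # actually add the device
  run sudo zpool list "$POOL"                  # confirm the new capacity
}

grow_pool
```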

Limit a container's writable storage quota

If you want to implement a quota on a per-image/dataset basis, you can set the size storage option to limit the amount of space a single container can use for its writable layer.

Edit /etc/docker/daemon.json and add the following:
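
A minimal sketch of such a configuration, generated from a shell function so the quota is easy to change. The 256M value is only an example, and the function name is hypothetical.

```shell
#!/bin/sh
set -eu

SIZE="${SIZE:-256M}"   # example quota for each container's writable layer

# Print a daemon.json that selects zfs and sets the size storage option.
quota_daemon_json() {
  cat <<EOF
{
  "storage-driver": "zfs",
  "storage-opts": ["size=$SIZE"]
}
EOF
}

quota_daemon_json
```

To apply it, the output could be redirected into place, for example with quota_daemon_json | sudo tee /etc/docker/daemon.json, taking care not to discard settings already in that file.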

See all storage options for each storage driver in the daemon reference documentation.

Save and close the file, and restart Docker.

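ZFS uses the following objects: filesystems, snapshots (read-only, point-in-time copies of a filesystem), and clones (writable copies of a snapshot).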

The process of creating a clone:

ZFS snapshots and clones

  1. A read-only snapshot is created from the filesystem.
  2. A writable clone is created from the snapshot. This contains any differences from the parent layer.
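
Done by hand, the two steps correspond to zfs snapshot followed by zfs clone. The sketch below uses hypothetical dataset and snapshot names and only prints the commands unless DRY_RUN=0. Docker performs the equivalent operations itself, so this is for illustration only.

```shell
#!/bin/sh
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# snapshot_and_clone PARENT SNAPNAME CHILD
snapshot_and_clone() {
  parent="$1"; snap="$2"; child="$3"
  run sudo zfs snapshot "${parent}@${snap}"         # 1. read-only snapshot of the filesystem
  run sudo zfs clone "${parent}@${snap}" "$child"   # 2. writable clone of that snapshot
}

# Hypothetical names, chosen only for the example.
snapshot_and_clone zpool-docker/parent demo zpool-docker/child
```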

Filesystems, snapshots, and clones all allocate space from the underlying zpool.

Image and container layers on-disk

Each running container's unified filesystem is mounted on a mount point in /var/lib/docker/zfs/graph/. Continue reading for an explanation of how the unified filesystem is composed.

Image layering and sharing

The base layer of an image is a ZFS filesystem. Each child layer is a ZFS clone based on a ZFS snapshot of the layer below it. A container is a ZFS clone based on a ZFS snapshot of the top layer of the image it's created from.

The diagram below shows how this is put together with a running container based on a two-layer image.

ZFS pool for Docker container

When you start a container, the following steps happen in order:

  1. The base layer of the image exists on the Docker host as a ZFS filesystem.
  2. Additional image layers are clones of the dataset hosting the image layer directly below them.
    In the diagram, "Layer 1" is added by taking a ZFS snapshot of the base layer and then creating a clone from that snapshot. The clone is writable and consumes space on-demand from the zpool. The snapshot is read-only, maintaining the base layer as an immutable object.
  3. When the container is launched, a writable layer is added above the image.
    In the diagram, the container's read-write layer is created by making a snapshot of the top layer of the image (Layer 1) and creating a clone from that snapshot.
  4. As the container modifies the contents of its writable layer, space is allocated for the blocks that are changed. By default, these blocks are 128k.
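
One way to see this layering on a host is to list every dataset in the pool together with its origin, the snapshot a clone was created from, and to check the block size in effect. This is a sketch assuming the pool is named zpool-docker. Commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
set -eu

DRY_RUN="${DRY_RUN:-1}"
POOL="${POOL:-zpool-docker}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

inspect_layers() {
  # One row per filesystem, snapshot, and clone. A clone's "origin" column
  # names the snapshot of the layer below it; the base layer has no origin.
  run sudo zfs list -r -t all -o name,origin,used,refer "$POOL"
  # recordsize is the block size referred to above (128K by default).
  run sudo zfs get recordsize "$POOL"
}

inspect_layers
```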

Reading files

Each container's writable layer is a ZFS clone which shares all its data with the dataset it was created from (the snapshots of its parent layers). Read operations are fast, even if the data being read is from a deep layer. This diagram illustrates how block sharing works:

ZFS block sharing

Writing files

Writing a new file: space is allocated on demand from the underlying zpool and the blocks are written directly into the container's writable layer.

Modifying an existing file: space is allocated only for the changed blocks, and those blocks are written into the container's writable layer using a copy-on-write (CoW) strategy. This minimizes the size of the layer and increases write performance.
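
As a rough worked example of the copy-on-write cost, the sketch below counts how many blocks a write touches, and therefore how much new space the writable layer needs. It is a simplified model: it assumes the default 128k block size, a file larger than one block, and no compression, and it ignores metadata. The function names are hypothetical.

```shell
#!/bin/sh
set -eu

RECORDSIZE=$((128 * 1024))   # default block size, in bytes

# cow_blocks OFFSET LENGTH: number of blocks touched by a write of
# LENGTH bytes starting at byte OFFSET.
cow_blocks() {
  offset="$1"; length="$2"
  first=$(( offset / RECORDSIZE ))
  last=$(( (offset + length - 1) / RECORDSIZE ))
  echo $(( last - first + 1 ))
}

# cow_kib OFFSET LENGTH: space newly allocated in the writable layer, in KiB.
cow_kib() {
  echo $(( $(cow_blocks "$1" "$2") * RECORDSIZE / 1024 ))
}

cow_kib 0 1        # one byte changed, one whole block rewritten: prints 128
cow_kib 131071 2   # write straddles a block boundary, two blocks: prints 256
```

Under this model, the rest of the file stays shared with the image layer regardless of the file's total size.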

Deleting a file or directory:

There are several factors that influence the performance of Docker using the zfs storage driver.

Performance best practices