Optimize cache usage in builds (original) (raw)

When building with Docker, a layer is reused from the build cache if the instruction and the files it depends on hasn't changed since it was previously built. Reusing layers from the cache speeds up the build process because Docker doesn't have to rebuild the layer again.

Here are a few techniques you can use to optimize build caching and speed up the build process:

Order your layers: Putting the commands in your Dockerfile into a logical order can help you avoid unnecessary cache invalidation.
Keep the context small: The context is the set of files and directories that are sent to the builder to process a build instruction. Keeping the context as small as possible reduces the amount of data that needs to be sent to the builder, and reduces the likelihood of cache invalidation.
Use bind mounts: Bind mounts let you mount a file or directory from the host machine into the build container. Using bind mounts can help you avoid unnecessary layers in the image, which can slow down the build process.
Use cache mounts: Cache mounts let you specify a persistent package cache to be used during builds. The persistent cache helps speed up build steps, especially steps that involve installing packages using a package manager. Having a persistent cache for packages means that even if you rebuild a layer, you only download new or changed packages.
Use an external cache: An external cache lets you store build cache at a remote location. The external cache image can be shared between multiple builds, and across different environments.

Putting the commands in your Dockerfile into a logical order is a great place to start. Because a change causes a rebuild for steps that follow, try to make expensive steps appear near the beginning of the Dockerfile. Steps that change often should appear near the end of the Dockerfile, to avoid triggering rebuilds of layers that haven't changed.

Consider the following example. A Dockerfile snippet that runs a JavaScript build from the source files in the current directory:

This Dockerfile is rather inefficient. Updating any file causes a reinstall of all dependencies every time you build the Docker image even if the dependencies didn't change since last time.

Instead, the COPY command can be split in two. First, copy over the package management files (in this case, package.json and yarn.lock). Then, install the dependencies. Finally, copy over the project source code, which is subject to frequent change.

By installing dependencies in earlier layers of the Dockerfile, there is no need to rebuild those layers when a project file has changed.

Keep the context small

The easiest way to make sure your context doesn't include unnecessary files is to create a .dockerignore file in the root of your build context. The.dockerignore file works similarly to .gitignore files, and lets you exclude files and directories from the build context.

Here's an example .dockerignore file that excludes the node_modulesdirectory, all files and directories that start with tmp:

Ignore-rules specified in the .dockerignore file apply to the entire build context, including subdirectories. This means it's a rather coarse-grained mechanism, but it's a good way to exclude files and directories that you know you don't need in the build context, such as temporary files, log files, and build artifacts.

You might be familiar with bind mounts for when you run containers with docker run or Docker Compose. Bind mounts let you mount a file or directory from the host machine into a container.

To use bind mounts in a build, you can use the --mount flag with the RUNinstruction in your Dockerfile:

In this example, the current directory is mounted into the build container before the go build command gets executed. The source code is available in the build container for the duration of that RUN instruction. When the instruction is done executing, the mounted files are not persisted in the final image, or in the build cache. Only the output of the go build command remains.

The COPY and ADD instructions in a Dockerfile lets you copy files from the build context into the build container. Using bind mounts is beneficial for build cache optimization because you're not adding unnecessary layers to the cache. If you have build context that's on the larger side, and it's only used to generate an artifact, you're better off using bind mounts to temporarily mount the source code required to generate the artifact into the build. If you use COPY to add the files to the build container, BuildKit will include all of those files in the cache, even if the files aren't used in the final image.

There are a few things to be aware of when using bind mounts in a build:

Bind mounts are read-only by default. If you need to write to the mounted directory, you need to specify the rw option. However, even with the rwoption, the changes are not persisted in the final image or the build cache. The file writes are sustained for the duration of the RUN instruction, and are discarded after the instruction is done.
Mounted files are not persisted in the final image. Only the output of theRUN instruction is persisted in the final image. If you need to include files from the build context in the final image, you need to use the COPYor ADD instructions.
If the target directory is not empty, the contents of the target directory are hidden by the mounted files. The original contents are restored after theRUN instruction is done.
For example, given a build context with only a Dockerfile in it:
And a Dockerfile that mounts the current directory into the build container:
The first ls command with the bind mount shows the contents of the mounted directory. The second ls lists the contents of the original build context.

Regular cache layers in Docker correspond to an exact match of the instruction and the files it depends on. If the instruction and the files it depends on have changed since the layer was built, the layer is invalidated, and the build process has to rebuild the layer.

Cache mounts are a way to specify a persistent cache location to be used during builds. The cache is cumulative across builds, so you can read and write to the cache multiple times. This persistent caching means that even if you need to rebuild a layer, you only download new or changed packages. Any unchanged packages are reused from the cache mount.

To use cache mounts in a build, you can use the --mount flag with the RUNinstruction in your Dockerfile:

In this example, the npm install command uses a cache mount for the/root/.npm directory, the default location for the npm cache. The cache mount is persisted across builds, so even if you end up rebuilding the layer, you only download new or changed packages. Any changes to the cache are persisted across builds, and the cache is shared between multiple builds.

How you specify cache mounts depends on the build tool you're using. If you're unsure how to specify cache mounts, refer to the documentation for the build tool you're using. Here are a few examples:

It's important that you read the documentation for the build tool you're using to make sure you're using the correct cache mount options. Package managers have different requirements for how they use the cache, and using the wrong options can lead to unexpected behavior. For example, Apt needs exclusive access to its data, so the caches use the option sharing=locked to ensure parallel builds using the same cache mount wait for each other and not access the same cache files at the same time.

The default cache storage for builds is internal to the builder (BuildKit instance) you're using. Each builder uses its own cache storage. When you switch between different builders, the cache is not shared between them. Using an external cache lets you define a remote location for pushing and pulling cache data.

External caches are especially useful for CI/CD pipelines, where the builders are often ephemeral, and build minutes are precious. Reusing the cache between builds can drastically speed up the build process and reduce cost. You can even make use of the same cache in your local development environment.

To use an external cache, you specify the --cache-to and --cache-fromoptions with the docker buildx build command.

--cache-to exports the build cache to the specified location.
--cache-from specifies remote caches for the build to use.

The following example shows how to set up a GitHub Actions workflow usingdocker/build-push-action, and push the build cache layers to an OCI registry image:

This setup tells BuildKit to look for cache in the user/app:buildcache image. And when the build is done, the new build cache is pushed to the same image, overwriting the old cache.

This cache can be used locally as well. To pull the cache in a local build, you can use the --cache-from option with the docker buildx build command:

Optimizing cache usage in builds can significantly speed up the build process. Keeping the build context small, using bind mounts, cache mounts, and external caches are all techniques you can use to make the most of the build cache and speed up the build process.

For more information about the concepts discussed in this guide, see: