Optimize cache usage in builds

When building with Docker, a layer is reused from the build cache if the instruction and the files it depends on haven't changed since it was previously built. Reusing layers from the cache speeds up the build process because Docker doesn't have to rebuild the layer.

Here are a few techniques you can use to optimize build caching and speed up the build process:

Order your layers

Putting the commands in your Dockerfile into a logical order is a great place to start. Because a change to one step invalidates the cache for every step that follows it, try to place expensive steps that change rarely near the beginning of the Dockerfile. Steps that change often should appear near the end of the Dockerfile, so that changing them doesn't trigger rebuilds of layers that haven't changed.

Consider the following example, a Dockerfile snippet that runs a JavaScript build from the source files in the current directory:
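Here is a minimal sketch of such a Dockerfile for a Node.js project built with Yarn. The node:20 base image tag, the /app working directory, and a build script defined in package.json are assumptions, not details from the original:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20
WORKDIR /app
# Copies every file in the build context at once, so changing any file
# invalidates this layer and every layer after it.
COPY . .
# Reinstalls all dependencies whenever any copied file has changed.
RUN yarn install
RUN yarn build
```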

This Dockerfile is inefficient. All files are copied in a single step before the dependencies are installed, so updating any file invalidates the install step. As a result, every build reinstalls all dependencies, even if they haven't changed since the last build.

Instead, the COPY command can be split in two. First, copy over the package management files (in this case, package.json and yarn.lock). Then, install the dependencies. Finally, copy over the project source code, which is subject to frequent change.
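The same sketch with the COPY instruction split in two is shown below. It still assumes the node:20 image, an /app working directory, and a build script in package.json:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20
WORKDIR /app
# Dependency manifests change rarely, so this layer and the install layer
# below stay cached until package.json or yarn.lock changes.
COPY package.json yarn.lock ./
RUN yarn install
# Source code changes often; only the layers from here on are rebuilt.
COPY . .
RUN yarn build
```

If the project directory contains a local node_modules directory, exclude it with a .dockerignore file. Otherwise the final COPY . . overwrites the dependencies installed in the image.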

Because the dependencies are installed in earlier layers, those layers don't need to be rebuilt when a project file changes. They are invalidated only when package.json or yarn.lock changes.

Keep the context small

The easiest way to make sure your build context doesn't include unnecessary files is to create a .dockerignore file in the root of your build context. The .dockerignore file works similarly to .gitignore files and lets you exclude files and directories from the build context.

For example, a .dockerignore file containing the two patterns node_modules and tmp* excludes the top-level node_modules directory and all top-level files and directories whose names start with tmp.

Ignore rules in the .dockerignore file apply to the whole build context. Patterns are matched relative to the root of the context, so tmp* excludes only top-level entries, while **/tmp* excludes matching entries at any depth. This makes .dockerignore a rather coarse-grained mechanism, but it's a good way to exclude files and directories that you know you don't need in the build context, such as temporary files, log files, and build artifacts.

Use bind mounts

You might be familiar with bind mounts from running containers with docker run or Docker Compose. Bind mounts let you mount a file or directory from the host machine into a container.

To use bind mounts in a build, you can use the --mount flag with the RUN instruction in your Dockerfile:
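Here is a minimal sketch that assumes the build context is a Go module whose main package sits at its root. The golang:1.22 tag, the /src mount target, and the /bin/hello output path are illustrative assumptions:

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.22
WORKDIR /src
# Mount the build context at /src for this RUN instruction only.
# Bind mounts are read-only by default, so the binary is written outside the mount.
RUN --mount=type=bind,target=/src go build -o /bin/hello .
```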

In this example, the current directory is mounted into the build container before the go build command runs. The source code is available in the build container for the duration of that RUN instruction. When the instruction finishes, the mounted files are not persisted in the final image or in the build cache; only the output of the go build command remains.

The COPY and ADD instructions in a Dockerfile also let you bring files from the build context into the build container, but the copied files become part of a layer. Bind mounts don't add those unnecessary layers to the cache, which is why they help with build cache optimization. If your build context is large and is only used to generate an artifact, you're better off using bind mounts to temporarily mount the source code required to generate the artifact into the build. If you use COPY to add the files to the build container, BuildKit includes all of those files in the cache, even if the files aren't used in the final image.
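The sketch below applies that pattern as a multi-stage build. It assumes the build context is a Go module with its main package at the root, and it uses a hypothetical stage name, build. The source is bind-mounted only while compiling, and the final image receives just the compiled binary. CGO_ENABLED=0 produces a statically linked binary that can run on the empty scratch base image:

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.22 AS build
WORKDIR /src
# Source files are visible only during this RUN; they never enter a layer.
RUN --mount=type=bind,target=/src \
    CGO_ENABLED=0 go build -o /bin/hello .

# The final image contains only the binary produced above.
FROM scratch
COPY --from=build /bin/hello /hello
ENTRYPOINT ["/hello"]
```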

There are a few things to be aware of when using bind mounts in a build:
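One such point is that bind mounts are read-only by default. Adding the rw option allows writes to the mount, but those writes are discarded when the RUN instruction finishes and never reach the image or the build cache. The sketch below demonstrates this. It assumes an alpine:3.19 base image and uses a hypothetical file, scratch.txt:

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:3.19
WORKDIR /src
# Without rw, touch would fail because the mount is read-only.
RUN --mount=type=bind,target=/src,rw touch /src/scratch.txt && ls /src/scratch.txt
# The write was discarded along with the mount, so the file does not exist here.
RUN test ! -e /src/scratch.txt
```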

Use cache mounts

Regular cache layers in Docker correspond to an exact match of the instruction and the files it depends on. If the instruction or the files it depends on have changed since the layer was built, the layer is invalidated, and the build process has to rebuild the layer.

Cache mounts let you specify a persistent cache location to be used during builds. The cache is cumulative across builds, so each build can read from and write to it. This persistent caching means that even if you need to rebuild a layer, you only download new or changed packages; any unchanged packages are reused from the cache mount.

To use cache mounts in a build, you can use the --mount flag with the RUN instruction in your Dockerfile:
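Here is a minimal sketch that assumes a Node.js project with a package-lock.json and the node:20 base image. That image runs as root, so npm's default cache directory is /root/.npm:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20
WORKDIR /app
COPY package.json package-lock.json ./
# The cache mount at /root/.npm persists between builds, so npm reuses
# previously downloaded packages even when this layer is rebuilt.
RUN --mount=type=cache,target=/root/.npm npm install
COPY . .
```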

In this example, the npm install command uses a cache mount for the /root/.npm directory, which is npm's default cache location when npm runs as root. Changes to the cache persist across builds, and the cache is shared between multiple builds, so a rebuilt npm install layer downloads only new or changed packages.

How you specify cache mounts depends on the build tool you're using. If you're unsure how to specify cache mounts, refer to the documentation for the build tool you're using. Here are a few examples:
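Two sketches follow, for Go and for pip. They assume the official golang and python images, whose default cache locations are used as mount targets, and hypothetical go.mod, go.sum, and requirements.txt files. Both are written as stages of one Dockerfile with hypothetical stage names. Select one with docker build --target go-build or --target py-deps:

```dockerfile
# syntax=docker/dockerfile:1

# Go: cache the module cache (GOPATH is /go in the golang image)
# and the compiler's build cache ($HOME/.cache/go-build).
FROM golang:1.22 AS go-build
WORKDIR /src
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod go mod download
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go build -o /bin/app .

# Python: cache pip's download and wheel cache (/root/.cache/pip when running as root).
FROM python:3.12-slim AS py-deps
WORKDIR /app
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
```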

Make sure you're using the correct cache mount options for your build tool. Package managers have different requirements for how they use the cache, and using the wrong options can lead to unexpected behavior. For example, Apt needs exclusive access to its data, so its cache mounts use the option sharing=locked. This makes parallel builds that use the same cache mount wait for each other instead of accessing the same cache files at the same time.
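Here is a sketch of the Apt case, assuming a Debian-based image (debian:bookworm) and gcc as an arbitrary example package. Debian and Ubuntu images ship an /etc/apt/apt.conf.d/docker-clean file that deletes downloaded packages after installation. The first RUN removes that file and tells Apt to keep downloads; otherwise the cache mount would stay empty:

```dockerfile
# syntax=docker/dockerfile:1
FROM debian:bookworm
# Keep downloaded .deb files so the cache mounts below actually retain them.
RUN rm -f /etc/apt/apt.conf.d/docker-clean; \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
# sharing=locked makes concurrent builds that use these cache mounts wait
# for each other instead of touching Apt's files at the same time.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends gcc
```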

Use an external cache

The default cache storage for builds is internal to the builder (BuildKit instance) you're using. Each builder uses its own cache storage, so when you switch between builders, the cache is not shared between them. An external cache lets you define a remote location for pushing and pulling cache data.

External caches are especially useful for CI/CD pipelines, where the builders are often ephemeral, and build minutes are precious. Reusing the cache between builds can drastically speed up the build process and reduce cost. You can even make use of the same cache in your local development environment.

To use an external cache, you specify the --cache-to and --cache-from options with the docker buildx build command.

The following example shows how to set up a GitHub Actions workflow that uses docker/build-push-action and pushes the build cache layers to an OCI registry image:

This setup tells BuildKit to look for cache in the user/app:buildcache image. When the build is done, the new build cache is pushed to the same image, overwriting the old cache.

This cache can be used locally as well. To pull the cache in a local build, you can use the --cache-from option with the docker buildx build command:

Summary

Optimizing cache usage in builds can significantly speed up the build process. Ordering your layers, keeping the build context small, and using bind mounts, cache mounts, and external caches are all techniques that help you make the most of the build cache.
