How persistent container storage works -- and why it matters

Persistent storage retains data after a device is shut off. For containers, which are ephemeral by design and usually run stateless, this type of storage is critical for deployments.

Enterprises seeking resilient, scalable container deployments must get a handle on persistent storage.

A database is persistent and independent data storage that one or many applications access and update. Other storage, however, takes the form of temporary information repositories, such as scratch space for user data. This storage disappears when the application isn't running -- and that can pose problems.

An application that always runs on a single host accesses local disk storage for the information it needs while operating. These storage volumes are both logically and physically persistent. Dynamic and elastic container deployments, by contrast, separate logical and physical states of storage for applications.

A containerized application can be logically resident but physically transient, because of redeployment and scaling capabilities inherent in container technologies such as Docker. For example, a container resides on a host, but if that container stops working, the container manager can start up a new instance on another host. The application might need some data continuously available when it's running, which is logical persistence. Since container information is, by default, ephemeral, storage is physically transient.

Much of the information an application handles is transient and application-specific, such as the state or context data it needs to operate. Storage can also hold parametric information, configuration data, and even development, repository tool or pipeline data that isn't part of the container deployment process itself.

For true stateless container operation, the application must be designed to store the transient data associated with process state and context externally. Any instance of that containerized application component, running anywhere, can then recover that data.
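As a rough illustration of this design, the sketch below keeps per-session state in a store outside the process, so a replacement instance picks up where the first one left off. The `ExternalStateStore` class, the `handle_request` function and the use of a shared directory as the external store are all assumptions made for the example; a real deployment might use a mounted volume, a database or an object store instead.

```python
import json
import tempfile
from pathlib import Path


class ExternalStateStore:
    """Keeps process state outside the container, in a shared directory.

    The directory is a stand-in for any external store and is an
    assumption of this sketch, not part of any container platform.
    """

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, key: str, state: dict) -> None:
        # Write to a temporary file, then rename it into place, so a
        # reader never sees a half-written state file.
        target = self.root / f"{key}.json"
        tmp = self.root / f"{key}.json.tmp"
        tmp.write_text(json.dumps(state))
        tmp.replace(target)

    def load(self, key: str) -> dict:
        target = self.root / f"{key}.json"
        if not target.exists():
            return {}
        return json.loads(target.read_text())


def handle_request(store: ExternalStateStore, session: str) -> int:
    """Stateless handler: all context is read from, and written back to, the store."""
    state = store.load(session)
    state["requests"] = state.get("requests", 0) + 1
    store.save(session, state)
    return state["requests"]


if __name__ == "__main__":
    shared = Path(tempfile.mkdtemp())

    first_instance = ExternalStateStore(shared)
    handle_request(first_instance, "session-a")

    # A replacement instance, possibly on another host, sees the same state.
    replacement = ExternalStateStore(shared)
    print(handle_request(replacement, "session-a"))
```

Nothing in the handler survives between calls except what it writes to the store, which is what lets the container manager stop, move or duplicate the instance freely.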

Bind mounts and volumes

There are two accepted ways to add persistent storage to containers: bind mounts and volumes.

Bind mounts create a mapping, which is the binding, between the container file space and the file space of the local system. The bind-mount mechanism depends on the host's file system semantics, so there might be differences between Linux and Windows, or other container platforms such as D2iQ. In addition, bind mounts won't work if a container fails or closes down and a new instance deploys on a different server, because the mapped path exists only on the original host. For these reasons, bind mounts are generally discouraged even for basic container applications. Don't use this approach for microservices and cloud-native container deployments.

The volume strategy, also called the named volume strategy, is somewhat similar to bind mounts. A named volume is mapped to a specific path, either on the local machine or a different one. Any container with access to the named volume's location mapping can reference it. In this way, volumes support ephemeral container operations, as containers scale up or redeploy on different physical resources. Every instance of that container can access the same named volume to synchronize with the logical state for that containerized code.
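To make the difference concrete, here is a minimal sketch that assembles the two kinds of `docker run` command using Docker's `--mount` flag. The image name `example/app:latest`, the host path `/srv/app-data`, the volume name `app-data` and the helper functions are hypothetical; the script only builds and prints the commands and does not invoke Docker.

```python
import shlex


def mount_flag(kind: str, source: str, target: str) -> str:
    """Build the value passed to Docker's --mount flag."""
    if kind not in ("bind", "volume"):
        raise ValueError(f"unsupported mount type: {kind}")
    return f"type={kind},source={source},target={target}"


def docker_run(image: str, mount: str) -> list[str]:
    """Return the argument list for a `docker run` with one mount."""
    return ["docker", "run", "--rm", "--mount", mount, image]


# Bind mount: the source is a path that must exist on this particular host.
bind_cmd = docker_run(
    "example/app:latest",
    mount_flag("bind", "/srv/app-data", "/data"),
)

# Named volume: the source is a volume name that the container engine resolves.
volume_cmd = docker_run(
    "example/app:latest",
    mount_flag("volume", "app-data", "/data"),
)

print(shlex.join(bind_cmd))
print(shlex.join(volume_cmd))
```

The only difference in the command is the mount type and what the source refers to: a host path in the first case, a name managed by the container engine in the second.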

Kubernetes CSI

Kubernetes container management relies on the Container Storage Interface (CSI). The Kubernetes CSI provides a model to link a containerized application to various storage options. CSI persistent storage is based on persistent volumes (PVs), a variation of the named-volume approach. PVs are cluster resources that supply storage to pods. The storage persists even when the pods are transient.

Kubernetes users can request a PV on demand or assign specific PVs to be relatively permanent storage resources. This flexibility means that PVs can be used for any container storage needs, even if it only involves a single container. Dynamic persistent volume provisioning is managed via the StorageClass resource. Using StorageClass, administrators can define storage resources for Kubernetes.
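As a sketch of what an administrator might define, the snippet below builds a StorageClass manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The class name `fast-ssd` and the provisioner string `csi.example.com` are placeholders invented for this example; a real cluster would use the provisioner name published by its CSI driver.

```python
import json


def storage_class(name: str, provisioner: str, reclaim_policy: str = "Delete") -> dict:
    """Build a StorageClass manifest for dynamic volume provisioning."""
    if reclaim_policy not in ("Delete", "Retain"):
        raise ValueError(f"unsupported reclaim policy: {reclaim_policy}")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Hypothetical driver name; substitute the one your CSI plugin registers.
        "provisioner": provisioner,
        "reclaimPolicy": reclaim_policy,
        # Delay provisioning until a pod that uses the claim is scheduled.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


manifest = storage_class("fast-ssd", "csi.example.com", reclaim_policy="Retain")
print(json.dumps(manifest, indent=2))
```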

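Users access persistent volumes via a claim. The PersistentVolumeClaim (PVC) is a request for storage of a given size and access mode, which Kubernetes binds to a matching PV. A pod then mounts the claim as a volume rather than referring to the PV directly. The setup is akin to how an application accesses a remote database -- the storage is provisioned and managed separately from the logic. To the pod, the claim is an abstract volume of storage.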
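The relationship between a claim and the pod that uses it can be sketched with two manifests, again built as Python dictionaries and printed as JSON. The names `app-data`, `fast-ssd`, `example/app:latest` and the 10Gi size are illustrative assumptions; the field names follow the Kubernetes core `v1` API.

```python
import json


def persistent_volume_claim(name: str, storage_class_name: str, size: str) -> dict:
    """Build a PersistentVolumeClaim requesting `size` of storage."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_using_claim(name: str, image: str, claim_name: str, mount_path: str) -> dict:
    """Build a Pod that mounts the claim at `mount_path`."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            # The pod names the claim, never the underlying PV.
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


claim = persistent_volume_claim("app-data", "fast-ssd", "10Gi")
pod = pod_using_claim("app", "example/app:latest", "app-data", "/data")
print(json.dumps([claim, pod], indent=2))
```

Because the pod refers only to the claim name, the pod can be rescheduled or replaced while the claim, and the volume bound to it, stay in place.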

Without proper management, Kubernetes PVs and PVCs can get out of hand. You can delete a PVC object when it's no longer required, but a PV whose data is retained can't be bound to a new claim as long as the previous user's data remains there. There are various mechanisms to clean up persistent volumes so they're ready for reuse. Follow the process dictated by the container platform or cloud service you use.
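The lifecycle described here can be pictured with a deliberately simplified model. This is not Kubernetes code: the `PersistentVolume` class and the three functions are hypothetical, and they only mimic the phase names (Available, Bound, Released) and the Retain and Delete reclaim policies to show why a retained volume needs cleanup before reuse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersistentVolume:
    """Toy stand-in for a PV; tracks only policy, phase and claim."""
    name: str
    reclaim_policy: str  # "Retain" or "Delete"
    phase: str = "Available"
    claim: Optional[str] = None


def bind_claim(pv: PersistentVolume, claim: str) -> None:
    """Bind a claim; only an Available volume can accept one."""
    if pv.phase != "Available":
        raise RuntimeError(f"{pv.name} is {pv.phase}, not Available")
    pv.phase = "Bound"
    pv.claim = claim


def delete_claim(pv: PersistentVolume) -> Optional[PersistentVolume]:
    """Model what happens to the volume when its claim is deleted."""
    if pv.reclaim_policy == "Delete":
        return None  # the volume and its backing storage go away
    pv.phase = "Released"  # data is kept, so the volume is not reusable yet
    return pv


def clean_up(pv: PersistentVolume) -> None:
    """Stand-in for the platform-specific cleanup an administrator performs."""
    pv.claim = None
    pv.phase = "Available"


pv = PersistentVolume("pv-a", reclaim_policy="Retain")
bind_claim(pv, "claim-1")
delete_claim(pv)
print(pv.phase)

try:
    bind_claim(pv, "claim-2")
except RuntimeError as err:
    print(err)

clean_up(pv)
bind_claim(pv, "claim-2")
print(pv.phase)
```

In a real cluster the cleanup step depends on the storage backend and the platform, which is why the provider's documented process should take precedence over any generic recipe.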

Persistent storage tools for containers

Persistent storage management is important for real-world container use, particularly if the deployment needs resilience and cloud scalability. Persistent storage tools provide storage as a service to these containers, through distinct approaches. Consider these tools for hybrid and multi-cloud deployments, where containers span more than one cloud hosting environment.

One approach, known as container-attached storage, makes persistent storage appear as a remote file system resource. Examples include OpenEBS and Rancher Longhorn, both open source technologies. An abstract iSCSI interface gives containers access to the OpenEBS pool of storage resources. There are a few benefits to this model. The storage volumes look like containers, and work with container backup and orchestration tools. The approach resembles Amazon Elastic Block Store, which makes it easy to adopt for applications designed for AWS.

Another class of persistent storage tools for containers is OpenStorage SDK-based plugins that adapt the Kubernetes CSI API set -- including PV and PVC. OpenStorage is a multihost clustered way to manage and provision storage volumes, based on the Open Storage specification. Commercial offerings include those from Portworx, StorPool Storage and StorageOS. This class of tool is directly linked with the Kubernetes CSI model. Organizations with a pure Kubernetes storage approach might prefer this kind of tool over other options.

Companies such as Pure Storage are developing tools that orchestrate storage deployments for stateful container applications, making storage provisioning easier.

Containerized storage in general, and the Kubernetes CSI plugin model in particular, is on the rise. Overall, the trend is to integrate persistent storage with ephemeral containers in the best way possible. The ability to pair these technologies is critical to cloud-native deployments.
