fusemanager: fix container fail after ttl timeout in detach mode by wswsmao · Pull Request #1905 · containerd/stargz-snapshotter

In detach mode, when containerd-stargz-grpc exits normally, it sends an Unmount request to the fuse manager. In addition, when containerd-stargz-grpc starts, the restoreRemoteSnapshot function cleans up the mountpoints left over from the previous run. If containers that use those mountpoints are still running, they start to misbehave once the TTL cache expires.

I considered several solutions:

  1. The approach taken in this PR: users restart containerd-stargz-grpc with SIGKILL, which bypasses the normal exit path (and therefore the Unmount request), and restoreRemoteSnapshot skips the cleanup step on startup.
  2. Setting ResolveResultEntryTTLSec to an effectively infinite value so that the TTL cache keeps the containers running. However, containers would still fail if they access content that is not yet cached.
  3. Implementing a complex mechanism to determine if any running containers are using the mountpoints, and if so, skipping the cleanup.
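The startup half of the first approach can be sketched as a guard around the cleanup loop. The PR text does not say how the skip is gated, so the `DetachMode` flag, `restoreOptions`, `restoreMountpoints`, and the mountpoint paths below are all hypothetical; the sketch only shows the intended behavior of leaving existing mounts in place so running containers keep their filesystems.

```go
package main

import "fmt"

// restoreOptions is a hypothetical stand-in for startup configuration.
type restoreOptions struct {
	// DetachMode is assumed to mean that FUSE mounts are served by a separate
	// fuse manager process that outlives containerd-stargz-grpc.
	DetachMode bool
}

// restoreMountpoints handles mountpoints left over from a previous run and
// returns the ones it cleaned up.
func restoreMountpoints(opts restoreOptions, mountpoints []string, cleanup func(string) error) ([]string, error) {
	if opts.DetachMode {
		// Approach 1: the fuse manager may still be serving these mounts to
		// running containers, so skip the cleanup step entirely.
		return nil, nil
	}
	var cleaned []string
	for _, mp := range mountpoints {
		if err := cleanup(mp); err != nil {
			return cleaned, fmt.Errorf("cleanup %s: %w", mp, err)
		}
		cleaned = append(cleaned, mp)
	}
	return cleaned, nil
}

func main() {
	// Hypothetical leftover mountpoints from a previous run.
	leftover := []string{"/mnt/snapshot-1", "/mnt/snapshot-2"}
	unmount := func(mp string) error {
		fmt.Println("cleaning up", mp)
		return nil
	}
	cleaned, err := restoreMountpoints(restoreOptions{DetachMode: true}, leftover, unmount)
	fmt.Println("detach mode cleaned:", cleaned, err)
	cleaned, err = restoreMountpoints(restoreOptions{DetachMode: false}, leftover, unmount)
	fmt.Println("non-detach mode cleaned:", cleaned, err)
}
```

Gating on a configuration flag keeps the check cheap, at the cost of never reclaiming stale mounts in detach mode; the third option above trades that for a per-mountpoint liveness check.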

After careful consideration, I have decided to proceed with the first approach.