LLVM: lib/CAS/UnifiedOnDiskCache.cpp File Reference

Encapsulates [OnDiskGraphDB](classllvm%5F1%5F1cas%5F1%5F1ondisk%5F1%5F1OnDiskGraphDB.html "On-disk CAS nodes database, independent of a particular hashing algorithm.") and [OnDiskKeyValueDB](classllvm%5F1%5F1cas%5F1%5F1ondisk%5F1%5F1OnDiskKeyValueDB.html "An on-disk key-value data store with the following properties:") instances within one directory while also restricting storage growth with a scheme of chaining the two most recent directories (primary & upstream), where the primary "faults-in" data from the upstream one.

When the primary (most recent) directory exceeds its intended size limit, a new empty directory becomes the primary one and the previous primary becomes the upstream.

Within the top-level directory (the path that [UnifiedOnDiskCache::open](classllvm%5F1%5F1cas%5F1%5F1ondisk%5F1%5F1UnifiedOnDiskCache.html#a3c6574a638e6517a1dcac9fd56ab633e "Open a UnifiedOnDiskCache instance for a directory.") receives) there are directories named like this:

'v<version>.<x>' 'v<version>.<x+1>' 'v<version>.<x+2>' ...

'<version>' is the version integer for this [UnifiedOnDiskCache](classllvm%5F1%5F1cas%5F1%5F1ondisk%5F1%5F1UnifiedOnDiskCache.html "A unified CAS nodes and key-value database, using on-disk storage for both.")'s scheme, and the part after the dot is an increasing integer. The primary directory is the one with the highest integer, and the upstream one is the directory immediately before it. For example, if the sub-directories contained are:

'v1.5', 'v1.6', 'v1.7', 'v1.8'

Then the primary one is 'v1.8', the upstream one is 'v1.7', and the rest are unused directories that can be safely deleted at any time and by any process.
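
The naming and selection rules above can be sketched in C++ as follows. This is a minimal illustration, not LLVM's implementation: `parseIndex`, `Layout`, `classify`, and `collectGarbage` are hypothetical names, and the sketch assumes that entries not matching 'v<version>.<index>' for the current version (such as the "lock" file or other versions) are simply left alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: parses "v<version>.<index>" and returns <index> when
// <version> equals Version; any other name yields std::nullopt.
std::optional<uint64_t> parseIndex(const std::string &Name, unsigned Version) {
  const std::string Prefix = "v" + std::to_string(Version) + ".";
  if (Name.size() <= Prefix.size() || Name.compare(0, Prefix.size(), Prefix) != 0)
    return std::nullopt;
  const std::string Digits = Name.substr(Prefix.size());
  if (Digits.size() > 18 ||
      !std::all_of(Digits.begin(), Digits.end(),
                   [](unsigned char C) { return std::isdigit(C) != 0; }))
    return std::nullopt;
  return std::stoull(Digits);
}

// Hypothetical result type: the two directories in use plus the unused ones.
struct Layout {
  std::optional<std::string> Primary;  // highest index
  std::optional<std::string> Upstream; // next-highest index
  std::vector<std::string> Unused;     // all older ones; safe to delete
};

// Hypothetical: classifies directory names into primary, upstream, unused.
Layout classify(const std::vector<std::string> &Names, unsigned Version) {
  std::vector<std::pair<uint64_t, std::string>> Dirs;
  for (const std::string &N : Names)
    if (std::optional<uint64_t> Idx = parseIndex(N, Version))
      Dirs.emplace_back(*Idx, N);
  // Numeric sort, highest index first (so "v1.10" ranks above "v1.9").
  std::sort(Dirs.begin(), Dirs.end(),
            [](const auto &A, const auto &B) { return A.first > B.first; });
  Layout L;
  for (std::size_t I = 0; I < Dirs.size(); ++I) {
    if (I == 0)
      L.Primary = Dirs[I].second;
    else if (I == 1)
      L.Upstream = Dirs[I].second;
    else
      L.Unused.push_back(Dirs[I].second);
  }
  return L;
}

// Hypothetical: deletes the unused sub-directories of TopDir and returns how
// many this call removed. Errors (e.g. another process deleting the same
// directory first) are ignored, since any process may collect at any time.
std::size_t collectGarbage(const fs::path &TopDir, unsigned Version) {
  std::vector<std::string> Names;
  for (const fs::directory_entry &E : fs::directory_iterator(TopDir))
    if (E.is_directory())
      Names.push_back(E.path().filename().string());
  std::size_t Removed = 0;
  for (const std::string &Name : classify(Names, Version).Unused) {
    std::error_code EC;
    std::uintmax_t Count = fs::remove_all(TopDir / Name, EC);
    if (!EC && Count > 0)
      ++Removed;
  }
  return Removed;
}
```

Sorting numerically rather than lexically matters once indices reach two digits: 'v1.10' must outrank 'v1.9'.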

Contained within the top-level directory is a file named "lock", which processes use to take shared or exclusive locks on the contents of the top-level directory. While a [UnifiedOnDiskCache](classllvm%5F1%5F1cas%5F1%5F1ondisk%5F1%5F1UnifiedOnDiskCache.html "A unified CAS nodes and key-value database, using on-disk storage for both.") is open, it holds a shared lock on the top-level directory. When it closes, if the primary sub-directory has exceeded its limit, it attempts to get an exclusive lock in order to create a new empty primary directory; if it cannot get the exclusive lock, it gives up and lets the next [UnifiedOnDiskCache](classllvm%5F1%5F1cas%5F1%5F1ondisk%5F1%5F1UnifiedOnDiskCache.html "A unified CAS nodes and key-value database, using on-disk storage for both.") instance that closes attempt again.
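
The locking protocol can be sketched as below, assuming POSIX `flock(2)` on the "lock" file. The helpers `openLockFile`, `lockShared`, and `closeAndMaybeRotate` are hypothetical, and the rotation step is passed in as a callback rather than implemented.

```cpp
#include <cassert>
#include <filesystem>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

namespace fs = std::filesystem;

// Hypothetical helper: opens (creating if needed) the "lock" file in TopDir.
// Each call yields a separate open file description, so two calls behave like
// two independent cache instances as far as flock() is concerned.
int openLockFile(const fs::path &TopDir) {
  return ::open((TopDir / "lock").c_str(), O_RDWR | O_CREAT, 0644);
}

// Hypothetical: called when an instance opens; blocks until a shared lock
// is held, so any number of instances can be open at once.
bool lockShared(int FD) { return ::flock(FD, LOCK_SH) == 0; }

// Hypothetical: called when an instance closes. If the primary directory is
// over its limit, try (without waiting) to get an exclusive lock and rotate;
// if another instance still holds a shared lock, give up and leave the
// rotation to a later closer. Returns true if this call rotated.
template <typename RotateFn>
bool closeAndMaybeRotate(int FD, bool PrimaryOverLimit, RotateFn Rotate) {
  ::flock(FD, LOCK_UN); // drop our own shared lock first
  bool Rotated = false;
  if (PrimaryOverLimit && ::flock(FD, LOCK_EX | LOCK_NB) == 0) {
    Rotate(); // e.g. create the next empty 'v<version>.<x+1>' directory
    ::flock(FD, LOCK_UN);
    Rotated = true;
  }
  ::close(FD);
  return Rotated;
}
```

Releasing the shared lock before the non-blocking exclusive attempt means a closing instance never waits on others; at worst the rotation is deferred to whichever instance closes last.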

The downside of this scheme is that while a [UnifiedOnDiskCache](classllvm%5F1%5F1cas%5F1%5F1ondisk%5F1%5F1UnifiedOnDiskCache.html "A unified CAS nodes and key-value database, using on-disk storage for both.") is open on a directory, by any process, the storage size in that directory keeps growing without limit. The major benefit is that garbage collection can be triggered on a directory concurrently, at any time and by any process, without affecting any active readers or writers in the same process or in other processes: a new primary directory is only created under the exclusive lock, so the primary and upstream directories of any open instance never change, and only the older, unused directories are eligible for deletion.

The [UnifiedOnDiskCache](classllvm%5F1%5F1cas%5F1%5F1ondisk%5F1%5F1UnifiedOnDiskCache.html "A unified CAS nodes and key-value database, using on-disk storage for both.") also provides validation and recovery on top of the underlying on-disk storage. The low-level storage is designed to remain coherent across regular process crashes, but may be invalid after power loss or similar system failures. [UnifiedOnDiskCache::validateIfNeeded](classllvm%5F1%5F1cas%5F1%5F1ondisk%5F1%5F1UnifiedOnDiskCache.html#ad0a9b170b9368739c16e82d0abad21c8 "Validate the data in Path, if needed to ensure correctness.") allows validating the contents once per boot and can recover by marking invalid data for garbage collection.

The data recovery described above requires exclusive access to the CAS, and it is an error to attempt recovery if the CAS is open in any process or thread. To maximize backwards compatibility with tools that do not perform validation before opening the CAS, we do not attempt to get exclusive access until recovery is actually performed; as long as the data is valid, validation does not conflict with concurrent use.
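
The validate-once-per-boot flow with lazily acquired exclusive access might look like the sketch below. Everything here is an assumption for illustration: the boot identifier is supplied by the caller (on Linux, /proc/sys/kernel/random/boot_id is one possible source), and the "last-validated-boot" stamp file, `validateOncePerBoot`, `ValidationOutcome`, and the two callbacks are hypothetical. Locking uses POSIX `flock(2)` on the "lock" file.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <string>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

namespace fs = std::filesystem;

// Hypothetical outcome type for the sketch.
enum class ValidationOutcome {
  AlreadyValidated, // this boot was already validated; nothing done
  Valid,            // data checked and found valid; no lock taken
  Recovered,        // data invalid; marked for GC under an exclusive lock
  RecoveryBlocked   // data invalid but exclusive access unavailable (error)
};

// Hypothetical helper: records BootID as the last validated boot.
static void writeStamp(const fs::path &Stamp, const std::string &BootID) {
  std::ofstream Out(Stamp, std::ios::trunc);
  Out << BootID << '\n';
}

// Hypothetical sketch of once-per-boot validation with lazy recovery.
ValidationOutcome validateOncePerBoot(const fs::path &TopDir,
                                      const std::string &BootID,
                                      const std::function<bool()> &CheckData,
                                      const std::function<void()> &MarkInvalidForGC) {
  const fs::path Stamp = TopDir / "last-validated-boot";
  {
    std::ifstream In(Stamp);
    std::string Prev;
    if (In && std::getline(In, Prev) && Prev == BootID)
      return ValidationOutcome::AlreadyValidated;
  }
  // Checking takes no exclusive lock, so valid data never conflicts with
  // concurrent users of the CAS.
  if (CheckData()) {
    writeStamp(Stamp, BootID);
    return ValidationOutcome::Valid;
  }
  // Only recovery needs exclusive access; refuse if anyone holds the lock.
  int FD = ::open((TopDir / "lock").c_str(), O_RDWR | O_CREAT, 0644);
  if (FD < 0)
    return ValidationOutcome::RecoveryBlocked;
  if (::flock(FD, LOCK_EX | LOCK_NB) != 0) {
    ::close(FD);
    return ValidationOutcome::RecoveryBlocked;
  }
  MarkInvalidForGC();
  writeStamp(Stamp, BootID);
  ::flock(FD, LOCK_UN);
  ::close(FD);
  return ValidationOutcome::Recovered;
}
```

The stamp is written only after a successful check or a completed recovery, so a blocked recovery is retried on the next call.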

Definition in file UnifiedOnDiskCache.cpp.