HybridCache - tags and invalidation (original) (raw)

Background and Motivation

This is part of Epic: IDistributedCache updates in .NET 9

The first wave (preview 4) delivers the basic infrastructure for multi-tier caching based on IMemoryCache (L1) and IDistributedCache (L2), but it is intended to add 3 additional features that require L2 support:

active key invalidation
active tag invalidation
tag metadata lookup (for cold-start invalidation)

These features must be optional and implemented in a way that does not place fundamentally new demands on IDistributedCache - for example, the existing "set" API is simply an opaque string key and BLOB value. In particular, because we're talking about out-of-process data, it is not possible to use IChangeToken for this purpose (although a consuming layer could choose to use IChangeToken locally as part of responding to the themes of this proposal).

Active key invalidation

Right now, invalidation occurs passively via time/expiration, or as write-thru side-effects from operations on the same node; in a multi-node scenario this is insufficient and does not account for subsequent update/delete by other nodes, which can lead to inconsistent L1 cache state.

To remedy this, the following event is proposed on a new interface:

    namespace Microsoft.Extensions.Caching.Distributed;

    public delegate void DistributedCacheKeyInvalidation(string key, ReadOnlySpan<byte> header);

    public interface IDistributedCacheInvalidation : IDistributedCache
    {
        int HeaderBytes { get; set; }
        event DistributedCacheKeyInvalidation KeyInvalidated;
        // addition here from 2/3; will be combined in summary
    }

The idea behind this API is that L2 backends may, through some mechanism (queuing, pub/sub, etc), broadcast key updates. When a node performs writes (Set[Async](...)/Remove[Async](...)), it should (via that implementation-specific mechanism) also publish a global invalidation entry. In the case of Set[Async], the first HeaderBytes bytes of the payload may optionally also be published. This invalidation mechanism will be used to invoke KeyInvalidated at arbitrary times, for out-of-band invalidation notification. Emphasis: HeaderBytes is set by the consumer (HybridCache, etc), and indicates "if you're publishing: publish this much of the payload". The value may be zero, and/or the implementation may choose to ignore this and not include any payload metadata, just publishing the keys.

Note that the use of ReadOnlySpan<byte> precludes the use of Action<string, ...>; this use was intentional, making the lifetime semantics of header very clear - if it is needed beyond right now, the consumer must copy the value to somewhere they control. The name DistributedCacheKeyInvalidation is perhaps questionable.

The intent here is that HybridCache (or other consumers) can subscribe to KeyInvalidated, and respond accordingly. The key here matches the same string as described by string key in Set[Async] etc, noting that if the L2 has some configured namespace prefix, the L2 implementation is responsible for removing that prefix again, such that the key in KeyInvalidated is the original key transmitted.

The purpose of the header is to help avoid trivial removals. If the header received is empty, the invocation should be treated as a blind "delete", causing L1 removal. This is a fair default, but it is not assumed that implementations can automatically detect and avoid same-connection notifications, which means we must anticipate and avoid:

(normal) node X updates key ZZZ to value ABC in L1 and L2
(normal) the invalidation event {ZZZ, ABC} is published
(normal) the KeyInvalidated event on node X gets invoked with {ZZZ, ABC} for the same thing we just caused
(problem) node X removes L1 entry ZZZ (unnecessarily)
(problem) node X now gets cache miss on ZZZ in L1 and hits L2 again (unnecessarily)

The header allows us to avoid these last two steps. The implementation of this is consumer-dependent, but in the case of HybridCache, the payload sent to L2 will include a payload header containing the creation timestamp and a disambiguation qualifier (which, together, essentially work like an ETag). By parsing these (they will be in the first few bytes), the consumer can decide:

if no corresponding L1 entry is found: there is nothing to do
if there is no header, or any unexpected header size: blind delete from L1
if the incoming creation timestamp is less than that of the L1 entry: do nothing (our data is considered fresher)
if the incoming creation timestamp and disambiguation qualifier both match the L1 entry: do nothing (we're being told about our own update)
otherwise: delete from L1
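
The rules above can be sketched as a small decision function. This is only an illustration, not the HybridCache implementation: the header layout (an 8-byte little-endian creation timestamp followed by an 8-byte disambiguation qualifier) and the names KeyInvalidationPolicy, InvalidationAction and Decide are assumptions made for the example.

```csharp
using System;
using System.Buffers.Binary;

public enum InvalidationAction { None, RemoveFromL1 }

public static class KeyInvalidationPolicy
{
    // Assumed (hypothetical) header layout: 8-byte little-endian creation
    // timestamp, then an 8-byte disambiguation qualifier.
    public const int HeaderBytes = 16;

    public static InvalidationAction Decide(
        bool hasL1Entry, long l1Created, long l1Qualifier, ReadOnlySpan<byte> header)
    {
        // No corresponding L1 entry: nothing to do.
        if (!hasL1Entry) return InvalidationAction.None;

        // No header, or an unexpected header size: blind delete from L1.
        if (header.Length != HeaderBytes) return InvalidationAction.RemoveFromL1;

        long created = BinaryPrimitives.ReadInt64LittleEndian(header);
        long qualifier = BinaryPrimitives.ReadInt64LittleEndian(header.Slice(8));

        // Incoming data is older than what we hold: our data is fresher.
        if (created < l1Created) return InvalidationAction.None;

        // Timestamp and qualifier both match: this is our own update.
        if (created == l1Created && qualifier == l1Qualifier) return InvalidationAction.None;

        // Otherwise another node wrote something newer (or different).
        return InvalidationAction.RemoveFromL1;
    }
}
```

A KeyInvalidated handler would call something like Decide while the span is still valid; anything needed afterwards has to be copied out of the header first.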

This provides a mechanism to communicate L2 invalidation to L1, and respond without causing self-invalidation, within the constraints of the data available to IDistributedCache.

Tagging

Tagging is a new concept being introduced into HybridCache that does not historically exist in IDistributedCache. To achieve this, the L2 tag metadata will also be stored as part of the header (although not typically in the bytes published for KeyInvalidated).

It is assumed that IDistributedCache cannot reliably implement cascading delete at the backend - this is simply not a feature in many key/value stores, and while it can be hacked in: it is usually unsatisfying and requires significant additional overhead. We want to avoid this complexity in the backend.

Consequently, HybridCache must implement this internally, by maintaining a lookup of each tag to the last known invalidation date; for example, we might have (using numbers instead of dates here for simplicity):

tag "north"; invalidation date: 513
tag "offers"; invalidation date: 400
tag "east"; invalidation date: 234

When loading an entry from L1 or L2, if that entry has tags we must compare the creation date of the cache entry (again, from the payload header) to the dates in each of the tags; if any tag has an invalidation date greater than the cache entry's creation date, the entry is considered logically expired (it can also be removed from L1/L2 accordingly). For example:

cache entry ZZZ, creation date 450 tagged "north" and "offers" is considered expired because "north" was invalidated at time 513
cache entry YYY, creation date 450 tagged "east" and "offers" is considered valid
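
As a minimal sketch of that comparison (using plain numbers for dates, as in the examples above), the check might look like the following. The TagExpiry class and its IsExpired method are hypothetical names for illustration, not part of the proposed API.

```csharp
using System;
using System.Collections.Generic;

public static class TagExpiry
{
    // Returns true if any of the entry's tags was invalidated after the
    // entry was created. Tags with no known invalidation are ignored.
    public static bool IsExpired(
        long entryCreated,
        IEnumerable<string> entryTags,
        IReadOnlyDictionary<string, long> tagInvalidations)
    {
        foreach (string tag in entryTags)
        {
            if (tagInvalidations.TryGetValue(tag, out long invalidated)
                && invalidated > entryCreated)
            {
                return true;
            }
        }
        return false;
    }
}
```

With the lookup shown earlier (north: 513, offers: 400, east: 234), an entry created at 450 and tagged "north" and "offers" is reported as expired, while one created at 450 and tagged "east" and "offers" is not.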

To support this, we still need some additional backend capabilities:

some mechanism to publish tag invalidations, similar to KeyInvalidated
some mechanism to look up tag invalidation data from cold-start

For this, we propose:

    namespace Microsoft.Extensions.Caching.Distributed;

    public interface IDistributedCacheInvalidation : IDistributedCache
    {
        // (not shown; from 1)

        event Action<string, DateTimeOffset> TagInvalidated;

        Task RemoveByTagAsync(string tag, CancellationToken token = default);

        // API to bulk-query tag eviction metadata
        Task<KeyValuePair<string, DateTimeOffset>[]> GetTagsAsync(DateTimeOffset since = default, CancellationToken token = default);

        // alternative single-tag metadata query API
        Task<DateTimeOffset?> GetTagAsync(string tag, CancellationToken token = default);
    }

At cold-start, the library can use GetTagsAsync to pre-populate the tag lookup with some reasonable time bound, and can respond to TagInvalidated to update this data (forwards-only) as needed. The choice of array here is intentional, as it is assumed the caller will be constructing their own lookup by iterating the data, hence "simple" is reasonable. This could arguably be IAsyncEnumerable<>, etc, but: it is only used for cold-start population of the tag metadata, so array overhead is not burdensome.

To invalidate a specific tag, we call RemoveByTagAsync, which would update the data used by GetTagsAsync and also indirectly cause TagInvalidated to be invoked by all clients. Note that unlike write-thru invalidation, tag invalidation doesn't have the problem of invalidating our own data, as it is not entry-specific.
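
One possible shape for the consumer-side tag lookup is sketched below, assuming it is seeded from the array returned by GetTagsAsync and then updated from TagInvalidated. The TagInvalidationLookup class and its member names are hypothetical; the point is only the forwards-only update, where an older notification never overwrites a newer invalidation date.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class TagInvalidationLookup
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _tags =
        new ConcurrentDictionary<string, DateTimeOffset>();

    // Cold-start: seed the lookup, e.g. from the result of GetTagsAsync.
    public void Populate(IEnumerable<KeyValuePair<string, DateTimeOffset>> tags)
    {
        foreach (var pair in tags)
        {
            OnTagInvalidated(pair.Key, pair.Value);
        }
    }

    // Shape matches Action<string, DateTimeOffset>, so it could be attached
    // to TagInvalidated. Forwards-only: keep the later of the two dates.
    public void OnTagInvalidated(string tag, DateTimeOffset when)
    {
        _tags.AddOrUpdate(tag, when, (_, existing) => when > existing ? when : existing);
    }

    // Last known invalidation date for a tag, or null if none is known.
    public DateTimeOffset? GetInvalidation(string tag)
    {
        return _tags.TryGetValue(tag, out DateTimeOffset when) ? when : (DateTimeOffset?)null;
    }
}
```

Because updates only move forwards, out-of-order or duplicate TagInvalidated notifications are harmless.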

Combining these two halves, we get the API proposal:

```
namespace Microsoft.Extensions.Caching.Distributed;

public delegate void DistributedCacheKeyInvalidation(string key, ReadOnlySpan<byte> header);

public interface IDistributedCacheInvalidation : IDistributedCache
{
    int HeaderBytes { get; set; }

    event DistributedCacheKeyInvalidation KeyInvalidated;

    event Action<string, DateTimeOffset> TagInvalidated;

    Task RemoveByTagAsync(string tag, CancellationToken token = default);

    Task<KeyValuePair<string, DateTimeOffset>[]> GetTagsAsync(DateTimeOffset since = default, CancellationToken token = default);

    Task<DateTimeOffset?> GetTagAsync(string tag, CancellationToken token = default);
}
```

Example implementation

Redis:

(without any special server features)

subscribe to two channels, __MSFT_DC__KeyInvalidation and __MSFT_DC__TagInvalidation
treat __MSFT_DC_Tags as a sorted-set
key delete is UNLINK {key} plus PUBLISH __MSFT_DC__KeyInvalidation {key}
key write is SET {key} {value} EX {ttl} plus PUBLISH __MSFT_DC__KeyInvalidation {key+header}
tag invalidation is ZADD __MSFT_DC_Tags GT {time} {tag} plus PUBLISH __MSFT_DC__TagInvalidation {tag+time} (on servers before 6.2, do not use GT)
get tags is ZRANGE __MSFT_DC_Tags {since} +inf BYSCORE WITHSCORES
get tag is ZSCORE __MSFT_DC_Tags {tag}

The implementation may also choose to use ZREMRANGEBYSCORE __MSFT_DC_Tags -inf {culltime} periodically, for some culltime that represents the largest possible expiration; this allows long-dead tags to be forgotten.

The channel names and the tags key name should include any namespace partition configured, just like keys. The published tags/keys do not need to include the partition.

It is also possible to use server-assisted client-side caching or keyspace notifications, which may be considered in due course. Initially, however, active invalidation (i.e. where our code explicitly causes the pub/sub events) is described for simplicity, since it requires neither server-side feature configuration (which is required for keyspace notifications) nor an up-level server (server-assisted client-side caching requires server version 6 and client library support).

Alternative designs

The "output caching" feature is comparable in terms of supporting L2 tagging (without notification). Because the concept of tags was baked into the original API, it is implemented in the backend: in SQL via relational DB semantics, and in Redis by using SADD {tag} {key}, such that each tag is a Redis "set" consisting of all keys associated with that tag. Deleting a tag means enumerating the "set" and calling UNLINK per key; it also requires complicated periodic garbage collection to remove expired keys from each set, and to do that, we need an additional set which is the set of all known tags. The solution proposed here is much simpler to implement, and fits within the current API; so much so that when HybridCache has full tag support, I wonder if it is worth exploring a mechanism to implement IOutputCacheBufferStore on top of HybridCache.