This is a reworked reference implementation of _threading_local without the __del__ quirks. Unfortunately, the _patch() ugliness is still needed, because a doctest checks that __slots__ attributes defined in a derived class aren't actually thread-local. Note that users are unlikely to ever run this code in the real world, except perhaps on non-CPython implementations: CPython's threading module uses the C implementation of local and only falls back to _threading_local when that is unavailable.
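
For reference, the behaviour that doctest pins down can be sketched roughly as follows. This is only an illustration, not the doctest itself: the names SlottedLocal, worker and seen are made up for the example. A slot declared in a subclass is stored on the instance itself, so every thread sees the same value, while ordinary attributes stay per-thread.

```python
import threading
from _threading_local import local  # the pure-Python implementation

# Hypothetical subclass, for illustration only.
class SlottedLocal(local):
    __slots__ = ("number",)

data = SlottedLocal()
data.number = 42     # stored in the slot, i.e. on the object itself
data.color = "red"   # stored in the calling thread's private dict

seen = {}

def worker():
    # Ordinary attributes set in the main thread are invisible here.
    seen["has_color"] = hasattr(data, "color")
    # The slot is shared, so the main thread's value is visible...
    seen["number_before"] = data.number
    # ...and this write is visible back in the main thread.
    data.number = 11

t = threading.Thread(target=worker)
t.start()
t.join()

print(seen, data.number, data.color)
```

The same check should hold if local is imported from threading instead, since the shared-slot behaviour comes from the slot descriptor rather than from the thread-local machinery.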