[Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap (original) (raw)

Jim J. Jewett jimjjewett at gmail.com
Thu Aug 24 00:32:45 EDT 2017

Previous message (by thread): [Python-Dev] limited_api and datetime
Next message (by thread): [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

In https://mail.python.org/pipermail/python-dev/2017-August/148869.html Nick Coghlan wrote:

* what we want to capture at generator creation time is the context where writes will happen, and we also want that to be the innermost context used for lookups

So when creating a generator, we want a new (empty) ImplicitContext map to be the head of the ChainMap. Each generator should have one of its own, just as each generator has its own frame. And the ChainMap delegation goes up the call stack, just as an exception would. Eventually, it hits the event loop (or other Executor) which is responsible for ensuring that the ChainMap eventually defers to the proper (Chain)Map for this thread or Task.

While the context is defined conceptually as a nested chain of key:value mappings, we avoid using the mapping syntax because of the way the values can shift dynamically out from under you based on who called you ... instead of having the problem of changes inside the generator leaking out, we instead had the problem of changes outside the generator not making their way in

I still don't see how this is different from a ChainMap.

If you are using a stack(chain) of [d_global, d_thread, d_A, d_B, d_C, d_mine] maps as your implicit context, then a change to d_thread map (that some other code could make) will be visible unless it is masked.

Similarly, if the fallback for d_C changes from d_B to d_B1 (which points directly to d_thread), that will be visible for any keys that were previously resolved in d_A or d_B, or are now resolved in dB1.

Those seem like exactly the cases that would (and should) cause "shifting values".

This does mean that you can't cache everything in the localmost map, but that is a problem with the optimization regardless of how the implementation is done.

In https://mail.python.org/pipermail/python-dev/2017-August/148873.html Yury Selivanov wrote:

Any code that uses EC will not see any difference [between mutable vs immutable but replaced LC maps], because it can only work with the top LC.

Back to generators. Generators have their own empty LCs when created to store their local EC modifications.

OK, so just as they have their own frame, they have their own ChainMap, and the event loop is responsible for resetting the fallback when it schedules them.

When a generator is being iterated, it pushes its LC to the EC. When the iteration step is finished, it pops its LC from the EC.

I'm not sure it helps to think of a single stack. When the generator is active, it starts with its own map. When it is in the call chain of the active generator, its map will be in the chain of delegations. When neither it nor a descendant are active, no code will end up delegating to it.

If the delegation graph has 543 generators delegating directly to the thread-wide map, there is no reason to pop/push an execution stack every time a different generator is scheduled, since only that generator itself (and code it calls) will even care.

HAMT is a way to efficiently implement immutable mappings ... using regular dicts and copy, set() would be O(log N)

Using a ChainMap, set affects only the localmost map and is therefore O(1). get could require stacksize lookups, but ...

(A) How many values do you expect a typical generator to use? The django survey suggested mostly 0, sometimes 1, occasionally 2. So caching the values of all possible keys probably won't pay off.

(B) Other than the truly global context and thread-level context, how many of these maps do you expect to be non-empty?

(C) How deep do you expect the stack to get? Are we talking about 100 layers of mappings to check between the current generator and the thread-wide defaults? Even if we are, verifying that there hasn't been a change in some mid-level layer requires tracking the versions of each mid-level layer. (If version is globally unique, that would also ensure that the stack hasn't changed.) Is that really faster than checking that the map is empty?

And, of course, using a ChainMap means that the keys do NOT have to be predefined ... so the Key class really can be skipped.

-jJ

Previous message (by thread): [Python-Dev] limited_api and datetime
Next message (by thread): [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list