[Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap (original) (raw)
Yury Selivanov yselivanov.ml at gmail.com
Thu Aug 24 11:02:55 EDT 2017
On Thu, Aug 24, 2017 at 10:05 AM, Jim J. Jewett <jimjjewett at gmail.com> wrote:
> On Thu, Aug 24, 2017 at 1:12 AM, Yury Selivanov wrote:
>> On Thu, Aug 24, 2017 at 12:32 AM, Jim J. Jewett <jimjjewett at gmail.com> wrote:
>
>> The key requirement for using immutable datastructures is to make the
>> "get_execution_context" operation fast.
>
> Do you really need the whole execution context, or do you just need
> the current value of a specific key?  (Or, sometimes, the writable
> portion of the context.)
We really need a snapshot of all keys/values. If we have the following "chain":
EC = [{'a': 'b'}, {'spam': 'ham'}, {}, {'a': 'foo', 'num': 42}]
we need sys.get_execution_context() to return this:
{'spam': 'ham', 'a': 'foo', 'num': 42}
To do this with a ChainMap and mutable maps, you need to traverse the whole chain and merge all dicts -- this is very expensive. Immutable datastructures make it possible to avoid the merge and traversal (in most cases), so the operation becomes cheap.
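For illustration, the eager merge described above looks roughly like this with plain, mutable dicts (just a sketch of the cost, not the PEP's implementation):

EC = [{'a': 'b'}, {'spam': 'ham'}, {}, {'a': 'foo', 'num': 42}]

def get_execution_context_snapshot(ec):
    # Traverse the whole chain and merge every mapping; later LCs win.
    snapshot = {}
    for lc in ec:
        snapshot.update(lc)
    return snapshot

print(get_execution_context_snapshot(EC))
# {'a': 'foo', 'spam': 'ham', 'num': 42}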
Don't forget that the EC is a dynamic stack -- a generator starts, it pushes its LogicalContext, etc.
Without a way to get a full "snapshot" of the EC we can't support asynchronous tasks:
If you look at this small example:
import asyncio

foo = new_context_key()

async def nested():
    await asyncio.sleep(1)
    print(foo.get())

async def outer():
    foo.set(1)
    await nested()
    foo.set(1000)

l = asyncio.get_event_loop()
l.create_task(outer())
l.run_forever()
It will print "1", as the "nested()" coroutine will see the "foo" key when it's awaited.
Now let's say we want to refactor this snippet and run the "nested()" coroutine with a timeout:
import asyncio

foo = new_context_key()

async def nested():
    await asyncio.sleep(1)
    print(foo.get())

async def outer():
    foo.set(1)
    await asyncio.wait_for(nested(), 10)  # !!!
    foo.set(1000)

l = asyncio.get_event_loop()
l.create_task(outer())
l.run_forever()
So we wrap our nested() in a wait_for(), which creates a new asynchronous
task to run nested(). That task will now execute on its own, separately
from the task that runs outer(). So we need to somehow capture the full EC
at the moment wait_for() was called, and use that EC to run nested() within
it. If we don't do this, the refactored code would print "1000" instead of
"1".
If the asynchronous tasks topic is not enough: suppose you want to write a wrapper on top of concurrent.futures.ThreadPoolExecutor to offload some calculation to a threadpool. If you can "snapshot" the EC, you can do it: when you schedule a job, the EC is captured, and it is guaranteed that it will not be mutated before the job is executed. It's also guaranteed that the job will not randomly mutate the EC of the code that scheduled it.
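A rough sketch of such a wrapper (sys.get_execution_context() is the draft API used earlier in this thread; run_with_execution_context() is a hypothetical "call this function inside the captured EC" helper, so the exact name may differ):

import sys
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor()

def submit_with_ec(func, *args):
    # Capture the EC at scheduling time.  The snapshot is immutable, so
    # the scheduling code can't change what the job sees afterwards, and
    # the job can't mutate the caller's EC.
    ec = sys.get_execution_context()
    # Hypothetical helper: run func(*args) with `ec` as its execution context.
    return _pool.submit(sys.run_with_execution_context, ec, func, *args)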
Currently, the PEP doesn't do a good job at explaining why we need that operation and why it will be used by asyncio.Task and call_soon, so I understand the confusion.

> OK, the schedulers need the whole context, but if implemented as a
> ChainMap (instead of per-key),
I think you are a bit confused here (or I don't follow you) with the "per-key" ContextKey design.

With ContextKeys:

key = new_context_key('spam')
key.set('a')
# EC = [..., {key: 'a'}]

without:

ec.set('key', 'a')
# EC = [..., {'key': 'a'}]

ContextKey is an extra indirection layer that allows for granular caching and avoids name clashes.
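A quick sketch of why the indirection avoids clashes (using the new_context_key() draft API from the examples above): two libraries can both pick the name 'spam', because the key objects themselves, not their names, end up in the LC mapping:

# library A
key_a = new_context_key('spam')
key_a.set(1)

# library B
key_b = new_context_key('spam')
key_b.set(2)

# EC = [..., {key_a: 1, key_b: 2}]
# Two distinct entries: key_a.get() == 1 and key_b.get() == 2, even
# though both keys were created with the same name.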
> isn't that just a single constant?  As in, don't they always schedule
> to the same thread?  And when they need another map, isn't that because
> the required context is already available from whichever code requested
> the scheduling?
>>> (A)  How many values do you expect a typical generator to use?  The
>>> django survey suggested mostly 0, sometimes 1, occasionally 2.  So
>>> caching the values of all possible keys probably won't pay off.
>>
>> Not many, but caching is still as important, because some API users
>> want the "get()" operation to be as fast as possible under all
>> conditions.
>
> Sure, but only because they view it as a hot path; if the cost of that
> speedup is slowing down another hot path, like scheduling the generator
> in the first place, it may not be worth it.
>
> According to the PEP timings, HAMT doesn't beat a copy-on-write dict
> until over 100 items, and never beats a regular dict.  That suggests to
> me that it won't actually help the overall speed for a typical (as
> opposed to worst-case) process.
It's a bit more complicated, unfortunately.
The "100 items" it true when we can just memcpy all keys/values of the dict when we want to clone it. I've proposed a patch to do that, and it works.
The old implementation of "dict.copy()" created a new dict, iterated over items of the old dict, and inserted them to the new one. This 5x slower than memcpy.
The subtle difference is that the memcpy implementation does not resize the dict. The resize is needed when we delete items from it (which will happen with EC).
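Concretely, the clone-on-write approach being measured here is roughly this (a simplified Python illustration of the idea, not the actual C patch):

def lc_set(lc, key, value):
    # Clone-on-write: the clone is the memcpy-style dict copy discussed
    # above, which costs O(len(lc)) on every write once the LC grows.
    new_lc = dict(lc)
    new_lc[key] = value
    return new_lc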
I've updated the chart to show what happens when we implement immutable dict with the current dict.copy() semantics:
https://github.com/python/peps/blob/master/pep-0550-hamt_vs_dict.png
The fundamental problem, though, is that the EC is hidden from the user. Suppose you have a huge application (let's say a new YouTube) that uses a multitude of libraries. You can then be in a situation where some of those libraries use the EC, and you actually have 100s of items in it. Now at least one LC in the chain is big, and copying it becomes expensive. You start to experience overhead -- sometimes setting a variable in the EC is expensive, sometimes capturing the context takes longer, etc.
HAMT guarantees that immutable dicts will have O(log N) performance for set() and merge (we will need to merge long chains from time to time, btw). This means that it's a future-proof solution that works in all edge cases.
>>> And, of course, using a ChainMap means that the keys do NOT have to
>>> be predefined ... so the Key class really can be skipped.
>>
>> The first version of the PEP had no ContextKey object and the most
>> popular complaint about it was that the key names will clash.
>
> That is true of any global registry.  Thus the use of keys with
> prefixes like com.sun.
Such prefixes aren't considered Pythonic though.
In the end, ContextKey is needed to give us an O(1) get() operation (because of caching). This is a hard requirement; otherwise we can't migrate libraries like decimal to use this new API.
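To make that concrete, the caching idea is roughly the following (a simplified sketch only; the real scheme keys the cache on internal EC state, and ec_version()/lookup_in_chain() below are hypothetical placeholders):

class ContextKey:
    def __init__(self, name):
        self.name = name                # for introspection only
        self._cached_value = None
        self._cached_version = None

    def get(self):
        version = ec_version()          # hypothetical: "has the EC changed?"
        if version == self._cached_version:
            return self._cached_value   # O(1) fast path
        value = lookup_in_chain(self)   # slow path: walk the LC chain
        self._cached_value = value
        self._cached_version = version
        return value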
> The only thing pre-declaring a ContextKey buys in terms of clashes is
> that a sophisticated scheduler would have less difficulty figuring out
> which clashes will cause thrashing in the cache.
I don't understand what you mean by "sophisticated scheduler".
> Or are you suggesting that the key can only be declared once (as
> opposed to once per piece of code), so that the second framework to use
> the same name will see a RuntimeError?
A ContextKey is declared once for the code that uses it. Nobody else will use that key. Keys have names only for introspection purposes; the implementation doesn't use them, iow:
var = new_context_key('aaaaaa')
var.set(1)
# EC = [..., {var: 1}]

# Note that the EC has the "var" object itself as the key in the
# mapping, not the string 'aaaaaa'.
Yury