[Python-Dev] Guarantee ordered dict literals in v3.7?
Nick Coghlan ncoghlan at gmail.com
Mon Nov 6 07:09:18 EST 2017
On 6 November 2017 at 21:18, Steve Holden <steve at holdenweb.com> wrote:
> I have to agree: I find the elevation of a CPython implementation detail to a language feature somewhat hard to comprehend. Maybe it's more to do with the way it's been presented, but this is hardly an enhancement the language has been screaming for for years.
>
> Presumably there is little concern that algorithms that rely on this behaviour will be perfectly syntactically conformant with earlier versions but will fail subtly and without explanation? It's a small concern, but a real one - particularly for learners.
A similar concern existed when we elevated sort stability to being a language requirement - if you relied on that guarantee, your code was technically buggy on versions prior to 2.3, but eventually 2.2 and earlier aged out of general use, allowing such code to become correct in general.
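As a minimal sketch of what relying on sort stability looks like (the records below are made up purely for illustration), records that tie on the sort key keep their original relative order:

```python
# Hypothetical (name, score) records; ties on score should keep input order.
records = [("bob", 2), ("alice", 1), ("carol", 2), ("dave", 1)]

# Sort by score only. A stable sort leaves "alice" before "dave" and
# "bob" before "carol", because that is how they appear in the input.
by_score = sorted(records, key=lambda record: record[1])
print(by_score)
```

Code like this reads the same on any version, which is exactly why a missing guarantee is easy to overlook.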
So the current discussion is mainly about deciding where we want the compatibility burden to fall in relation to dict insertion ordering:
1. Do we deliberately revert CPython back to being harder to use correctly for the sake of making Python easier to implement?
2. Do we make Python harder to implement for the sake of making it easier to use?
3. Do we choose not to choose, thus implicitly choosing "2" by default due to the fact that Python is defined by a language spec and a reference implementation, rather than just a language spec?
Here's a more-complicated-than-a-doctest-for-a-dict-repr, but still fairly straightforward, example regarding the "insertion ordered dictionaries are easier to use correctly" argument:
import json
data = {"a":1, "b":2, "c":3}
rendered = json.dumps(data)
data2 = json.loads(rendered)
rendered2 = json.dumps(data2)
# JSON round trip
assert data == data2, "JSON round trip failed"
# Dict round trip
assert rendered == rendered2, "dict round trip failed"
Both of those assertions will always pass in CPython 3.6, as well as in PyPy, because their dict implementations are insertion ordered, which means the iteration order on the dictionaries is always "a", "b", "c".
If you try it on 3.5 though, you should fairly consistently see that last assertion fail, since there's nothing in 3.5 that ensures that data and data2 will iterate over their keys in the same order.
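The reason the first assertion is safe while the second is not can be sketched as follows. Dict equality ignores iteration order, but the serialised JSON text does not. OrderedDict is used here only as a way to force two different iteration orders deterministically on any version; the variable names are illustrative:

```python
from collections import OrderedDict
import json

# Same key/value pairs, deliberately inserted in opposite orders.
forward = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
backward = OrderedDict([("c", 3), ("b", 2), ("a", 1)])

# As plain dicts the two compare equal: equality does not look at order.
assert dict(forward) == dict(backward)

# The serialised text follows iteration order, so the strings differ.
assert json.dumps(forward) != json.dumps(backward)
```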
You can make that code implementation independent (and sufficiently version independent to pass both assertions) by using OrderedDict:
from collections import OrderedDict
import json
data = OrderedDict(a=1, b=2, c=3)
rendered = json.dumps(data)
data2 = json.loads(rendered, object_pairs_hook=OrderedDict)
rendered2 = json.dumps(data2)
# JSON round trip
assert data == data2, "JSON round trip failed"
# Dict round trip
assert rendered == rendered2, "dict round trip failed"
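However, despite the way this code looks, the serialised key order might not be "a, b, c" on 3.5 and earlier, because keyword arguments are collected into a regular dict before OrderedDict ever sees them (it will be on 3.6+, since PEP 468 already requires that kwarg order be preserved there).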
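A small sketch of the kwarg behaviour in question (the helper name `kwarg_names` is hypothetical): on 3.6 and later the names come back in call order, while on earlier versions the order is whatever the intermediate dict happens to produce.

```python
def kwarg_names(**kwargs):
    """Return the keyword argument names in the order the function sees them."""
    return list(kwargs)

# Guaranteed to be call order only on Python 3.6+, where kwarg order
# preservation is part of the language definition.
names = kwarg_names(a=1, b=2, c=3)
print(names)
```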
So the formally correct version independent code that reliably ensures that the key order in the JSON file is always "a, b, c" looks like this:
from collections import OrderedDict
import json
data = OrderedDict((("a",1), ("b",2), ("c",3)))
rendered = json.dumps(data)
data2 = json.loads(rendered, object_pairs_hook=OrderedDict)
rendered2 = json.dumps(data2)
# JSON round trip
assert data == data2, "JSON round trip failed"
# Dict round trip
assert rendered == rendered2, "dict round trip failed"
# Key order
assert "".join(data) == "".join(data2) == "abc", "key order failed"
Getting from the "Works on CPython 3.6+ but is technically non-portable" state to a fully portable correct implementation that ensures a particular key order in the JSON file thus currently requires the following changes:
- don't use a dict display, use collections.OrderedDict
- make sure to set object_pairs_hook when using json.loads
- don't use kwargs to OrderedDict, use a sequence of 2-tuples
For 3.6, we've already said that we want the last constraint to age out, such that the middle version of the code also ensures a particular key order.
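One way to keep the three changes listed above from being forgotten is to bundle them into small helpers. This is only a sketch, and the names `ordered_from_pairs` and `ordered_loads` are hypothetical rather than anything in the standard library:

```python
from collections import OrderedDict
import json


def ordered_from_pairs(pairs):
    """Build an OrderedDict from 2-tuples, avoiding dict displays and kwargs."""
    return OrderedDict(pairs)


def ordered_loads(text):
    """Parse JSON while keeping each object's keys in document order."""
    return json.loads(text, object_pairs_hook=OrderedDict)


data = ordered_from_pairs([("a", 1), ("b", 2), ("c", 3)])
rendered = json.dumps(data)
restored = ordered_loads(rendered)

# Key order survives the round trip regardless of how builtin dict behaves.
assert list(restored) == ["a", "b", "c"]
assert json.dumps(restored) == rendered
```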
The proposal is that in 3.7 we retroactively declare that the first, most obvious, version of this code should in fact reliably pass all three assertions (the two round trips plus the key order check).
Failing that, the proposal is that we instead change the dict iteration implementation such that the dict round trip will start failing reasonably consistently again (the same as it did in 3.5), so that folks realise almost immediately that they still need collections.OrderedDict instead of the builtin dict.
Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia