Issue 36144: Dictionary union. (PEP 584) (original) (raw)

Created on 2019-02-28 04:18 by brandtbucher, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (66)

msg336798 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2019-02-28 04:18

...as discussed in python-ideas. Semantically:

d1 + d2 <-> d3 = d1.copy(); d3.update(d2); d3 d1 += d2 <-> d1.update(d2)

Attached is a working implementation with new/fixed tests for consideration. I've also updated collections.UserDict with the new add/radd/iadd methods.

msg336803 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2019-02-28 06:33

I believe that Guido rejected this when it was proposed a few years ago.

msg336808 - (view)

Author: Karthikeyan Singaravelan (xtreak) * (Python committer)

Date: 2019-02-28 07:07

Python ideas discussion in 2015 : https://mail.python.org/pipermail/python-ideas/2015-February/031748.html LWN summary : https://lwn.net/Articles/635397/

msg336810 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2019-02-28 07:29

I believe it was proposed and rejected multiple times.

msg336811 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2019-02-28 07:36

For the record, I'm opposed to the idea.

msg336812 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2019-02-28 07:52

In Python, the plus operator for sequences (strings, lists, tuples) is non-commutative.

But I have other arguments against it:

msg336816 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2019-02-28 08:23

In Python, the plus operator for sequences (strings, lists, tuples) is non-commutative.

For sequences, that is obvious and expected, but not so much with mappings where the order of overlapping keys is determined by the left operand and the value associated with those keys is determined by the right operand.

Also with sequences the + operator actually means "add to", but with dictionaries it means "add/or replace" which is contrary to the normal meaning of plus. I think that was one of Guido's reasons for favoring "|" instead of "+" for set-to-set operations.

We already have a syntax for dict merging: {**d1, **d2}. It works with arbitrary mappings,

This is a good point.

msg336820 - (view)

Author: Stefan Behnel (scoder) * (Python committer)

Date: 2019-02-28 09:20

We already have a syntax for dict merging: {**d1, **d2}.

Which doesn't mean that "d1 + d2" isn't much more intuitive than this special-character heavy version. It takes me a while to see the dict merge under that heap of stars. And that's already the shortest example.

It works with arbitrary mappings,

The RHS of "d += M" doesn't have to be a dict IMHO, it could be any mapping. And even "dict(X) + M" doesn't look all too bad to me, even though there's "dict(X, **M)".

Use of the + operator is a temptation to produce new dictionaries rather than update an existing dict in-place which is usually what you want.

That's why there would be support for "+=". The exact same argument already fails for lists, where concatenation is usually much more performance critical than for the average little dict. (And remember that most code isn't performance critical at all.)

We already have ChainMap() which presents a single view of multiple mappings with any copying.

Which is a different use case that is unlikely to go away with this proposal.

makes it less clear that you're working with dicts.

This is a valid argument, although it always depends on the concrete code what the most readable way to express its intentions is. Again, this doesn't really differ for lists.

Let's wait for the PEP, I'd say.

msg336847 - (view)

Author: Josh Rosenberg (josh.r) * (Python triager)

Date: 2019-02-28 16:21

scoder: dict(X, **M) is broken unless M is known to be string keyed (it used to work, but in Python 3, it will raise a TypeError). It's part of the argument for the additional unpacking generalizations from PEP 448; {**X, **M} does what dict(X, **M) is trying to do, but without abusing the keyword argument passing convention.

You also claim "It takes me a while to see the dict merge under that heap of stars", but that's at least as much about the newness of PEP 448 (and for many Python coders, a complete lack of familiarity with the pre-existing varargs unpacking rules for functions) as it is about the punctuation; after all, you clearly recognize dict(X, **M) even though it's been wrong in most contexts for years.

In any event, I'm a strong -1 on this, for largely the same reasons as Raymond and others:

  1. It doesn't provide any new functionality, just one more way to do it; += is satisfied by .update, + is satisfied (more generally and efficiently) by the unpacking generalizations

  2. It's needlessly confusing; addition is, for all existing types in the standard library I can think of, lossless; the information from both sides of the + is preserved in some form, either by addition or concatenation (and in the concatenation case, addition is happening, just to the length of the resulting sequence, and order is preserved). Addition for dictionaries would introduce new rules specific to dicts that do not exist for any other type regarding loss of values, non-additive resulting length, etc. Those rules would likely be similar to those of dict literals and the update method, but they'd need to be made explicit. By contrast, the PEP 448 unpacking generalization rules followed the existing rules for dict literals; no special rules occur, it just behaves intuitively (if you already knew the rules for dict literals without unpacking being involved).

  3. Almost any generic, duck-typing based code for which addition makes sense will not make sense for dicts simply because it loosens the definition of addition too much to be useful, so best case, it still raises TypeError (when dicts added to non-dict things), worst case, it silently operates in a way that violates the rules of both addition and concatenation rather than raising a TypeError that the generic code could use to determine the correct thing to do.

  4. The already mentioned conflict with Counter (which already has an addition operator, with lossless semantics)

  5. (Minor) It means PyDict_Type needs a non-NULL tp_as_number, so now it's slightly slower to reject dicts as being non-numeric at the C layer

Problem #2 could be used to argue for allowing | instead of + (which would also resolve #4, and parts of #3), since | is already used for unioning with sets, and this operation is much closer to a union operation than addition or concatenation. Even so, it would still be misleading; at least with sets, there is no associated value, so it's still mostly lossless (you lose the input lengths, but the unique input data is kept); with dicts, you'd be losing values too.

Basically, I think the PEP 448 unpacking syntax should remain as the "one-- and preferably only one --obvious way to" combine dictionaries as a one-liner. It's more composable, since it allows adding arbitrary additional key/value pairs, and more efficient, since it allows combining more than two dicts at once with no additional temporaries: dicta + dictb + dictc requires "dictab" to be made first, then thrown away after dictab + dictc produces dictabc, while {**dicta, **dictb, **dictc} builds dictabc directly.

The only real argument I can see for not sticking to unpacking is that it doesn't allow for arbitrary dict-like things to produce new dict-like things directly; you'd have to rewrap as myspecialdict({**speciala, **specialb}). But I don't think that's a flaw worth fixing if it means major changes to the behavior of what I'm guessing is one of the three most commonly used types in Python (along with int and tuple, thanks to the integration of dicts into so many facets of the implementation).

msg336848 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2019-02-28 16:34

I changed my mind and am now in favor. Most of the arguments against could also be used against list+list. Counter addition is actually a nice special case of this -- it produces the same keys but has a more sophisticated way of merging values for common keys. Please read the python-ideas thread!

msg336849 - (view)

Author: Josh Rosenberg (josh.r) * (Python triager)

Date: 2019-02-28 16:42

Also note: That Python ideas thread that xtreak linked ( https://mail.python.org/pipermail/python-ideas/2015-February/031748.html ) largely rejected the proposal a couple weeks before PEP 448 was approved. At the time, the proposal wasn't just about +/+=; that was the initial proposal, but operator overloading was heavily criticized for the failure to adhere to either addition or concatenation semantics, so alternate constructors and top-level functions similar to sorted were proposed as alternatives (e.g. merged(dicta, dictb)). The whole thread ended up being about creating an approved, built-in way of one-lining: d3 = d1.copy(); d3.update(d2)

A key quote though is that this was needed because there was no other option without rolling your own merged function. Andrew Barnert summarized it best:

"I'm +1 on constructor, +0.5 on a function (whether it's called updated or merged, whether it's in builtins or collections), +0.5 on both constructor and function, -0.5 on a method, and -1 on an operator.

"Unless someone is seriously championing PEP 448 for 3.5, in which case I'm -0.5 on anything, because it looks like PEP 448 would already give us one obvious way to do it, and none of the alternatives are sufficiently nicer than that way to be worth having another."

As it happens, PEP 448 was put in 3.5, and we got the one obvious way to do it.

Side-note: It occurs to me there will be one more "way to do it" in 3.8 already, thanks to PEP 572:

(d3 := d1.copy()).update(d2)

I think I'll stick with d3 = {**d1, **d2} though. :-)

msg336854 - (view)

Author: Karthikeyan Singaravelan (xtreak) * (Python committer)

Date: 2019-02-28 17:30

Current python-ideas thread for the issue : https://mail.python.org/pipermail/python-ideas/2019-February/055509.html

msg337094 - (view)

Author: Viktor Kharkovets (slam) *

Date: 2019-03-04 12:09

If we're going to forget about commutativity of +, should we also implement +/+= for sets?

msg337107 - (view)

Author: Stefan Behnel (scoder) * (Python committer)

Date: 2019-03-04 13:05

should we also implement +/+= for sets?

The question is: what would that do? The same as '|=' ? That would be rather confusing, I think. "|" (meaning: "or") seems a very natural operation for sets, in the same way that "|" operates on bits in integers. That suggests that "|" is the right operator for sets.

In any case, this is an unrelated proposal that is better not discussed in this ticket. The only link is whether "|" is the more appropriate operator also for dicts, which is to be discussed in the PEP and thus also not in this ticket.

msg337266 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-03-06 01:54

Is this issue directly or indirectly related to the PEP 584 "Add + and - operators to the built-in dict class"? https://www.python.org/dev/peps/pep-0584/

msg337267 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-03-06 01:55

Is this issue directly or indirectly related to the PEP 584 "Add + and - operators to the built-in dict class"? https://www.python.org/dev/peps/pep-0584/

Ah yes, it's written in the title of the PR. I add it to the bug title as well.

msg358305 - (view)

Author: Aaron Hall (Aaron Hall) *

Date: 2019-12-12 21:16

Another obvious way to do it, but I'm +1 on it.

A small side point however - PEP 584 reads:

To create a new dict containing the merged items of two (or more) dicts, one can currently write:

{**d1, **d2}

but this is neither obvious nor easily discoverable. It is only guaranteed to work if the keys are all strings. If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2].

...

[2] Non-string keys: https://bugs.python.org/issue35105 and https://mail.python.org/pipermail/python-dev/2018-October/155435.html

The references cited does not back this assertion up. Perhaps the intent is to reference the "cool/weird hack" dict(d1, **d2) (see https://mail.python.org/pipermail/python-dev/2010-April/099485.html and https://mail.python.org/pipermail/python-dev/2010-April/099459.html), which allowed any hashable keys in Python 2 but only strings in Python 3.

If I see {**d1, **d2}, my expectations are that this is the new generalized unpacking and I currently expect any keys to be allowed, and the PEP should be updated to accurately reflect this to prevent future misunderstandings.

msg362619 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-02-25 03:46

PEP 584 has been approved by the Steering Council (at my recommendation). We will shortly begin landing PRs related to this.

msg362620 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-02-25 03:47

New changeset eb8ac57af26c4eb96a8230eba7492ce5ceef7886 by Brandt Bucher in branch 'master': bpo-36144: Dictionary Union (PEP 584) (#12088) https://github.com/python/cpython/commit/eb8ac57af26c4eb96a8230eba7492ce5ceef7886

msg362621 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-02-25 03:49

While the main code has been merged now, I propose to keep this issue open until some other things have happened:

msg362626 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-02-25 06:29

My current PR plans are:

I'll also create a BPO issue to discuss whether the dict subclasses in http.cookies should be updated.

That should do it for CPython; I'm planning on updating typeshed and adding a handful of tests to mypy for TypedDict, etc. after these are landed.

Guido, okay to tag you on these for review?

msg362645 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-02-25 15:07

Yup, great plan.

On Mon, Feb 24, 2020 at 22:29 Brandt Bucher <report@bugs.python.org> wrote:

Brandt Bucher <brandtbucher@gmail.com> added the comment:

My current PR plans are:

I'll also create a BPO issue to discuss whether the dict subclasses in http.cookies should be updated.

That should do it for CPython; I'm planning on updating typeshed and adding a handful of tests to mypy for TypedDict, etc. after these are landed.

Guido, okay to tag you on these for review?



Python tracker <report@bugs.python.org> <https://bugs.python.org/issue36144>


-- --Guido (mobile)

msg362660 - (view)

Author: Steve Dower (steve.dower) * (Python committer)

Date: 2020-02-25 20:05

Not sure if this is a big deal or not, and it seems likely that the preexisting behaviour of .update() and ** unpacking have already decided it, but is it intentional that you end up with the first-seen key and the last-seen value in the case of collisions?

class C: def init(self, *a): self.a = a def hash(self): return hash(self.a[0]) def eq(self, o): return self.a[0] == o.a[0] def repr(self): return f"C{self.a}"

c1 = C(1, 1); c1 C(1, 1) c2 = C(1, 2); c2 C(1, 2)

For set union we get the first seen value:

{c1} | {c2} {C(1, 1)}

For dict union we get the first seen key and the last seen value:

{c1: 'a'} | {c2: 'b'} {C(1, 1): 'b'}

But similarly for dict unpack (and .update(); code left as an exercise to the reader):

{**{c1: 'a'}, **{c2: 'b'}} {C(1, 1): 'b'}

So the union of two dicts may contain .items() elements that were not in either of the inputs.

Honestly, I've never noticed this before, as the only time I create equivalent objects with meaningfully-distinct identities is to use with sets. I just figured I'd try it out after seeing suggestions that the dict union operands were transposed from set union.

msg362663 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-02-25 20:19

As a somewhat simpler example:

f = {False: False} z = {0: 0} f | z {False: 0} {**f, **z} {False: 0} f.update(z); f {False: 0}

Though these hairier cases aren't explicitly addressed, the conflict behavior is covered in the Rationale and Reference Implementation sections of the PEP. All of the above examples share code (dict_update_arg), and that's definitely intentional. I for one think it would be confusing (and probably a bug) if one of the examples above gave a different key-value pair!

I find it makes more sense if you see a set as valueless keys (rather than keyless values).

msg362666 - (view)

Author: Steve Dower (steve.dower) * (Python committer)

Date: 2020-02-25 20:42

That's a much simpler example. And of course:

z[False] = False z {0: False}

So the precedent is well established that the key doesn't get updated with the value.

No further questions, yer honour ;)

msg362726 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-02-26 20:01

New changeset d0ca9bd93bb9d8d4aa9bbe939ca7fd54ac870c8f by Brandt Bucher in branch 'master': bpo-36144: Document PEP 584 (GH-18659) https://github.com/python/cpython/commit/d0ca9bd93bb9d8d4aa9bbe939ca7fd54ac870c8f

msg362729 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-02-26 20:03

@Brandt: you have some more followup PRs planned right? Let's keep this issue open until you've done all of those.

msg362740 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-02-26 21:20

Yep. I'm currently working on OrderedDict, defaultdict, and MappingProxyType.

My brother is looking to make his first contribution, so he'll be taking care of ChainMap.

msg362757 - (view)

Author: Josh Rosenberg (josh.r) * (Python triager)

Date: 2020-02-27 00:29

What is ChainMap going to do? Normally, the left-most argument to ChainMap is the "top level" dict, but in a regular union scenario, last value wins.

Seems like layering the right hand side's dict on top of the left hand side's would match dict union semantics best, but it feels... wrong, given ChainMap's normal left-to-right precedence. And top-mostness affects which dict receives all writes, so if chain1 |= chain2 operates with dict-like precedence (chain2 layers over chain1), then that also means the target of writes/deletions/etc. changes to what was on top in chain2.

msg362759 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-02-27 00:56

The plan is to follow dict’s semantics. The |= operator will basically delegate to the first map in the chain. The | operator will create a new ChainMap where the first map is the merged result of the old first map, and the others are the same.

So, basically update / copy-and-update, respectively.

msg362761 - (view)

Author: Josh Rosenberg (josh.r) * (Python triager)

Date: 2020-02-27 01:26

Sorry, I think I need examples to grok this in the general case. ChainMap unioned with dict makes sense to me (it's equivalent to update or copy-and-update on the top level dict in the ChainMap). But ChainMap unioned with another ChainMap is less clear. Could you give examples of what the expected end result is for:

d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
d3 = {'a': 5, 'd': 6}
d4 = {'d': 7, 'e': 8}
cm1 = ChainMap(d1, d2)
cm2 = ChainMap{d3, d4)

followed by either:

cm3 = cm1 | cm2

or cm1 |= cm2

? As in, what is the precise state of the ChainMap cm3 or the mutated cm1, referencing d1, d2, d3 and d4 when they are still incorporated by references in the chain?

My impression from what you said is that the plan would be for the updated cm1 to preserve references to d1 and d2 only, with the contents of cm2 (d3 and d4) effectively flattened and applied as an in-place update to d1, with an end result equivalent to having done:

cm1 = ChainMap(d1, d2)
d1 |= d4
d1 |= d3

(except the key ordering would actually follow d3 first, and d4 second), while cm3 would effectively be equivalent to having done (note ordering):

cm3 = ChainMap(d1 | d4 | d3, d2)

though again, key ordering would be based on d1, then d3, then d4, not quite matching the union behavior. And a reference to d2 would be preserved in the final result, but not any other original dict. Is that correct? If so, it seems like it's wasting ChainMap's key feature (lazy accumulation of maps), where:

cm1 |= cm2

could be equivalent to either:

cm1.maps += cm2.maps

though that means cm1 wins overlaps, where normal union would have cm2 win, or to hew closer to normal union behavior, make it equivalent to:

cm1.map[:0] = cm2.maps

prepending all of cm2's maps to have the same duplicate handling rules as regular dicts (right side wins) at the expense of changing which map cm1 uses as the target for writes and deletes. In either case it would hew to the spirit of ChainMap, making dict "union"-ing an essentially free operation, in exchange for increasing the costs of lookups that don't hit the top dict.

msg362774 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-02-27 05:59

I think for |= the only choice is for it to be essentially an alias to .update(). So that means cm |= other becomes cm.maps[0].update(other).

For | we are breaking new ground and we could indeed make cm | other do something like ChainMap(other, *cm.maps).

I've not used ChainMap much (though I've seen some code that uses it) so I'm probably not the best judge of whether this is a good feature to have.

Note that other | cm will just do whatever other.__or__ does, since ChainMap isn't a true subclass of dict, so it will not fall back to cm.__ror__. Basically ChainMap will not get control in this case.

Other thoughts:

msg362779 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-02-27 07:04

I think for |= the only choice is for it to be essentially an alias to .update(). So that means cm |= other becomes cm.maps[0].update(other).

Agreed.

These semantics make |= behave rather differently from |. Is that okay? If not, which of them should change, and how?

I don’t like this. Let me try to explain why:

So far (and to the best of my knowledge), setting and updating values on a ChainMap works exactly the same as it does for dict, with all of the same semantics (the docs themselves even say that “all of the usual dictionary methods are supported”… which now could be interpreted as meaning | and |= as well). It’s only when deleting or using the new interfaces that things get more specialized.

But that doesn’t really apply here. Having different (or worse, inconsistent) behavior for these operators, I feel, would be more confusing than helpful. Remember, a major goal of this proposal is to aid in duck typing.

So, Josh’s understanding of my intended semantics is correct, I propose that, for:

d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
d3 = {'a': 5, 'd': 6}
d4 = {'d': 7, 'e': 8}
cm1 = ChainMap(d1, d2)
cm2 = ChainMap{d3, d4)

cm3 = cm1 | cm2

Gives cm3 a value of:

ChainMap(d1 | d4 | d3, d2)  # Or, equivalently: ChainMap(d1 | dict(cm2), d2)

And:

cm1 |= cm2

Is equivalent to:

d1 |= cm2

I don’t want to change which map is "first", and I think changing the winning behavior from that of dict will create more problems than it solves. We only need to look at how ChainMap handles the update method… it keeps the same exact behavior, rather than trying to be lazy or reversed or something.

If we are deciding to do something different, then I think it should have no relationship to PEP 584, which reasons out a carefully considered merge operation for dict, not ChainMap. But, it would also probably need a different operator, and be able to stand on its own merits.

msg362810 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-02-27 15:43

I had just come to a different conclusion. Maybe ChainMap should just not grow | and |= operators? That way there can be no confusion. dict() | ChainMap() and ChainMap() | dict() will fail because ChainMap doesn't inherit from dict. (Note that in your last message, d1 |= cm2 will fail for this reason. You can of course fix that with d1 |= dict(cm2), although IIUC there's no reason one of the maps couldn't be some other [Mutable]Mapping.)

msg362813 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-02-27 15:57

Note that in your last message, d1 |= cm2 will fail for this reason. You can of course fix that with d1 |= dict(cm2), although IIUC there's no reason one of the maps couldn't be some other [Mutable]Mapping.

Mappings and iterables are fine for the in-place variant. :)

from collections import ChainMap d = {} c = ChainMap({"r": 2, "d":2}) d |= c d {'r': 2, 'd': 2}

I think it would be confusing to have ChainMap | ChainMap behave subtly different than dict | ChainMap. It would be especially odd if it also differed subtly from ChainMap | dict.

To recap:

+1 on adding the operators with dict semantics, +0 on no PEP 584 for ChainMap. -0 on implementing them, but changing the winning behavior by concatenating the maps lists or something. This would probably make more sense to me as a + operator, honestly. :( -1 for having the operators behave differently (other than performance shortcuts) for cm | d, cm | cm, cm |= d, cm |= cm.

msg362816 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-02-27 17:01

OK, assuming |= gets the same semantics as update(), can you repeat once more (without motivation) what the specification for cm | other will be?

msg362817 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-02-27 17:10

I believe that:

cm | other

Should return the equivalent of:

ChainMap(cm.maps[0] | dict(other), *cm.maps[1:])

msg362819 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-02-27 17:18

...however, I could also see the (similar):

ChainMap(other, *cm.maps)  # Note that `other` is the original reference here.

Being okay as well. Maybe even better, now that I've written it out.

msg362821 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-02-27 17:26

OK, that makes sense, it works similar to ChainMap.copy(), which copies maps[0] and keeps links to the rest. So in particular cm | {} will do the same thing as cm.copy().

Im not sure if the dict(other) cast is the best way to go about it. Maybe this would work?

def or(self, other): new = self.copy() new |= other # OR new.update(other) ??? return new

def ior(self, other): self.update(other) return self

Note that there is no ChainMap.update() definition -- it relies on MutableMapping.update().

I guess we need a ror as well, in case there's some other mapping that doesn't implement or:

def ror(self, other): new = other.copy() new.update(self) return new

Note that this doesn't return a ChainMap but an instance of type(other). If other doesn't have a copy() method it'll fail.

As a refinement, or and ror should perhaps check whether the operation can possibly succeed and return NotImplemented instead of raising? (Based on the type of other only, not its contents.)

msg362823 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-02-27 17:28

I didn't see your second reply, with ChainMap(other, *cm.maps).

I'm not so keen on that, because its special behavior can't be mimicked by |=.

msg362827 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-02-27 17:49

Im not sure if the dict(other) cast is the best way to go about it. Maybe this would work?

Yeah, I was imagining something like that... I used the cast for brevity in my reply but that probably wasn't helpful.

Note that for or, we probably want to check the type of the argument (for either dict or ChainMap, or maybe just Mapping), to keep it from working on an iterable of key-value pairs.

I guess we need a ror as well, in case there's some other mapping that doesn't implement or:

Agreed. Again, we can check for Mapping here to assure success for the copy() move.

As a refinement, or and ror should perhaps check whether the operation can possibly succeed and return NotImplemented instead of raising? (Based on the type of other only, not its contents.)

Yup, see above. I think a check for Mapping should be fine.

msg362828 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-02-27 17:52

Just to clarify:

If we decide to check isinstance(other, (ChainMap, dict)), '|' should probably be used.

If we decide to check isinstance(other, Mapping), I think the copy/update methods should be used.

msg362831 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2020-02-27 20:00

def or(self, other): return self.class(self.maps[0] | other, *self.maps[1:])

def ror(self, other): return other | dict(self)

def or(self, other): return self.class(other, *self.maps)

def ror(self, other): return self.class(*self.maps, other)

There are problems with both variants, so I think it may be better to not add this operator to ChainMap.

msg362835 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-02-27 20:12

I think we're only seriously considering the first variant (although implemented slightly differently, see my last two messages). And ror would probably change, returning the type of self.

What are the "problems" with it, exactly? We seem to be in agreement that the update behavior is reasonable, even for ChainMaps.

msg362852 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-02-28 00:48

We already have somewhat different semantics of | for Counter, and hence I think it's fine to give it the most useful semantics for ChainMap given that class's special behavior. I think we've come up with the right solution there.

Let's stop the debate and put up a PR.

msg362855 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-02-28 00:58

Sounds good, I'll have these up soon.

msg363526 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-03-06 17:24

New changeset 57c9d1725689dde068a7fccaa7500772ecd16d2e by Brandt Bucher in branch 'master': bpo-36144: Implement defaultdict union (GH-18729) https://github.com/python/cpython/commit/57c9d1725689dde068a7fccaa7500772ecd16d2e

msg363527 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-03-06 17:34

Still waiting for ChainMap -- what else?

msg363528 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-03-06 17:51

My brother will have a ChainMap PR up soon. I'm just finishing up MappingProxyType, myself. Probably both this weekend.

Then I'll move on to OrderedDict, which looks like it could be tricky. I'll need to familiarize myself with the implementation better (unless there's somebody who is already familiar with it who wants to take over). It looks well-commented, though.

I think we can pass on the http.cookies subclasses since there don't appear to be any experts/maintainers for that module.

msg363613 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-03-07 19:03

New changeset 4663f66f3554dd8e2ec130e40f6abb3c6a514775 by Brandt Bucher in branch 'master': bpo-36144: Update MappingProxyType with PEP 584's operators (#18814) https://github.com/python/cpython/commit/4663f66f3554dd8e2ec130e40f6abb3c6a514775

msg363625 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-03-08 02:36

Issue 39857 just reminded me that we should update os._Environ as well (the type of os.environ and os.environb).

I have another first-timer who will probably want to take it.

msg363889 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2020-03-11 09:14

Once this issue will be done, would you mind to update PEP 584? Currently, it only says "This PEP proposes adding merge (|) and update (|=) operators to the built-in dict class." It doesn't mention that other types like OrderedDict, MappingProxy or ChainMap are updated as well.

msg363953 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-03-11 18:04

Yes, I can add a section explaining that after the PEP was accepted, we decided to add the operators to several non-dict mappings as well. I'm also going to add some explanation as to why Mapping/MutableMapping didn't grow them, too.

msg364106 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-03-13 16:04

New changeset d648ef10c5c7659ed3c9f34d5c751dc55e2c6007 by Charles Burkland in branch 'master': bpo-36144: Update os.environ and os.environb for PEP 584 (#18911) https://github.com/python/cpython/commit/d648ef10c5c7659ed3c9f34d5c751dc55e2c6007

msg364107 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-03-13 16:06

New changeset 6d674a1bf456945eb758e85c11484a9f1494f2b4 by Brandt Bucher in branch 'master': bpo-36144: OrderedDict Union (PEP 584) (#18967) https://github.com/python/cpython/commit/6d674a1bf456945eb758e85c11484a9f1494f2b4

msg364195 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-03-14 19:31

Three other MutableMappings we might want to update:

Shelf is up in the air, since it doesn't look like it defines a copy() equivalent... I also have no experience with it. Since it's a MutableMapping subclass, (not a dict subclass), we could in theory hold off on updating this until someone asks for it, without backward compatibility issues.

I think the other two should be updated, though. I can coordinate PRs for them this week.

msg364196 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-03-14 20:05

I definitely think we should leave Shelf alone, it's a toy class from a different era.

It makes sense to update the weak dicts; hopefully the | and |= operators can be implemented in terms of other, more primitive operations, so we will have assurance that these classes' essential behavior is preserved.

msg364887 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-03-23 19:02

New changeset f393b2c588559162dc2e77f8079a42e48558870a by Curtis Bucher in branch 'master': bpo-36144: Add PEP 584 operators to collections.ChainMap (#18832) https://github.com/python/cpython/commit/f393b2c588559162dc2e77f8079a42e48558870a

msg364900 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-03-23 20:49

New changeset 25e580a73c163f472fdeb5489bebef85da21655c by Curtis Bucher in branch 'master': bpo-36144: Add union operators to WeakKeyDictionary (#19106) https://github.com/python/cpython/commit/25e580a73c163f472fdeb5489bebef85da21655c

msg364969 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-03-25 01:51

New changeset 8f1ed21ecf57cc8b8095d9d1058af2b9b3ed0413 by Curtis Bucher in branch 'master': bpo-36144: Add union operators to WeakValueDictionary584 (#19127) https://github.com/python/cpython/commit/8f1ed21ecf57cc8b8095d9d1058af2b9b3ed0413

msg364992 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-03-25 15:24

And... that's it! Big thanks to everybody who had a part in making this happen.

msg365245 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-03-29 04:56

I'm guessing this can be closed?

msg365333 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2020-03-30 16:54

I guess we should keep this open until Raymond Hettinger has given feedback on https://github.com/python/cpython/pull/18832 (where we have the option of changing to Brandt's proposal from https://github.com/python/cpython/pull/18832#issuecomment-596910350).

msg371520 - (view)

Author: Sumit Jaiswal (justjais)

Date: 2020-06-15 05:50

First off, thanks for adding the feature, it's much appreciated.

But it'd be great if you guys can enable list merge for the dict having list as its value, in current form I believe it's handling only "key: value" merge.

for e.g.:

d1 = {'spam': [1, 2, 3]} d2 = {'spam': [2, 3, 4]} d1 | d2 {'spam': [1, 2, 3, 4]}

msg371549 - (view)

Author: Brandt Bucher (brandtbucher) * (Python committer)

Date: 2020-06-15 14:24

Similar behavior was considered and ultimately rejected by the PEP as being too specialized:

https://www.python.org/dev/peps/pep-0584/#concatenate-values-in-a-list

What you're asking for it subtly different, and even more specialized than that. I'd recommend just subclassing dict and overriding the operator, as the PEP suggests.

msg373205 - (view)

Author: Raps Uk (Raps Uk)

Date: 2020-07-07 05:39

Taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don't do this:

c = dict(a.items() | b.items()) This example demonstrates what happens when values are unhashable:

x = {'a': []} y = {'b': []} dict(x.items() | y.items()) Traceback (most recent call last): File "", line 1, in TypeError: unhashable type: 'list' Here's an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

x = {'a': 2} y = {'a': 1} dict(x.items() | y.items()) {'a': 2}

http://net-informations.com/python/ds/merge.htm