Issue 22515: Implement partial order on Counter (original) (raw)

Created on 2014-09-29 19:33 by cool-RR, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (66)

msg227819 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-09-29 19:33

I suggest implementing Counter.__lt__ which will be a partial order, similarly to set.__lt__. That is, one counter will be considered smaller-or-equal to another if for any item in the first counter, the second counter has an equal or bigger amount of that item.

msg227824 - (view)

Author: Steven D'Aprano (steven.daprano) * (Python committer)

Date: 2014-09-29 20:15

On Mon, Sep 29, 2014 at 07:33:21PM +0000, Ram Rachum wrote:

I suggest implementing Counter.__lt__ which will be a partial order, similarly to set.__lt__.

Since Counter is described as a multiset, this sounds reasonable to me.

msg227827 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2014-09-29 20:24

What should be result of following operations?

Counter({'a': 0}) < Counter({}) Counter({}) < Counter({'a': 0}) Counter({'a': -1}) < Counter({}) Counter({}) < Counter({'a': -1})

msg227828 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-09-29 20:26

I suggest they be ignored like in elements.

msg227829 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-09-29 20:30

(I mean, the non-positive values should be ignored.)

msg227831 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2014-09-29 20:35

There is other definition of the <= operator for sets: "A <= B" is equivalent to "len(A - B) == 0". Extending to Counter this can mean "len(A.subtract(B).elements()) == 0".

msg227842 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-09-29 21:15

What would be the result of

Counter({'a':1, 'b':2}) < Counter({'a':2, 'b':1})

?

msg227844 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-09-29 21:20

False, like with sets.

msg227845 - (view)

Author: R. David Murray (r.david.murray) * (Python committer)

Date: 2014-09-29 21:21

Thanks, Ethan. I had a feeling that this wasn't well defined but I couldn't come up with an example :)

msg227846 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-09-29 21:24

David, there's nothing here that isn't well defined. It's simply a partial order, not a total order. We have the same for sets.

msg227847 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-09-29 22:06

set's don't have values, and you are wanting to implement the partial ordering based on the values. (side-note: how does partial-ordering work for sets?)

That is, one counter will be considered smaller-or-equal to another if for any item in the first counter, the second counter has an equal or bigger amount of that item.

According to your definition, my example should have returned True, which is clearly nonsensical.

Even if you changed the definition to:

For every item in the first counter, that item's value is less than the corresponding item in the second counter.

You have situations like:

Counter({'a':1, 'b':1}) < Counter({'a':2})

I just don't think there is one interpretation that is going to be correct most of the time.

msg227848 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-09-29 22:30

Ethan, I don't understand what the problem is. I also don't understand your side note question "how does partial-ordering work for sets?" I'm not sure what you're asking.

That is, one counter will be considered smaller-or-equal to another if for any item in the first counter, the second counter has an equal or bigger amount of that item.

According to your definition, my example should have returned True, which is clearly nonsensical.

In your example the first counter had b twice, while the second counter had it only once. So the result is False. This comes pretty directly from my definition, I'm not sure where the confusion is.

Regarding your new example: Since Counter works like a defaultdict, the second counter has a zero quantity of b. So the answer is again False.

msg227849 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-09-29 22:32

To put it another way: In Python sets, a <= b iff b has everything that a has, and possibly more. I'm proposing the exact same definition for counters. True, the implementation might be different because sets are not dicts and don't have .values(), but that is just the implementation, which is not the important thing here.

msg227850 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-09-29 22:36

That is certainly a better definition. If you carefully reread your original post you will see that you wrote something different ('any' instead of 'every'). Also, your OP uses 'lt' but talks about 'less-than-or-equal -- somewhat confusing. ;)

msg227851 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-09-29 22:44

You're right, sorry. I meant the mathematical "for any" which means "for every": https://en.wikipedia.org/wiki/List_of_mathematical_symbols (See "for any;")

msg227852 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2014-09-29 22:54

The request sounds ok to me. As in the corner case of negative values, it seems that Counter currently doesn't elide zero values, e.g.:

Counter({'a': 0}) == Counter({}) False

Therefore, we should have:

Counter({'a': -1}) < Counter({'a': 0}) True

but:

Counter({'a': -1}) < Counter({}) False

(I must admit, the current semantics of Counter look a bit ad hoc and not terribly consistent... halfway between a regular mapping and a multiset)

msg227857 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-09-29 23:13

Shall I write a patch?

msg227858 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-09-29 23:17

I think the final call is Raymond's (whether to accept the patch), but go ahead. Worst case scenario is you'll have your own thoroughly tested PartiallyOrderedCounter you can use in your own code. :)

msg227860 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-09-29 23:21

If that's the case I'd prefer Raymond to first say whether this feature is generally welcome before spending my time on making a patch.

msg228071 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2014-10-01 14:18

Is there some particular problem you're trying to solve, which this would make easier?

Without a use case, I'm -1.

msg228072 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-01 14:23

I needed it for an internal calculation in a combinatorics package I'm writing. Do you want me to take the time to explain the calculation? It's quite complex.

msg228077 - (view)

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2014-10-01 14:50

No need to explain it. It sounds like it's not generally useful, so I'm still -1.

msg228082 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2014-10-01 16:04

Even if you don't find it useful, Eric, it doesn't take up the method space. You can very easily ignore it, there's no cognitive burden.

msg228083 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-01 16:27

If it is so specialized as to only be needed in complex combinatorial calculations, does it belong in the general-purpose part of the language? After all, we have the math and cmath modules for more specialized arithmetic operations.

msg228084 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-01 16:29

Curiousity question: What happens if you try to sort a list of partially ordered Counters?

msg228085 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2014-10-01 16:30

If it is so specialized as to only be needed in complex combinatorial calculations

How do you know it is "only needed in complex combinatorial calculations"?

What happens if you try to sort a list of partially ordered Counters?

Try it with partially ordered sets.

msg228086 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-01 16:42

I don't see why it's so hard to imagine how this will be used. Say I have a counter signifying how many of each product I have in my warehouse, and I have another counter saying how many of each product a client wants. I may use counter2 <= counter1 to check whether there are enough products in stock. Sounds straightforward to me.

msg228087 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-01 16:57

--> s1 = set([1]) --> s2 = set([1, 2]) --> s3 = set([1, 2, 3]) --> s4 = set([2]) --> s5 = set([2, 3]) --> s6 = set([3])

--> l = [s1, s2, s3, s4, s5, s6] --> sorted(l) [{1}, {2}, {1, 2}, {3}, {2, 3}, {1, 2, 3}]

--> s1 < s4 False

--> s4 < s2 True

--> s1 < s2 True

--> s4 < s6 False

--> l = [s1, s4] --> sorted(l) [{1}, {2}]

--> sorted(l) [{1}, {2}]

Looks horribly messy to me. In the last example we can see that neither s1 nor s4 are smaller, yet s1 is consistently put first.

On the other hand, I suppose it's okay for Counter to have this behavior since it's already in sets.

msg228088 - (view)

Author: Steven D'Aprano (steven.daprano) * (Python committer)

Date: 2014-10-01 16:59

Ethan said:

If it is so specialized as to only be needed in complex combinatorial calculations, does it belong in the general-purpose part of the language?

It's a multi-set, a general purpose and fairly fundamental data type.

https://en.wikipedia.org/wiki/Set_%28abstract_data_type%29#Multiset

And later:

Curiousity question: What happens if you try to sort a list of partially ordered Counters?

The same thing that happens when you sort a list of any partially ordered objects, such as sets:

py> sorted([{1, 2, 3, 4}, {2, 4}, {1, 3}, {2, 3, 4}, {1, 2, 3}]) [{2, 4}, {1, 3}, {2, 3, 4}, {1, 2, 3}, {1, 2, 3, 4}]

You get some order, but since sorting assumes a total order, not just partial order, the result isn't really meaningful, and will very likely depend on the initial order of the items. If that worries you, then don't sort items that implement only a partial order.

msg228089 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-01 17:10

I'll go with +0.5. :)

If this goes in, I think a missing key in either Counter should default to 0 for purposes of the ordering.

msg228090 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-01 17:13

If/when there's general agreement that this functionality should be merged in (assuming the implementation is acceptable), let me know and I'll be happy to write the code and tests.

msg228094 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2014-10-01 17:18

I'll go with +0.5. :)

I was going to make a joke about Counters only accepting integral values, but the constructor is actually quite laxist:

Counter({'a': 2.5}) Counter({'a': 2.5}) Counter({'a': 2.5 + 1j}) Counter({'a': (2.5+1j)}) Counter({'a': 'b'}) Counter({'a': 'b'})

(!)

msg228096 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-01 17:21

That does seem odd -- how can you have 'b' of something?

msg228097 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-01 17:24

The real problem is that Counter was put into the standard library with such lenient conditions on how it should be used-- basically looking at it as a dict subclass rather than a counter, putting emphasis on implementation rather than purpose, which was a mistake that we'll now have to deal with forever.

msg228098 - (view)

Author: Stefan Behnel (scoder) * (Python committer)

Date: 2014-10-01 17:28

That does seem odd -- how can you have 'b' of something?

Don't ask this kind of question as long as any math guys are listening.

But seriously, I consider this proposal reasonable. If the problem is that the counter values can be non-integers, then why not just raise an exception if someone tries to test the ordering with illegal (i.e. unordered or uncomparable) values? Data quality issues are best handled on user side.

msg228099 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-01 17:31

Stefan: If you'd want to raise an exception, you'll need to define in which cases. Should it raise an exception for decimal numbers? Complex numbers? etc.

Personally I'd enjoy it raising exceptions in as many situations as possible (so we could keep Counter usage focused to actual counter usage and not just wild dicts) but I doubt it'll sit well with people who might use Counters with weird values.

msg228100 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-01 17:35

I have to disagree. The intent is clearly expressed in the docs [1]. However, if I have a need to deal with partial amounts (say, 2.5 apples because I gave half of one to my horse ;), Counter will still work with that:

--> treats = Counter({'carrots':12, 'apples':3, 'sugar_cubes':100}) --> treats Counter({'sugar_cubes': 100, 'carrots': 12, 'apples': 3}) --> treats['apples'] -= 0.5 --> treats Counter({'sugar_cubes': 100, 'carrots': 12, 'apples': 2.5})

At least, it will until we fix that bug. ;)

msg228101 - (view)

Author: Stefan Behnel (scoder) * (Python committer)

Date: 2014-10-01 17:36

I'd raise an exception whenever the values of the Counter turn out to be uncomparable. If users manage to fill Counters with non-number values that are comparable amongst each other, or even if each key has a value of a different type, why not just support that?

All that is really required for this is that you can compare a) keys for equality and b) values of identical keys for ordering, right?

msg228102 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-01 17:38

Ah, now I understand you Stefan. That sounds like a good scheme; let any comparison errors between the values simply propagate up.

msg228105 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-01 17:51

[1] https://docs.python.org/3/library/collections.html#collections.Counter

msg228236 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-02 16:15

I, myself, wrote:

At least, it will until we fix that bug. ;)

Happily, reading the whole documentation revealed that non-integer values are allowed, so it's not a bug. :)

msg228312 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-03 09:08

Attached patch of implementation with tests. I haven't actually run this (because I don't know how to run Python's test suite) so there are likely to be bugs. if there's general agreement about this direction I'll figure out how to run the test so I could ensure it works.

msg228374 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-03 20:47

Fixed patch attached. Still needs someone to run it and see whether there are any bugs.

msg228391 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-03 22:05

Testing the patch...

msg228398 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-03 22:24

In going through the tests I have found an edge-case that I am unsure of how to handle:

Counter(a=1) < Counter(a=1, b=1)

On the one hand, they both have the same value for 'a'; on the other hand, the second counter has more elements...

So should the result be False or True?

msg228401 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2014-10-03 22:25

Counter(a=1) < Counter(a=1, b=1)

Do an analogy with sets and the answer will be obvious.

msg228402 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-03 22:29

I am not a mathematician -- feel free to include the answer with your hints. ;)

If my analogy is correct, this situation is similar to

{1} < {1, 2}

Which is True. Can someone verify that I am correct?

msg228403 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-03 22:30

Yes, correct.

msg228411 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-03 23:53

New patch attached.

msg228429 - (view)

Author: Saimadhav Heblikar (Saimadhav.Heblikar) *

Date: 2014-10-04 06:57

I mentioned my wording for the comment about lesser than partial ordering using the code review tool.

In the attached patch(based on .stoneleaf.patch.01), I have added test for the case when the other object is not a mapping.

I have also tried to refactor the le and ge methods.

Please comment on the changes. Regards

msg228433 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-04 07:38

I don't really understand the refactoring. For example why is len(self) <= len(other) needed? For which case? Note that len would also catch keys with 0 value which shouldn't have an effect.

msg228436 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2014-10-04 07:57

There are some properties of set comparison:

(a < b) == (a <= b and a != b) (a <= b) == (a < b or a == b) (a <= b and b <= a) == (a == b) (a < b and b < a) == False (a <= b) == (a - b == set()) if (a <= b and b <= c) then (a <= c) (a <= b and a <= c) == (a <= (b & c)) (a <= c and b <= c) == ((a | b) <= c)

Is this true for Counter?

msg228440 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-04 08:38

There are some properties of set comparison:

(a < b) == (a <= b and a != b)

I don't think so, because counter equality is based on dict equality so it doesn't ignore zero values.

(a <= b) == (a < b or a == b)

No, ditto.

(a <= b and b <= a) == (a == b)

No, ditto.

(a < b and b < a) == False

Yes.

(a <= b) == (a - b == set())

Yes.

if (a <= b and b <= c) then (a <= c)

Yes.

(a <= b and a <= c) == (a <= (b & c))

Yes.

(a <= c and b <= c) == ((a | b) <= c)

Yes.

msg228441 - (view)

Author: Steven D'Aprano (steven.daprano) * (Python committer)

Date: 2014-10-04 09:58

On Sat, Oct 04, 2014 at 07:57:16AM +0000, Serhiy Storchaka wrote:

There are some properties of set comparison:

(a < b) == (a <= b and a != b) (a <= b) == (a < b or a == b) (a <= b and b <= a) == (a == b) (a < b and b < a) == False (a <= b) == (a - b == set()) if (a <= b and b <= c) then (a <= c) (a <= b and a <= c) == (a <= (b & c)) (a <= c and b <= c) == ((a | b) <= c)

Is this true for Counter?

I believe these are all true for multisets.

But Counter is not just a multiset (a.k.a. bag) since the "counts" are not limited to non-negative integers (the natural numbers).

py> C = Counter() py> C['a'] = -3 py> C['b'] = 'spam' py> C['c'] = 0.5 py> C Counter({'c': 0.5, 'b': 'spam', 'a': -3})

I don't know of any standard definition for subset and superset for multisets like C above. I think that we should raise an exception in cases like that: TypeError for cases where any "count" is non-numeric, and ValueError for cases where any count is not a non-negative integer.

(Later, if somebody comes up with a sensible meaning for sub/superset of such a Counter, we can relax that restriction.)

Another issue: are missing elements treated as if they have count 0? E.g. what are these?

Counter({'a': 0, 'b': 1}) <= Counter({'a': 1, 'b': 1}) Counter({'a': 0, 'b': 1}) <= Counter({'b': 1})

According to these pages:

http://singularcontiguity.wordpress.com/2008/05/07/multiset-equality-and-submultisets/ http://singularcontiguity.wordpress.com/2008/04/28/multiset-membership/

I would argue that they should both return True (based on the idea that the "support" of a multiset is the set of those elements with non-zero count). But I don't know how authoritative that blog is. (The author even mentions that he only has a passing familiarity with some aspects of multisets.)

Anyone familiar with Haskell? What does this bag class do?

http://hackage.haskell.org/package/uulib-0.9.8/docs/UU-DData-IntBag.html

This may also be helpful. Sets and bags in Z language:

http://www.cs.ox.ac.uk/people/michael.wooldridge/teaching/soft-eng/lect08.pdf http://www.cs.ox.ac.uk/people/michael.wooldridge/teaching/soft-eng/lect14.pdf

msg228442 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2014-10-04 10:08

At least this condition is false for current implementation:

(a < b and b < a) == False

Example: a = Counter({'a': -1}), b = Counter({'b': -1}).

msg228443 - (view)

Author: Steven D'Aprano (steven.daprano) * (Python committer)

Date: 2014-10-04 10:13

On Sat, Oct 04, 2014 at 08:38:17AM +0000, Ram Rachum wrote:

Ram Rachum added the comment:

There are some properties of set comparison:

(a < b) == (a <= b and a != b)

I don't think so, because counter equality is based on dict equality so it doesn't ignore zero values.

That's a very good point! That suggests that we shouldn't treat Counters as multisets (a.k.a. bags) and define subset/superset operators. They're almost multisets, but not quite the same, since:

Counter({'a': 0}) != Counter({})

but the formal definition of multiset suggests those should be equal:

https://en.wikipedia.org/wiki/Multiset#Formal_definition

Perhaps we should rethink this. I'm starting to wonder whether we need a PEP to define exactly how subset and superset should be defined for Counters. This is precisely the trouble with rushing to add code before we know what the code is supposed to do... :-)

msg228444 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2014-10-04 10:43

Just because something is discussed doesn't mean it needs a PEP. Also, see http://bugs.python.org/issue22515#msg227852

msg228445 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-04 10:43

I see that in my implementation for lt and gt I forgot to check whether the other counter has elements that this counter doesn't, which could flip found_strict_difference.

msg228461 - (view)

Author: Steven D'Aprano (steven.daprano) * (Python committer)

Date: 2014-10-04 15:24

On Sat, Oct 04, 2014 at 10:43:02AM +0000, Antoine Pitrou wrote:

Just because something is discussed doesn't mean it needs a PEP.

I know that in general discussions do not need a PEP, but we seem to be rushing awfully fast into writing code before we even know what the code is supposed to do. Perhaps "need a PEP" was too strong, but we surely need to do something to decide on constistent and well-defined behaviour. As you pointed out earlier, Counters aren't precisely multisets, they're more like a hybrid mapping/multiset. It's not clear what behaviour subset and superset should have for such a hybrid. By rushing into code before deciding on what the code is supposed to do, we end up with inconsistent behaviour.

Here are a pair of Counters which compare "is subset", but not "is subset or equal" using Ram's patch:

py> Counter(a=1, b=0) < Counter(a=2) True py> Counter(a=1, b=0) <= Counter(a=2) False

On the other hand, here's a pair which compares "is subset or equal", but not ("is subset" or "equal"):

py> Counter(a=0) <= Counter(b=0) True py> (Counter(a=0) < Counter(b=0)) or (Counter(a=0) == Counter(b=0)) False

Adding elements with count=0 changes whether a set is a subset or not:

py> Counter(a=0) < Counter(b=0) False py> Counter(a=0) < Counter(b=0, c=0) True py> Counter(a=0, b=0) < Counter(b=0, c=0) False

I'm not convinced that mixed Counter and regular dict comparisons should be supported, but Ram's patch supports any Mapping type. Is that a good idea? There's been no discussion on that question.

Can somebody explain to me what this means? How is this meaningfully a subset operation?

py> Counter(a="eggs") < Counter(a="spam") True

I supported adding these operations to Counter, but the last thing I want is to enshrine a set of ad-hoc and inconsistent semantics that don't match standard multiset behaviour and are defined by the implementation, instead of having the implementation defined by the desired semantics.

msg228462 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2014-10-04 15:39

Le 04/10/2014 17:24, Steven D'Aprano a Γ©crit :

I know that in general discussions do not need a PEP, but we seem to be rushing awfully fast into writing code before we even know what the code is supposed to do.

Writing a patch is useful practice because it's a way to discover latent problems (especially when writing unit tests).

py> Counter(a=1, b=0) < Counter(a=2) True py> Counter(a=1, b=0) <= Counter(a=2) False

That looks wrong to me.

py> Counter(a=0) <= Counter(b=0) True py> (Counter(a=0) < Counter(b=0)) or (Counter(a=0) == Counter(b=0)) False

Ditto.

py> Counter(a=0) < Counter(b=0) False py> Counter(a=0) < Counter(b=0, c=0) True

Yuck.

I'm not convinced that mixed Counter and regular dict comparisons should be supported, but Ram's patch supports any Mapping type. Is that a good idea? There's been no discussion on that question.

That sounds wrong to me as well.

Can somebody explain to me what this means? How is this meaningfully a subset operation?

py> Counter(a="eggs") < Counter(a="spam") True

I agree this looks silly. However, the Counter doc says:

"""The multiset methods are designed only for use cases with positive values. [...] There are no type restrictions, but the value type needs to support addition, subtraction, and comparison."""

Which really sounds like an excuse to not perform any type checking and silently allow nonsensical multiset behaviour with non-integer keys. We can follow that lead :-)

msg228463 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-10-04 15:54

Hi everybody,

I agree we have a mess on our hands, with lots of cases where comparison is weird. To be honest I've given up on collections.Counter at this point. The fact that it's a "hybrid mapping/multiset" is the cause of all the problems, so I just made my own class that is defined as a multiset with only positive integer values, and I'm going to use that and define methods on it without having to worry about these weird cases. Unless someone else wants to champion this issue and resolve the edge cases, it can be closed as far as I'm concerned.

I really wish that collections.Counter wouldn't have been added to the standard library as a "hybrid mapping/multiset"; I think that it shows emphasis on the implementation rather than the usage, while the usage is more important. But there's nothing I can do about that. (If there was a place where we can make notes for Python 4, I'd put a note there about it.)

msg228477 - (view)

Author: Ethan Furman (ethan.furman) * (Python committer)

Date: 2014-10-04 18:43

We are not "rushing into code", we are getting some code, and tests, written to help define what we are talking about.

So far the feedback has given some more tests to add, and some things to change (such as only comparing with other Counters).

The big question I have is how do we handle <= and >= ? Normal Counter == treats a key with a zero value as being unequeal to a Counter without that same key, but I don't think that makes sense for partial ordering.

msg228487 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2014-10-04 19:39

This implementation obeys more set properties.

msg229065 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2014-10-11 07:37

I suggest implementing Counter.__lt__ which will be a partial order, similarly to set.__lt__.

This would be reasonable if the counter were a strict stand-alone multiset or bag; however, the core design is much looser than that (supporting negative counts for example).

It would be more accurate to think about the Counter as a dict that returns 0 for missing keys and has few extra methods that can be handy in the content of counting things.

Accordingly, a partial ordering operator wouldn't be well-defined for some of the intended use cases.

The addition of subset/superset operations was considered and rejected during the original design phase (the omission was intentional). It is better to leave this to the user to decide what they want to do and how they intend to do so. After all, it is as easily to work with as a regular dict. You can access it directly or subclass it to fit your own needs.

The starting point for this class (when first proposed and endorsed by Guido) was:

class Counter(dict): def missing(self, key): return 0

That core idea is dirt simple and it is not hard to build your own variants if the one in collections doesn't meet your needs.

msg231365 - (view)

Author: Ram Rachum (cool-RR) *

Date: 2014-11-19 10:27

Hi everyone,

Just wanted to follow up here that I've created my own dedicated bag class.

Documentation: https://combi.readthedocs.org/en/stable/bags.html

Install using pip install combi

It's packed with features. There are a ton of arithmetic operations defined and the comparisons methods we talked about in this issue. There are also frozen and ordered variants.

msg253251 - (view)

Author: Aaron Meurer (Aaron.Meurer)

Date: 2015-10-20 20:19

I can't believe this issue was closed. Why can't Counter.lt(self, other) just be all(self[i] < other[i] for i in self)?

Just because Counter supports weird stuff shouldn't bill it out. To follow that logic, we should also remove Counter.subtract

Counter(a='b') - Counter(a='c') Traceback (most recent call last): File "", line 1, in Counter(a='b') - Counter(a='c') File "/Users/aaronmeurer/anaconda/lib/python3.5/collections/init.py", line 709, in sub newcount = count - other[elem] TypeError: unsupported operand type(s) for -: 'str' and 'str'

It's super annoying that Counter supports all the basic set operations except subset. A reminder that this does work:

Counter(a=2) | Counter(a=1) Counter({'a': 2})

And also fails on weird values:

Counter(a='b') | Counter(a='c') Traceback (most recent call last): File "", line 1, in Counter(a='b') | Counter(a='c') File "/Users/aaronmeurer/anaconda/lib/python3.5/collections/init.py", line 730, in or if newcount > 0: TypeError: unorderable types: str() > int()

History

Date

User

Action

Args

2022-04-11 14:58:08

admin

set

github: 66705

2016-09-28 09πŸ”ž28

serhiy.storchaka

link

issue28296 superseder

2015-10-20 20:19:07

Aaron.Meurer

set

nosy: + Aaron.Meurer
messages: +

2014-11-19 10:27:17

cool-RR

set

messages: +

2014-10-11 07:37:32

rhettinger

set

status: open -> closed
resolution: rejected
messages: +

2014-10-11 06:50:31

rhettinger

set

assignee: rhettinger

2014-10-04 19:39:37

serhiy.storchaka

set

files: + issue22515_partial_order_counter_v3.diff

messages: +

2014-10-04 18:43:58

ethan.furman

set

messages: +

2014-10-04 15:54:18

cool-RR

set

messages: +

2014-10-04 15:39:04

pitrou

set

messages: +

2014-10-04 15:24:23

steven.daprano

set

messages: +

2014-10-04 10:43:39

cool-RR

set

messages: +

2014-10-04 10:43:02

pitrou

set

messages: +

2014-10-04 10:13:25

steven.daprano

set

messages: +

2014-10-04 10:08:33

serhiy.storchaka

set

messages: +

2014-10-04 09:58:52

steven.daprano

set

messages: +

2014-10-04 09:00:35

mark.dickinson

set

nosy: + mark.dickinson

2014-10-04 08:38:17

cool-RR

set

messages: +

2014-10-04 07:57:16

serhiy.storchaka

set

messages: +

2014-10-04 07:38:04

cool-RR

set

messages: +

2014-10-04 06:57:55

Saimadhav.Heblikar

set

files: + issue22515_partial_order_counter_v2.diff
nosy: + Saimadhav.Heblikar
messages: +

2014-10-03 23:53:26

ethan.furman

set

files: + issue22515.stoneleaf.patch.01

messages: +

2014-10-03 22:30:32

cool-RR

set

messages: +

2014-10-03 22:29:52

ethan.furman

set

stage: test needed -> patch review

2014-10-03 22:29:39

ethan.furman

set

messages: +

2014-10-03 22:25:19

pitrou

set

messages: +

2014-10-03 22:24:14

ethan.furman

set

messages: +

2014-10-03 22:05:45

ethan.furman

set

messages: +

2014-10-03 20:47:56

cool-RR

set

files: + patch2.patch

messages: +

2014-10-03 09:08:19

cool-RR

set

files: + patch.patch
keywords: + patch
messages: +

2014-10-02 16:15:45

ethan.furman

set

messages: +

2014-10-01 17:51:33

ethan.furman

set

messages: +

2014-10-01 17:38:50

cool-RR

set

messages: +

2014-10-01 17:36:35

scoder

set

messages: +

2014-10-01 17:35:49

ethan.furman

set

messages: +

2014-10-01 17:31:52

cool-RR

set

messages: +

2014-10-01 17:28:42

scoder

set

nosy: + scoder
messages: +

2014-10-01 17:24:13

cool-RR

set

messages: +

2014-10-01 17:21:58

ethan.furman

set

messages: +

2014-10-01 17πŸ”ž48

pitrou

set

messages: +

2014-10-01 17:13:36

cool-RR

set

messages: +

2014-10-01 17:10:08

ethan.furman

set

messages: +
stage: test needed

2014-10-01 16:59:27

steven.daprano

set

messages: +

2014-10-01 16:57:58

ethan.furman

set

messages: +

2014-10-01 16:42:06

cool-RR

set

messages: +

2014-10-01 16:30:14

pitrou

set

messages: +

2014-10-01 16:29:04

ethan.furman

set

messages: +

2014-10-01 16:27:33

ethan.furman

set

messages: +

2014-10-01 16:04:59

pitrou

set

messages: +

2014-10-01 14:50:21

eric.smith

set

messages: +

2014-10-01 14:23:47

cool-RR

set

messages: +

2014-10-01 14πŸ”ž10

eric.smith

set

nosy: + eric.smith
messages: +

2014-09-29 23:21:30

cool-RR

set

messages: +

2014-09-29 23:17:52

ethan.furman

set

nosy: + rhettinger
type: enhancement
messages: +

2014-09-29 23:13:24

cool-RR

set

messages: +

2014-09-29 22:54:15

pitrou

set

nosy: + pitrou
messages: +

2014-09-29 22:44:47

cool-RR

set

messages: +

2014-09-29 22:36:39

ethan.furman

set

messages: +

2014-09-29 22:32:58

cool-RR

set

messages: +

2014-09-29 22:30:54

cool-RR

set

messages: +

2014-09-29 22:16:19

josh.r

set

nosy: + josh.r

2014-09-29 22:06:02

ethan.furman

set

messages: +

2014-09-29 21:24:53

cool-RR

set

messages: +

2014-09-29 21:21:14

r.david.murray

set

nosy: + r.david.murray
messages: +

2014-09-29 21:20:59

cool-RR

set

messages: +

2014-09-29 21:15:34

ethan.furman

set

nosy: + ethan.furman
messages: +

2014-09-29 20:35:26

serhiy.storchaka

set

messages: +

2014-09-29 20:30:54

cool-RR

set

messages: +

2014-09-29 20:26:45

cool-RR

set

messages: +

2014-09-29 20:24:23

serhiy.storchaka

set

nosy: + serhiy.storchaka
messages: +

2014-09-29 20:15:30

steven.daprano

set

nosy: + steven.daprano
messages: +

2014-09-29 19:33:21

cool-RR

create