msg115892 - (view) |
Author: Matthew Woodcraft (mattheww) |
Date: 2010-09-08 19:30 |
In CPython, the builtin max() and min() have the property that if there are items with equal keys, the first item is returned. From a quick look at their source, I think this is true for Jython and IronPython too. I propose making this a documented guarantee. On Python-dev, Raymond Hettinger said: << That seems like a reasonable request. This behavior has been around for a very long time is unlikely to change. Elsewhere, we've made efforts to document sort stability (i.e. sorted(), heapq.nlargest(), heapq.nsmallest, merge(), etc). >> (<http://mail.python.org/pipermail/python-dev/2010-September/103543.html>) I'm attaching a patch with a concrete suggestion for a change to functions.rst, modelled on the documentation of heapq.nlargest(). |
|
|
msg115898 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2010-09-08 20:52 |
Thanks for the patch! Comments: (1) Shouldn't 'reverse=True' be omitted in the second doc addition? (2) I'd also suggest adding a brief comment about what this means for distinct, but equal, objects; otherwise it's not really obvious what the point of the doc addition is. (3) As a matter of clarity, perhaps replace "this is" with "max(iterable, key=key) is", and similarly for min. As an aside, I still like Jeffrey Yasskin's suggestion on the python-dev mailing list that the sensible definition for max would maintain the invariant that max(iterable) be equivalent to sorted(iterable)[-1]; see Alexander Stepanov's writings in e.g., http://www.cs.rpi.edu/~musser/gsd/notes-on-programming-2006-10-13.pdf for more. But that's (a) another issue, and (b) perhaps not a significant enough benefit to be worth changing Python's semantics for. |
|
|
msg116183 - (view) |
Author: Matthew Woodcraft (mattheww) |
Date: 2010-09-12 12:44 |
> (1) Shouldn't 'reverse=True' be omitted in the second doc > addition? Yes, of course, sorry. > (2) I'd also suggest adding a brief comment about what this > means for distinct, but equal, objects; otherwise it's not > really obvious what the point of the doc addition is. > (3) As a matter of clarity, perhaps replace "this is" with > "max(iterable, key=key) is", and similarly for min. I've attached a new patch incorporating these suggestions. |
|
|
msg116186 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-09-12 12:52 |
> As an aside, I still like Jeffrey Yasskin's suggestion on the > python-dev mailing list that the sensible definition for max would > maintain the invariant that max(iterable) be equivalent to > sorted(iterable)[-1] What's interesting is the practical consequence that: x, y = min(x, y), max(x, y) cannot give you twice the same object. Of course, there are subtle implications of how it will be implemented (especially with objects which have a partial order relationship to each other). Since max() is supposed to work on any iterator, we probably don't want to build an intermediate sequence and fetch elements in reverse order; instead perhaps use (not Py_LT) instead of Py_GT. |
|
|
msg116344 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2010-09-13 19:37 |
> Of course, there are subtle implications of how it will be implemented Indeed. Ideally, as you mention, the implementation would only use __lt__ (as with sort and bisect). I think that constraint only leaves one reasonable choice: namely, max and min for multiple args would be functionally equivalent to max_list and min_list below: def min2(x, y): return y if y < x else x def max2(x, y): return x if y < x else y def min_list(xs): return reduce(min2, xs) def max_list(xs): return reduce(max2, xs) |
|
|
msg116425 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2010-09-14 23:14 |
Documented the current behavior in r84822. Mark, if you're free on IRC at some point, I would like to discuss further. |
|
|
msg122010 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2010-11-21 22:48 |
As discussed with Mark, am closing this one after having applied documentation changes. |
|
|
msg130049 - (view) |
Author: Stephen Evans (Stephen.Evans) |
Date: 2011-03-04 12:32 |
As suggested by Mark following my post on comp.lang.python I am adding further comments to the discussion on this (closed) issue. For a more mathematical consideration of the issue: Stepanov, Alexander and Paul McJones. 2009. Elements of Programming. Addison Wesley. Pages 52-53 The problem with the builtin max() is with weak comparisons. Consider two python objects a and b that are equivalent and where the following are True: a is not b repr([a, b]) == repr(sorted([a, b])) repr([a, b]) == repr(sorted([a, b], reverse=True)) repr([b, a]) == repr(sorted([b, a])) repr([b, a]) == repr(sorted([b, a], reverse=True)) Assuming repr() implemented correctly for a and b. The only Python rich comparison required is (weak) __lt__. If (weak) __eq__ is implemented then the following are True: a == b b == a In bltinmodule.c builtin_max() uses Py_GT. For correctness this should use the converse of builtin_min() i.e. the boolean negation of PyObject_RichCompare using Py_LT (for valid results). If using Python rich comparisions then only __lt__ would be required for both min() and max() as with list.sort(). The following will then be True: min([a, b]) is a max([a, b]) is b min([b, a]) is b max([b, a]) is a min([a, b]) is max([b, a]) min([a, b]) is not min([b, a]) max([a, b]) is min([b, a]) max([a, b]) is not max([b, a]) The above will work if Py_GE is subtituted for Py_GT in builtin_max(), though this will require the implementation of __ge__ which is inconsistent with list.sort() and is a point of potential failure if the implementation of __ge__ is not the converse of the implementation __lt__. To reiterate - builtin max() should be the converse of builtin min(). |
|
|