Issue 1766304: improve xrange.contains - Python tracker (original) (raw)
Created on 2007-08-02 16:14 by stargaming, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (35)
Author: Stargaming (stargaming)
Date: 2007-08-02 16:14
Membership tests for xrange result in loss of xrange's laziness by iterating through every single item. This yields a complexity of O(n) and an unbearable runtime when handling larger ranges.
This issue can be fixed by using arithmetic decision instead of generic lookup/iteration. This will boil down the complexity to O(1) and make tests near the start of the range as fast as those near the end. The same technique is already used in item lookups.
The fix has been inspired by the "optimizing [x]range" thread in the python-3000-devel mailing list (see http://article.gmane.org/gmane.comp.python.python-3000.devel/8732).
A patch to Objects/rangeobject.c is attached. It modifies tp_as_sequence in PyRange_Type to include the sq_contains field and implement the procedure range_contains right there.
Author: Stargaming (stargaming)
Date: 2007-08-06 11:37
The Python 3000 patch required a rewrite from scratch since we can't issue PyIntObject->ob_ival directly any longer. Instead, we use the PyNumber-API now. File Added: rangeobject.c.diff
Author: Antoine Pitrou (pitrou) *
Date: 2008-01-07 22:14
The 2.6 patch should probably use long instead of int.
Author: Alexander Belopolsky (belopolsky) *
Date: 2008-02-25 00:57
Do we really need another way to spell a <= x < b? Have you got a real-world use case in mind for the version with step > 1?
I'm at most lukewarm; I'd be willing to look at a patch to the C code in the py3k-struni branch, plus unit tests though.
-- GvR at http://thread.gmane.org/gmane.comp.python.python- 3000.devel/8732
I read this as a rejection for 2.x series. For py3k, this is a premature optimization. Current py3k implementation is likely to be optimized for regular size integers at some point. This patch will only introduce more code to be changed then.
If this is not ready to be rejected for 2.x, let's wait for resolution of since it may result in backporting of py3k range implementation.
Author: Stargaming (stargaming)
Date: 2008-02-29 19:53
Later on, when Greg mentions the current for/if performance asymmetry, Guido states:
Fair enough. So maybe you can contribute a patch?
I don't read this as a rejection, but of course you're right -- this patch is purely an optimization. As already written in my initial comment (and discussed on the mailing list), there would be no change in behaviour: The change from generic iterator behaviour to specific range performance is transparent to the end-user.
I don't see how the other patches interfere with this one. Waiting until the development of the range object itself has a stabilized and we decided upon a member type/API seems sensible. Still, this patch would be valid on its own.
Now, your impulse is right: having the performance enhancement in the bug tracker doesn't help much -- we need a resolution for this. If someone could review the patch, tell me about the critical parts or point me to references on how to improve it, I'd be really glad!
On the mentioned unit tests: I'm unsure how to verify this behaviour since I expect it to affect runtime only.
PS. With the new tracker, wouldn't the "resource usage" type fit best?
Author: Alexander Belopolsky (belopolsky) *
Date: 2008-02-29 21:33
Here are my comments on the py3k patch:
Sign of a PyLong object o is the same as the sign of Py_SIZE(o). I would think it is safe to use this fact within python core. (User code that may need to work across multiple versions of python may need to be more conservative.)
If you choose to use zero in the code, there is no need to create it twice per call.
Py_INCREF(zero); is unnecessary and creates a reference leak.
range_contains should return -1 when error is detected, not 0.
Author: Mark Dickinson (mark.dickinson) *
Date: 2009-09-02 21:10
Closed issue 6826 as a duplicate of this issue.
Author: Raymond Hettinger (rhettinger) *
Date: 2009-09-02 21:46
+1 for an O(1) contains-test.
Author: Joseph Thomson (hpesoj)
Date: 2009-09-02 21:47
Seeing as the range type has most likely stabalised by now, I would like to renew this issue. The current behaviour of range/xrange could introduce unforeseen performance issues if users are not aware of its inner workings.
Also, as I said in my closed duplicate issue, 'if value in range(lower, upper)' to me looks far more Pythonic than 'if value >= lower and value < upper'. Although this may be a minor point, I think it is noteworthy.
As for the prospect of producing a patch myself, I have checked out and taken a look at the code, but it is somewhat alien to me, and I thought it'd probably be more sensible to leave it to the people who know what they're doing.
Author: Mark Dickinson (mark.dickinson) *
Date: 2009-09-03 09:18
The py3k patch no longer works: it makes use of PyObject_Cmp, which no longer exists.
Author: Mark Dickinson (mark.dickinson) *
Date: 2009-09-03 10:03
The trunk patch is also unacceptable in its current form:
there are no docs or tests
keyval, start, step and end should have type long, not type int (as Antoine already mentioned)
the expression ((keyval - start) % step) can overflow, leading to undefined behaviour (e.g., wrong results, segfaults, strange effects from gcc optimizations that assume no overflow). For example, with the patch, on Linux/x86-64, I get:
x = xrange(-2000000000, 2000000000, 5) 1000000000 in x False
This should be relatively easy to fix: e.g., if you already know that step > 0 and start <= keyval and keyval < stop, then '(unsigned long)keyval - (unsigned long)start' is safe from overflow.
the containment check only works for ints: with the patch, I get:
x = xrange(10) 4 in x True 4L in x False 4.0 in x False
but without the patch applied, all these return True. It's possible that it's worth special-casing integer inputs for the sake of speed, but I don't think the behaviour should change like this for other types.
Author: Mark Dickinson (mark.dickinson) *
Date: 2009-09-03 10:06
[Joseph Thomson]
Also, as I said in my closed duplicate issue, 'if value in range(lower, upper)' to me looks far more Pythonic than 'if value >= lower and value < upper'.
Note that the Pythonic spelling would be: 'if lower <= value < upper'. (Though that's not quite the same thing if value is not an integer.)
Author: Robert Lehmann (lehmannro) *
Date: 2009-09-20 16:37
I revised the patch for Python 3.1 and added notices to Misc/NEWS and the range documentation.
(Changing Type to resource usage.)
Author: Mark Dickinson (mark.dickinson) *
Date: 2009-09-21 09:39
Robert,
The patch looks good: thank you.
Please use C89-style comments (/* ... */).
I'd like to see a few more tests covering the various combinations of start less-than/equal-to/greater-than stop, step positive/negative, tested value within/without/at endpoints of the range interval, etc.
Author: Mark Dickinson (mark.dickinson) *
Date: 2009-09-21 09:49
Also, it would be good to add a test or two for non-integers, e.g. to make explicit that the following behaviour hasn't changed:
class C: ... def int(self): return 3 ... def index(self): return 5 ... def eq(self, other): return other == 4 ... C() in range(0, 10, 2) True
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2009-09-21 11:42
As a small style note, I would prefer if the patch assigned in conditionals less and split them out to the line before. I see that rangeobject.c has a mixed style with regards to this, so the clearer one should win! :)
Author: Robert Lehmann (lehmannro) *
Date: 2009-09-21 22:27
Thanks for your feedback. I added a few tests and changed the bits you criticized.
Author: Mark Dickinson (mark.dickinson) *
Date: 2009-09-22 19:49
Great---thanks!
Would it be worth using PyObject_IsTrue instead of comparing with zero using PyObject_RichCompareBool? (Sorry; should have spotted this last time around.)
Author: Mark Dickinson (mark.dickinson) *
Date: 2009-09-22 21:48
Answering myself: probably not worth it: I failed to notice that you already need zero for the 'step > 0' comparison.
Applied in r75028.
Leaving open for consideration of 2.x xrange.
Author: Mark Dickinson (mark.dickinson) *
Date: 2009-09-24 20:05
Just to be on the safe side, I changed the PyLong_Check(ob) check to PyLong_CheckExact(ob) || PyBool_Check(ob), in r75051. Without this, one can get different results for 'x in range(10)' and 'x in list(range(10))' if x is an instance of a subclass of int:
Python 3.2a0 (py3k:75050, Sep 24 2009, 20:56:13) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information.
class C(int): ... def eq(self, other): return True ... C(11) in range(10) False C(11) in list(range(10)) True
Author: Mark Dickinson (mark.dickinson) *
Date: 2009-09-24 20:15
Unassigning myself since I don't intend to do anything myself about xrange.contains in 2.x. (But I'll happily review patches.)
Author: Mark Dickinson (mark.dickinson) *
Date: 2010-04-16 08:00
I suggest closing this: it's been implemented (for range) in Python 3.x, and I think it's too late for 2.x now.
Author: Raymond Hettinger (rhettinger) *
Date: 2010-04-16 16:20
As an implementation detail, it isn't too late to put this into 2.x if it is something we care about.
Author: Tal Einat (taleinat) *
Date: 2010-06-02 14:33
I'd like to implement this for 2.x, if there's any chance of this being accepted. Is there still a chance of getting this into 2.7?
This will be my first patch for the Python core, so please tell me if I'm missing something. Currently I'm thinking of:
- starting with the original patch
- applying the required changes as mentioned Mark Dickinson, looking at the the accepted 3.x patch for guidance
- back-porting the tests
- checking for later additional changes in the source history, and back-porting any such relevant changes
Author: Mark Dickinson (mark.dickinson) *
Date: 2010-06-02 15:02
A rapidly diminishing chance for 2.7, IMO. I wouldn't want to try to add this after the first release candidate, which is scheduled for June 5 (and I'm not sure I'll have time to review a patch between now and then).
On the other hand, since this is pure optimization it might be acceptable for 2.7.1.
Author: Tal Einat (taleinat) *
Date: 2010-06-02 15:17
I'd really like to have this in 2.7. Is there no chance of getting it into 2.7 after rc1 is released?
If so, I can have the patch ready by Friday 14:00 GMT, but if nobody will have time to review (and possibly commit) in time for RC1 I guess I'll take my time with this.
Author: Mark Dickinson (mark.dickinson) *
Date: 2010-06-02 15:34
Benjamin?
I'd really like to have this in 2.7. Is there no chance of getting it into 2.7 after rc1 is released?
Author: Alexander Belopolsky (belopolsky) *
Date: 2010-06-02 18:02
On Wed, Jun 2, 2010 at 11:17 AM, Tal Einat <report@bugs.python.org> wrote: ..
If so, I can have the patch ready by Friday 14:00 GMT, but if nobody will have time to review (and possibly commit) in time for RC1 I guess I'll take my time with this.
I'll review your patch, but why are you so eager to get this into 2.7? You realize that for integer x, x in xrange(a, b) is just another way to spell a <= x < b. What is your use case?
Author: Antoine Pitrou (pitrou) *
Date: 2010-06-02 18:07
If so, I can have the patch ready by Friday 14:00 GMT, but if nobody will have time to review (and possibly commit) in time for RC1 I guess I'll take my time with this.
I'll review your patch, but why are you so eager to get this into 2.7? You realize that for integer x, x in xrange(a, b) is just another way to spell a <= x < b. What is your use case?
I really think this shouldn't go into 2.7. We certainly have better things to do than potentially break something for an almost useless non-feature.
Author: Antoine Pitrou (pitrou) *
Date: 2010-06-02 18:07
Rescheduling to 3.2, if applicable (otherwise I suggest closing).
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2010-06-02 18:09
I don't really see the point. I would be more inclined towards it if there was a patch already, but patching this doesn't strike me as a key feature.
Author: Alexander Belopolsky (belopolsky) *
Date: 2010-06-02 18:10
This is already in 3.x. The only reason I can think of to get this in 2.7 is to have fewer performance surprises between 2.x and 3.x.
Author: Antoine Pitrou (pitrou) *
Date: 2010-06-02 18:18
Le mercredi 02 juin 2010 ร 18:10 +0000, Alexander Belopolsky a รฉcrit :
This is already in 3.x. The only reason I can think of to get this in 2.7 is to have fewer performance surprises between 2.x and 3.x.
Since you are supposed to forward-port from 2.x and 3.x, the surprise will be a good one anyway.
Author: Tal Einat (taleinat) *
Date: 2010-06-03 07:30
In my mind, the reason for this patch is that xrange/range can be thought of as a lazy list of integers. However without this patch, membership checking was done trivially instead of in a "smart/lazy" manner, which is unexpected for users. Finally, conditions such as "num in xrange(3, 1000, 5)" are not trivial to express correctly otherwise, and even more so for negative steps.
This patch is already implemented and accepted for 3.2, I just wish to back-port it to 2.7 which should be fairly straightforward.
I'll just have a patch ready by tomorrow, and hope that someone finds the time to review it and possibly commit it in time for rc1. I realize that this is a minor change at the last minute. I will certainly understand if the people responsible for preparing rc1 are too busy for this.
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2010-06-03 17:28
2010/6/3 Tal Einat <report@bugs.python.org>:
Tal Einat <taleinat@users.sourceforge.net> added the comment:
In my mind, the reason for this patch is that xrange/range can be thought of as a lazy list of integers. However without this patch, membership checking was done trivially instead of in a "smart/lazy" manner, which is unexpected for users. Finally, conditions such as "num in xrange(3, 1000, 5)" are not trivial to express correctly otherwise, and even more so for negative steps.
This patch is already implemented and accepted for 3.2, I just wish to back-port it to 2.7 which should be fairly straightforward.
I'll just have a patch ready by tomorrow, and hope that someone finds the time to review it and possibly commit it in time for rc1. I realize that this is a minor change at the last minute. I will certainly understand if the people responsible for preparing rc1 are too busy for this.
xrange has behaved like this for such a long time that I don't see what it buys us to commit the patch this late.
History
Date
User
Action
Args
2022-04-11 14:56:25
admin
set
github: 45269
2010-06-03 17:32:05
benjamin.peterson
set
status: open -> closed
resolution: accepted
2010-06-03 17:28:45
benjamin.peterson
set
messages: +
2010-06-03 07:30:14
taleinat
set
messages: +
versions: + Python 2.7, - Python 3.2
2010-06-02 18๐40
pitrou
set
messages: +
2010-06-02 18:10:16
belopolsky
set
messages: +
2010-06-02 18:09:09
benjamin.peterson
set
messages: +
2010-06-02 18:07:29
pitrou
set
messages: +
versions: + Python 3.2, - Python 2.7
2010-06-02 18:07:01
pitrou
set
messages: +
2010-06-02 18:03:23
belopolsky
set
assignee: belopolsky
2010-06-02 18:02:59
belopolsky
set
messages: +
2010-06-02 15:34:33
mark.dickinson
set
messages: +
2010-06-02 15:17:31
taleinat
set
messages: +
2010-06-02 15:02:42
mark.dickinson
set
messages: +
2010-06-02 14:33:46
taleinat
set
nosy: + taleinat
messages: +
2010-04-16 16:20:07
rhettinger
set
priority: normal -> low
messages: +
2010-04-16 08:00:37
mark.dickinson
set
versions: - Python 3.2
2010-04-16 08:00:13
mark.dickinson
set
messages: +
2010-04-16 07:54:11
abacabadabacaba
set
nosy: + abacabadabacaba
2009-09-24 20:16:02
mark.dickinson
set
assignee: mark.dickinson -> (no value)
2009-09-24 20:15:44
mark.dickinson
set
messages: +
2009-09-24 20:05:44
mark.dickinson
set
messages: +
2009-09-22 21:48:00
mark.dickinson
set
messages: +
2009-09-22 19:49:49
mark.dickinson
set
messages: +
2009-09-21 22:27:11
lehmannro
set
files: + range.patch
messages: +
2009-09-21 11:42:35
benjamin.peterson
set
nosy: + benjamin.peterson
messages: +
2009-09-21 09:49:33
mark.dickinson
set
assignee: mark.dickinson
messages: +
2009-09-21 09:39:11
mark.dickinson
set
messages: +
2009-09-20 16:37:41
lehmannro
set
files: + range.patch
nosy: + lehmannro
messages: +
type: enhancement -> resource usage
2009-09-03 10:06:55
mark.dickinson
set
messages: +
2009-09-03 10:03:16
mark.dickinson
set
messages: +
2009-09-03 09๐18
mark.dickinson
set
messages: +
2009-09-02 21:47:11
hpesoj
set
messages: +
2009-09-02 21:46:11
rhettinger
set
nosy: + rhettinger
messages: +
2009-09-02 21๐42
mark.dickinson
set
nosy: + hpesoj
2009-09-02 21:12:12
mark.dickinson
set
stage: needs patch
type: enhancement
versions: + Python 2.7, Python 3.2, - Python 2.6, Python 3.0
2009-09-02 21:10:45
mark.dickinson
set
nosy: + mark.dickinson
messages: +
2009-09-02 21:09:58
mark.dickinson
link
2008-02-29 21:33:16
belopolsky
set
messages: +
2008-02-29 19:53:15
stargaming
set
messages: +
2008-02-25 00:57:26
belopolsky
set
nosy: + belopolsky
messages: +
2008-01-07 22:14:34
pitrou
set
nosy: + pitrou
messages: +
2008-01-06 22:29:45
admin
set
keywords: - py3k
versions: Python 2.6, Python 3.0
2007-08-30 00:24:23
gvanrossum
set
versions: + Python 3.0
2007-08-23 22:00:55
georg.brandl
set
keywords: + py3k
2007-08-02 16:14:47
stargaming
create