Issue 1766304: improve xrange.contains - Python tracker (original) (raw)

Created on 2007-08-02 16:14 by stargaming, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (35)

msg52979 - (view)

Author: Stargaming (stargaming)

Date: 2007-08-02 16:14

Membership tests for xrange result in loss of xrange's laziness by iterating through every single item. This yields a complexity of O(n) and an unbearable runtime when handling larger ranges.

This issue can be fixed by using arithmetic decision instead of generic lookup/iteration. This will boil down the complexity to O(1) and make tests near the start of the range as fast as those near the end. The same technique is already used in item lookups.

The fix has been inspired by the "optimizing [x]range" thread in the python-3000-devel mailing list (see http://article.gmane.org/gmane.comp.python.python-3000.devel/8732).

A patch to Objects/rangeobject.c is attached. It modifies tp_as_sequence in PyRange_Type to include the sq_contains field and implement the procedure range_contains right there.

msg52980 - (view)

Author: Stargaming (stargaming)

Date: 2007-08-06 11:37

The Python 3000 patch required a rewrite from scratch since we can't issue PyIntObject->ob_ival directly any longer. Instead, we use the PyNumber-API now. File Added: rangeobject.c.diff

msg59501 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2008-01-07 22:14

The 2.6 patch should probably use long instead of int.

msg62955 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2008-02-25 00:57

Do we really need another way to spell a <= x < b? Have you got a real-world use case in mind for the version with step > 1?

I'm at most lukewarm; I'd be willing to look at a patch to the C code in the py3k-struni branch, plus unit tests though.

-- GvR at http://thread.gmane.org/gmane.comp.python.python- 3000.devel/8732

I read this as a rejection for 2.x series. For py3k, this is a premature optimization. Current py3k implementation is likely to be optimized for regular size integers at some point. This patch will only introduce more code to be changed then.

If this is not ready to be rejected for 2.x, let's wait for resolution of since it may result in backporting of py3k range implementation.

msg63143 - (view)

Author: Stargaming (stargaming)

Date: 2008-02-29 19:53

Later on, when Greg mentions the current for/if performance asymmetry, Guido states:

Fair enough. So maybe you can contribute a patch?

I don't read this as a rejection, but of course you're right -- this patch is purely an optimization. As already written in my initial comment (and discussed on the mailing list), there would be no change in behaviour: The change from generic iterator behaviour to specific range performance is transparent to the end-user.

I don't see how the other patches interfere with this one. Waiting until the development of the range object itself has a stabilized and we decided upon a member type/API seems sensible. Still, this patch would be valid on its own.

Now, your impulse is right: having the performance enhancement in the bug tracker doesn't help much -- we need a resolution for this. If someone could review the patch, tell me about the critical parts or point me to references on how to improve it, I'd be really glad!

On the mentioned unit tests: I'm unsure how to verify this behaviour since I expect it to affect runtime only.

PS. With the new tracker, wouldn't the "resource usage" type fit best?

msg63146 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2008-02-29 21:33

Here are my comments on the py3k patch:

  1. Sign of a PyLong object o is the same as the sign of Py_SIZE(o). I would think it is safe to use this fact within python core. (User code that may need to work across multiple versions of python may need to be more conservative.)

  2. If you choose to use zero in the code, there is no need to create it twice per call.

  3. Py_INCREF(zero); is unnecessary and creates a reference leak.

  4. range_contains should return -1 when error is detected, not 0.

msg92185 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2009-09-02 21:10

Closed issue 6826 as a duplicate of this issue.

msg92187 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2009-09-02 21:46

+1 for an O(1) contains-test.

msg92188 - (view)

Author: Joseph Thomson (hpesoj)

Date: 2009-09-02 21:47

Seeing as the range type has most likely stabalised by now, I would like to renew this issue. The current behaviour of range/xrange could introduce unforeseen performance issues if users are not aware of its inner workings.

Also, as I said in my closed duplicate issue, 'if value in range(lower, upper)' to me looks far more Pythonic than 'if value >= lower and value < upper'. Although this may be a minor point, I think it is noteworthy.

As for the prospect of producing a patch myself, I have checked out and taken a look at the code, but it is somewhat alien to me, and I thought it'd probably be more sensible to leave it to the people who know what they're doing.

msg92195 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2009-09-03 09:18

The py3k patch no longer works: it makes use of PyObject_Cmp, which no longer exists.

msg92197 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2009-09-03 10:03

The trunk patch is also unacceptable in its current form:

  1. there are no docs or tests

  2. keyval, start, step and end should have type long, not type int (as Antoine already mentioned)

  3. the expression ((keyval - start) % step) can overflow, leading to undefined behaviour (e.g., wrong results, segfaults, strange effects from gcc optimizations that assume no overflow). For example, with the patch, on Linux/x86-64, I get:

    x = xrange(-2000000000, 2000000000, 5) 1000000000 in x False

    This should be relatively easy to fix: e.g., if you already know that step > 0 and start <= keyval and keyval < stop, then '(unsigned long)keyval - (unsigned long)start' is safe from overflow.

  4. the containment check only works for ints: with the patch, I get:

    x = xrange(10) 4 in x True 4L in x False 4.0 in x False

    but without the patch applied, all these return True. It's possible that it's worth special-casing integer inputs for the sake of speed, but I don't think the behaviour should change like this for other types.

msg92198 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2009-09-03 10:06

[Joseph Thomson]

Also, as I said in my closed duplicate issue, 'if value in range(lower, upper)' to me looks far more Pythonic than 'if value >= lower and value < upper'.

Note that the Pythonic spelling would be: 'if lower <= value < upper'. (Though that's not quite the same thing if value is not an integer.)

msg92899 - (view)

Author: Robert Lehmann (lehmannro) *

Date: 2009-09-20 16:37

I revised the patch for Python 3.1 and added notices to Misc/NEWS and the range documentation.

(Changing Type to resource usage.)

msg92926 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2009-09-21 09:39

Robert,

The patch looks good: thank you.

Please use C89-style comments (/* ... */).

I'd like to see a few more tests covering the various combinations of start less-than/equal-to/greater-than stop, step positive/negative, tested value within/without/at endpoints of the range interval, etc.

msg92927 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2009-09-21 09:49

Also, it would be good to add a test or two for non-integers, e.g. to make explicit that the following behaviour hasn't changed:

class C: ... def int(self): return 3 ... def index(self): return 5 ... def eq(self, other): return other == 4 ... C() in range(0, 10, 2) True

msg92934 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2009-09-21 11:42

As a small style note, I would prefer if the patch assigned in conditionals less and split them out to the line before. I see that rangeobject.c has a mixed style with regards to this, so the clearer one should win! :)

msg92967 - (view)

Author: Robert Lehmann (lehmannro) *

Date: 2009-09-21 22:27

Thanks for your feedback. I added a few tests and changed the bits you criticized.

msg93014 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2009-09-22 19:49

Great---thanks!

Would it be worth using PyObject_IsTrue instead of comparing with zero using PyObject_RichCompareBool? (Sorry; should have spotted this last time around.)

msg93019 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2009-09-22 21:48

Answering myself: probably not worth it: I failed to notice that you already need zero for the 'step > 0' comparison.

Applied in r75028.

Leaving open for consideration of 2.x xrange.

msg93091 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2009-09-24 20:05

Just to be on the safe side, I changed the PyLong_Check(ob) check to PyLong_CheckExact(ob) || PyBool_Check(ob), in r75051. Without this, one can get different results for 'x in range(10)' and 'x in list(range(10))' if x is an instance of a subclass of int:

Python 3.2a0 (py3k:75050, Sep 24 2009, 20:56:13) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information.

class C(int): ... def eq(self, other): return True ... C(11) in range(10) False C(11) in list(range(10)) True

msg93092 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2009-09-24 20:15

Unassigning myself since I don't intend to do anything myself about xrange.contains in 2.x. (But I'll happily review patches.)

msg103301 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2010-04-16 08:00

I suggest closing this: it's been implemented (for range) in Python 3.x, and I think it's too late for 2.x now.

msg103336 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2010-04-16 16:20

As an implementation detail, it isn't too late to put this into 2.x if it is something we care about.

msg106884 - (view)

Author: Tal Einat (taleinat) * (Python committer)

Date: 2010-06-02 14:33

I'd like to implement this for 2.x, if there's any chance of this being accepted. Is there still a chance of getting this into 2.7?

This will be my first patch for the Python core, so please tell me if I'm missing something. Currently I'm thinking of:

  1. starting with the original patch
  2. applying the required changes as mentioned Mark Dickinson, looking at the the accepted 3.x patch for guidance
  3. back-porting the tests
  4. checking for later additional changes in the source history, and back-porting any such relevant changes

msg106890 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2010-06-02 15:02

A rapidly diminishing chance for 2.7, IMO. I wouldn't want to try to add this after the first release candidate, which is scheduled for June 5 (and I'm not sure I'll have time to review a patch between now and then).

On the other hand, since this is pure optimization it might be acceptable for 2.7.1.

msg106891 - (view)

Author: Tal Einat (taleinat) * (Python committer)

Date: 2010-06-02 15:17

I'd really like to have this in 2.7. Is there no chance of getting it into 2.7 after rc1 is released?

If so, I can have the patch ready by Friday 14:00 GMT, but if nobody will have time to review (and possibly commit) in time for RC1 I guess I'll take my time with this.

msg106893 - (view)

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2010-06-02 15:34

Benjamin?

I'd really like to have this in 2.7. Is there no chance of getting it into 2.7 after rc1 is released?

msg106899 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-06-02 18:02

On Wed, Jun 2, 2010 at 11:17 AM, Tal Einat <report@bugs.python.org> wrote: ..

If so, I can have the patch ready by Friday 14:00 GMT, but if nobody will have time to review (and possibly commit) in time for RC1 I guess I'll take my time with this.

I'll review your patch, but why are you so eager to get this into 2.7? You realize that for integer x, x in xrange(a, b) is just another way to spell a <= x < b. What is your use case?

msg106901 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-06-02 18:07

If so, I can have the patch ready by Friday 14:00 GMT, but if nobody will have time to review (and possibly commit) in time for RC1 I guess I'll take my time with this.

I'll review your patch, but why are you so eager to get this into 2.7? You realize that for integer x, x in xrange(a, b) is just another way to spell a <= x < b. What is your use case?

I really think this shouldn't go into 2.7. We certainly have better things to do than potentially break something for an almost useless non-feature.

msg106902 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-06-02 18:07

Rescheduling to 3.2, if applicable (otherwise I suggest closing).

msg106903 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2010-06-02 18:09

I don't really see the point. I would be more inclined towards it if there was a patch already, but patching this doesn't strike me as a key feature.

msg106904 - (view)

Author: Alexander Belopolsky (belopolsky) * (Python committer)

Date: 2010-06-02 18:10

This is already in 3.x. The only reason I can think of to get this in 2.7 is to have fewer performance surprises between 2.x and 3.x.

msg106907 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-06-02 18:18

Le mercredi 02 juin 2010 ร  18:10 +0000, Alexander Belopolsky a รฉcrit :

This is already in 3.x. The only reason I can think of to get this in 2.7 is to have fewer performance surprises between 2.x and 3.x.

Since you are supposed to forward-port from 2.x and 3.x, the surprise will be a good one anyway.

msg106932 - (view)

Author: Tal Einat (taleinat) * (Python committer)

Date: 2010-06-03 07:30

In my mind, the reason for this patch is that xrange/range can be thought of as a lazy list of integers. However without this patch, membership checking was done trivially instead of in a "smart/lazy" manner, which is unexpected for users. Finally, conditions such as "num in xrange(3, 1000, 5)" are not trivial to express correctly otherwise, and even more so for negative steps.

This patch is already implemented and accepted for 3.2, I just wish to back-port it to 2.7 which should be fairly straightforward.

I'll just have a patch ready by tomorrow, and hope that someone finds the time to review it and possibly commit it in time for rc1. I realize that this is a minor change at the last minute. I will certainly understand if the people responsible for preparing rc1 are too busy for this.

msg106966 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2010-06-03 17:28

2010/6/3 Tal Einat <report@bugs.python.org>:

Tal Einat <taleinat@users.sourceforge.net> added the comment:

In my mind, the reason for this patch is that xrange/range can be thought of as a lazy list of integers. However without this patch, membership checking was done trivially instead of in a "smart/lazy" manner, which is unexpected for users. Finally, conditions such as "num in xrange(3, 1000, 5)" are not trivial to express correctly otherwise, and even more so for negative steps.

This patch is already implemented and accepted for 3.2, I just wish to back-port it to 2.7 which should be fairly straightforward.

I'll just have a patch ready by tomorrow, and hope that someone finds the time to review it and possibly commit it in time for rc1. I realize that this is a minor change at the last minute. I will certainly understand if the people responsible for preparing rc1 are too busy for this.

xrange has behaved like this for such a long time that I don't see what it buys us to commit the patch this late.

History

Date

User

Action

Args

2022-04-11 14:56:25

admin

set

github: 45269

2010-06-03 17:32:05

benjamin.peterson

set

status: open -> closed
resolution: accepted

2010-06-03 17:28:45

benjamin.peterson

set

messages: +

2010-06-03 07:30:14

taleinat

set

messages: +
versions: + Python 2.7, - Python 3.2

2010-06-02 18๐Ÿ”ž40

pitrou

set

messages: +

2010-06-02 18:10:16

belopolsky

set

messages: +

2010-06-02 18:09:09

benjamin.peterson

set

messages: +

2010-06-02 18:07:29

pitrou

set

messages: +
versions: + Python 3.2, - Python 2.7

2010-06-02 18:07:01

pitrou

set

messages: +

2010-06-02 18:03:23

belopolsky

set

assignee: belopolsky

2010-06-02 18:02:59

belopolsky

set

messages: +

2010-06-02 15:34:33

mark.dickinson

set

messages: +

2010-06-02 15:17:31

taleinat

set

messages: +

2010-06-02 15:02:42

mark.dickinson

set

messages: +

2010-06-02 14:33:46

taleinat

set

nosy: + taleinat
messages: +

2010-04-16 16:20:07

rhettinger

set

priority: normal -> low

messages: +

2010-04-16 08:00:37

mark.dickinson

set

versions: - Python 3.2

2010-04-16 08:00:13

mark.dickinson

set

messages: +

2010-04-16 07:54:11

abacabadabacaba

set

nosy: + abacabadabacaba

2009-09-24 20:16:02

mark.dickinson

set

assignee: mark.dickinson -> (no value)

2009-09-24 20:15:44

mark.dickinson

set

messages: +

2009-09-24 20:05:44

mark.dickinson

set

messages: +

2009-09-22 21:48:00

mark.dickinson

set

messages: +

2009-09-22 19:49:49

mark.dickinson

set

messages: +

2009-09-21 22:27:11

lehmannro

set

files: + range.patch

messages: +

2009-09-21 11:42:35

benjamin.peterson

set

nosy: + benjamin.peterson
messages: +

2009-09-21 09:49:33

mark.dickinson

set

assignee: mark.dickinson
messages: +

2009-09-21 09:39:11

mark.dickinson

set

messages: +

2009-09-20 16:37:41

lehmannro

set

files: + range.patch

nosy: + lehmannro
messages: +

type: enhancement -> resource usage

2009-09-03 10:06:55

mark.dickinson

set

messages: +

2009-09-03 10:03:16

mark.dickinson

set

messages: +

2009-09-03 09๐Ÿ”ž18

mark.dickinson

set

messages: +

2009-09-02 21:47:11

hpesoj

set

messages: +

2009-09-02 21:46:11

rhettinger

set

nosy: + rhettinger
messages: +

2009-09-02 21๐Ÿ”ž42

mark.dickinson

set

nosy: + hpesoj

2009-09-02 21:12:12

mark.dickinson

set

stage: needs patch
type: enhancement
versions: + Python 2.7, Python 3.2, - Python 2.6, Python 3.0

2009-09-02 21:10:45

mark.dickinson

set

nosy: + mark.dickinson
messages: +

2009-09-02 21:09:58

mark.dickinson

link

issue6826 superseder

2008-02-29 21:33:16

belopolsky

set

messages: +

2008-02-29 19:53:15

stargaming

set

messages: +

2008-02-25 00:57:26

belopolsky

set

nosy: + belopolsky
messages: +

2008-01-07 22:14:34

pitrou

set

nosy: + pitrou
messages: +

2008-01-06 22:29:45

admin

set

keywords: - py3k
versions: Python 2.6, Python 3.0

2007-08-30 00:24:23

gvanrossum

set

versions: + Python 3.0

2007-08-23 22:00:55

georg.brandl

set

keywords: + py3k

2007-08-02 16:14:47

stargaming

create