[Python-Dev] More int/long integration issues

Guido van Rossum guido@python.org
Thu, 13 Mar 2003 21:53:03 -0500


I'm working on a patch that allows both range() and xrange() to work with large (PyLong) values.

I'm not interested in this for xrange(). As I said, xrange() is a crutch and should not be given features that make it hard to kill.

For range(), sure, upload to SF.

I noticed the need for at least supporting long values when I found some bugs in code that did things like:

    a = 1/1e-5
    range(a-20, a)

This should be a TypeError. I'm sorry it isn't. range() is only defined for ints, and unfortunately if you pass it a float it truncates rather than failing.
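To illustrate the point, here is a minimal sketch, assuming a current Python 3 interpreter (where range() does raise TypeError for a float argument, unlike the behavior described above). The names `rejected`, `lo`, and `values` are made up for this example; the explicit round-then-convert step is one possible way to state the intent, not something proposed in this thread.

```python
a = 1 / 1e-5  # a float, as in the quoted example

# On Python 3, range() refuses floats instead of silently truncating them.
try:
    range(a - 20, a)
    rejected = False
except TypeError:
    rejected = True

# Converting explicitly makes the rounding decision visible in the code.
lo = int(round(a))
values = list(range(lo - 20, lo))
```

Rounding before converting matters here because a quotient such as 1/1e-5 need not be an exact integer value, so plain truncation could land one below the intended bound.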

or

    a = 1/1e-6
    b = 1/1e-5
    c = 1/1e-4
    range(a, b, c)

Ditto.

(BTW why don't you write this as 1e6, 1e5, 1e4???)

Now, this example is hardcoded, but in graphing software, or other numerical work, the actual values come from the data set. All of a sudden, you could be dealing with very small numbers (say, because you want to examine error values), and you get:

    a = 1/1e-21
    b = 1/1e-20
    c = 1/1e-19
    range(a, b, c)

And your piece of code now fails. By the comments I've seen, this failure tends to come as a big surprise (people are simply expecting range to be able to work with PyLong values, over short lengths).

But 1/1e-21 is not a long. It's a float. You're flirting with disaster here.
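A small sketch of why the float is the problem, assuming Python 3 (where range() accepts arbitrarily large ints). The variable names are invented for this example, and 10**21 is only a guess at the integer the quoted code was aiming for.

```python
a_float = 1 / 1e-21   # a float, as in the quoted example
a_int = 10**21        # an exact integer of the same magnitude (assumed intent)

is_float = isinstance(a_float, float)

# Near 1e21, adjacent floats are much more than 1 apart,
# so adding 1 to the float changes nothing.
step_lost = (float(a_int) + 1 == float(a_int))

# With an exact integer, a short range over huge values is well defined.
values = list(range(a_int, a_int + 5))
```

At this magnitude a float cannot even distinguish consecutive integers, which is why starting from exact integers is the safer design.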

Also, someone who is working with large files (larger than a C long on his machine) claimed to be having problems with xrange() failing (although, if he is indexing the xrange object, my patch can't help anyway).

That's a totally different problem. Indeed you can't use xrange() with values > sys.maxint. But it should be easy to recode this without xrange.
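One way such a loop might be recoded without xrange() is a plain generator that counts with ordinary integers. This is only a sketch: `long_range` is a hypothetical helper name, not an existing builtin or anything proposed in this thread, and it is written to run on Python 3.

```python
def long_range(start, stop, step=1):
    """Yield start, start+step, ... up to but not including stop.

    Hypothetical helper: uses only integer arithmetic, so the values
    may be arbitrarily large.
    """
    if step == 0:
        raise ValueError("step must not be zero")
    i = start
    # Count upward for a positive step, downward for a negative one.
    while (i < stop) if step > 0 else (i > stop):
        yield i
        i += step


big = 2**70
offsets = list(long_range(big, big + 3))
countdown = list(long_range(5, 0, -2))
```

Because values are produced one at a time, this keeps the memory behavior people want from xrange() while placing no limit on the size of the numbers. It does not support indexing.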

I've seen enough people asking in the newsgroups about this behavior (at least four in the past 5 months or so), and I've submitted some application patches to make things work for these cases (i.e., by explicitly subtracting out the large common base of each parameter, and adding it back in after the list is generated), so I decided to make a patch to change the range behavior.
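The workaround described above (subtract the common base, build a small range, add the base back) could look roughly like this. It is a sketch under assumptions: `range_with_base` is an invented name, the start value is used as the base, and the actual application patches are not shown in this message.

```python
def range_with_base(start, stop, step=1):
    """Build the list range(start, stop, step) via small offsets.

    Hypothetical helper: the large common base (here, start) is
    subtracted so that range() only sees small numbers, then added
    back to every element.
    """
    base = start
    offsets = range(0, stop - base, step)
    return [base + k for k in offsets]


big = 10**21
result = range_with_base(big, big + 10, 3)
```

Only the offsets pass through range(), so the large values never reach it; the cost is an extra addition per element.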

Fixing range was relatively easy, and could be done with no performance penalty (the code to handle long ranges is only invoked after the existing code path fails; the common case is unaltered). Fixing xrange() is trickier, and I'm opting to maintain backwards compatibility as much as possible. In any case, I should have the patch ready to submit within the next week or so (just a few hours more work is needed, for testing and cleanup). Then the argument about whether it should ever be included can begin in earnest. But I have seen enough examples of people being surprised that ranges of long values fail (where the range length is well within the addressable limit, but the range values must be PyLongs) that I think at least range() should be fixed.

Yes.

And if range() is fixed, then sadly, xrange() should be fixed as well (IMO).

No.

BTW, I'm all for deprecating xrange() with all deliberate speed. Doing so would only make updating range behavior easier.

It can't be deprecated until we have an alternative. That will have to wait until Python 2.4. I fought its addition to the language long and hard, but the arguments from PBP (Practicality Beats Purity) were too strong.

--Guido van Rossum (home page: http://www.python.org/~guido/)