[Python-Dev] xrange accepting non-ints
Neal Norwitz nnorwitz at gmail.com
Thu Aug 24 17:43:38 CEST 2006
I've been working on enhancing xrange and there are a bunch of issues to consider. I've got pretty much complete implementations in both C and Python. Currently xrange is two objects: the range object and its iterator. Both only work on C longs. Here's what I propose:
2.6:
- Add deprecation warning if a float is passed to xrange (currently silently truncates)
- Allow int, long, float, or obj.__index__
- Implement xrange in Python
- Implement iterator over C longs (or Py_ssize_t) in C
- Implement iterator over Python longs in Python (* may lose __length_hint__)
- Capture the values on construction, so mutating objects wouldn't affect xrange
The idea is to expand xrange's capabilities so that it can replace range in 3k.
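A minimal sketch of the argument handling proposed above, written so that it runs on a current Python 3 interpreter (where int and long are unified). The names `_coerce_arg` and `XRangeSketch` are hypothetical and are not taken from either implementation mentioned in this message; the warning text is likewise an assumption.

```python
import operator
import warnings


def _coerce_arg(value):
    """Return an int for value, warning if a float has to be truncated."""
    if isinstance(value, float):
        # Proposed for 2.6: warn instead of truncating silently.
        warnings.warn("float passed to xrange; truncating",
                      DeprecationWarning, stacklevel=3)
        return int(value)
    # operator.index() accepts ints and any object defining __index__.
    return operator.index(value)


class XRangeSketch(object):
    """Captures start, stop, and step once, at construction time."""

    def __init__(self, start, stop=None, step=1):
        if stop is None:
            start, stop = 0, start
        # The arguments are converted to plain ints here, so mutating the
        # original objects later cannot change what iteration produces.
        self.start = _coerce_arg(start)
        self.stop = _coerce_arg(stop)
        self.step = _coerce_arg(step)
        if self.step == 0:
            raise ValueError("step must not be zero")

    def __iter__(self):
        value, stop, step = self.start, self.stop, self.step
        while (value < stop) if step > 0 else (value > stop):
            yield value
            value += step
```

Because the conversion happens in `__init__`, an object whose `__index__` result changes after construction has no effect on the values produced.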
I've profiled various combinations. Here are the results for xrange(0, 1e6, 1), as normalized run times (lower is faster):

Run on all integer (32-bit) values for start, step, end:

  C xrange and iter:             1
  Py xrange w/C iter:            1
  Py xrange w/Py iter (gen):     5-8
  Py xrange w/Py iter (class):   ~30
So xrange written in Python runs at the same speed as xrange written in C. The important part is that the iterator needs to be written in C for speed. If we use a generator, something like:
    while value < end:
        yield value
        value += step
The result is ~5 times slower in a release build and 8 times slower in a debug build than with an iterator implemented in C. Using a generator means that there is no __length_hint__. If we go with a full class that has a __length_hint__, the result is ~32 times slower in a debug build.
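For comparison, the "full class" variant might look like the sketch below. It assumes integer arguments and a nonzero step, `XRangeIter` is a hypothetical name, and this is not the code that was profiled. `operator.length_hint` is the modern (Python 3.4+) way to query the hint and is used here only to show what the generator version gives up.

```python
import operator


class XRangeIter(object):
    """Iterator over start, start+step, ... that reports its remaining length."""

    def __init__(self, start, stop, step=1):
        if step == 0:
            raise ValueError("step must not be zero")
        self._value = start
        self._step = step
        # Number of items left, rounded up and never negative.
        if step > 0:
            self._remaining = max(0, (stop - start + step - 1) // step)
        else:
            self._remaining = max(0, (start - stop - step - 1) // -step)

    def __iter__(self):
        return self

    def __next__(self):
        if self._remaining <= 0:
            raise StopIteration
        value = self._value
        self._value += self._step
        self._remaining -= 1
        return value

    next = __next__  # Python 2 spelling of the iterator protocol

    def __length_hint__(self):
        # Lets consumers such as list() size their allocation up front.
        return self._remaining


it = XRangeIter(0, 10, 3)
hint_before = operator.length_hint(it)  # 4 items: 0, 3, 6, 9
first = next(it)
hint_after = operator.length_hint(it)   # 3 items left
```

Every `next()` call here goes through Python-level method dispatch and attribute updates, which is plausibly where the extra cost over a generator comes from.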
The Python impl is about 1/10th the size of the C impl, though it is lacking some comments.
Run on Python longs, the results are somewhat interesting. The Python-based iterator is faster. There's probably a bug in the C version, but given that there is a lot of object allocation, I wouldn't expect the C version to ever be much faster than a similar Python version. Plus the Python version is trivial (same as above) for ints or longs. The C version for longs is quite a bit of code.
Run on all Python longs (still 0..1e6, but sys.maxint..(sys.maxint + 1e6) is the same):

  C xrange and iter:             1.4
  Py xrange w/C iter:            not tested
  Py xrange w/Py iter (gen):     1
  Py xrange w/Py iter (class):   4
Caveats:
- The generator version above doesn't support floats. We could easily support floats with a different calculation that would be slightly more expensive, but would not accumulate error.
- By using the generator version, __length_hint__ gets dropped. This means that converting the iterator into a sequence could be slightly more costly, as we have to grow the allocation. This would only happen if any of start, step, end weren't an int.
- With a Python implementation there is a little bit of bootstrapping that is necessary to get the iter implemented in C into the xrange object implemented in Python.
- Since xrange is implemented in Python, it can be changed.
- The Python code is much easier to understand than the C code (there is at least one bug in the current C version where -sys.maxint - 1 isn't always displayed properly).
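One way to read the "different calculation" in the first caveat is to derive each value from its index (start + i * step) instead of repeatedly adding step. This is a guess at the intended approach, not code from either implementation; `frange_gen` is a hypothetical name, and the item count computed with a float division can itself be off by one for some inputs.

```python
import math


def frange_gen(start, stop, step):
    """Yield start + i*step for i = 0, 1, ... without keeping a running sum."""
    if step == 0:
        raise ValueError("step must not be zero")
    # Item count computed once up front; negative counts clamp to zero.
    count = max(0, int(math.ceil((stop - start) / float(step))))
    for i in range(count):
        # One multiply and one add per item: slightly more work than
        # `value += step`, but rounding error does not build up.
        yield start + i * step
```

Since the count is known before iteration starts, a class-based version of the same calculation could also keep a length hint.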
Hopefully this is all understandable. If I left anything out, Thomas will remind me.
n