[Python-Dev] Do we need __length_hint__ at all? (Was PEP 0424: A method for exposing a length hint)

Nick Coghlan ncoghlan at gmail.com
Mon Jul 16 15:23:18 CEST 2012


On Mon, Jul 16, 2012 at 7:21 PM, Tim Golden <mail at timgolden.me.uk> wrote:

>> Speaking of which - I find this bikeshed disgusting.
>
> Disgusting? Irritating, perhaps, but why should a PEP -- even one whose purpose is to codify existing practice -- not result in discussions about its subject matter? The final P stands for Proposal, not for Pronouncement.

Indeed - I'd be worried if any PEP sailed through python-dev review without a thorough kicking of the tires. Yes, it can be annoying having to bring people up to speed on issues that they aren't familiar with, but that's generally a sign that there is relevant background information missing from the PEP.

PEPs aren't supposed to be written just for people who are already intimately familiar with a problem - they're supposed to provide enough background that they stand on their own.

In this case, the key points that I think need to be added:

- more background on why the __length_hint__ API exists in CPython in the first place: to minimise potentially expensive data copies (due to memory reallocation) when creating a concrete container from an iterator. This includes when creating them from another concrete container via an intermediate iterator. This is why at least the following produce objects that define __length_hint__ in CPython:

    reversed
    itertools.repeat
    iter(dict())
    iter(list())
    iter(tuple())
    iter(str())
    iter(bytes())
    iter(bytearray())
    iter(set())
    iter(frozenset())
    dict.values()
    dict.keys()

  As well as any user defined sequence that relies on the default sequence iterator:

    >>> class MySeq():
    ...     def __getitem__(self, idx):
    ...         return idx
    ...     def __len__(self):
    ...         return 10
    ...
    >>> hasattr(iter(MySeq()), "__length_hint__")
    True

- clarification on the implications of it only being a "hint": specifically, as it may be an over- or underestimate, you cannot rely on the hint to decide whether or not to iterate over the object when a valid length is returned (as a value of zero may be an underestimate). However, it does allow you to presize your container more appropriately than just starting at zero as usual, thus achieving the aim of reducing the risk of unnecessary memory copies.
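
As an illustration of the "hint only" semantics described above, here is a minimal sketch in pure Python. The names presized_list and BadHint are hypothetical and exist only for this example; CPython's real containers do their presizing in C, so this shows the idea rather than the actual implementation. The hint picks the initial size, the loop always runs, and the result is corrected afterwards whether the hint was too small or too large. The sketch assumes the hint method returns normally; a hint method that raises is not handled.

```python
def presized_list(iterable):
    """Build a list, using __length_hint__ only to pick an initial size."""
    it = iter(iterable)
    # Look the method up on the type, as is usual for special methods.
    hint_method = getattr(type(it), "__length_hint__", None)
    hint = hint_method(it) if hint_method is not None else 0
    if not isinstance(hint, int) or hint < 0:
        hint = 0  # treat anything unusable as "no information"

    result = [None] * hint  # preallocate according to the hint
    count = 0
    for item in it:  # always iterate: even a hint of zero may be wrong
        if count < len(result):
            result[count] = item
        else:
            result.append(item)  # the hint was an underestimate
        count += 1
    del result[count:]  # the hint was an overestimate
    return result


class BadHint:
    """Hypothetical iterator that reports a fixed, possibly wrong, hint."""

    def __init__(self, items, hint):
        self._it = iter(items)
        self._hint = hint

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __length_hint__(self):
        return self._hint
```

Calling presized_list(BadHint("abc", 0)) and presized_list(BadHint("abc", 10)) both give the same three-element list; only the amount of resizing along the way differs.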

That's the basic proposal. Separate from that, there are a few suggestions for enhancement beyond what CPython currently uses (and has demonstrated a clear need for):

- adding operator.length_hint(). This makes sense to me, as it makes it much easier to use the API when implementing a container type in Python. It's also a pretty simple addition.
- adding a C level type slot. I'm personally -1 on this one in the context of the PEP. I don't think the current PEP (which is really aimed at standardisation for PyPy's benefit) should be weighed down with this CPython specific implementation detail. As a separate proposal, independent of this PEP, from someone that cares specifically about micro-optimising this API for CPython, and (preferably) can produce benchmark numbers to show the additional implementation complexity is worthwhile, then I wouldn't object. I just don't want this orthogonal concern to derail the standardisation effort.
- distinguishing between different reasons for saying "don't preallocate any space" (i.e. returning zero). I still haven't heard a convincing rationale for this one - it seems to be based on the notion that the case of skipping the iteration step for a genuinely empty iterable is worth optimising. This idea just doesn't make sense when any legitimate length value that is returned can't be trusted to be completely accurate - you have to iterate to confirm the actual number.
- making it possible to fail fast when a known infinite iterator (like itertools.cycle or itertools.count) is passed to a concrete container. I think this is best covered in the PEP by explicitly stating that some types may implement __length_hint__ to always raise TypeError rather than defining a magic return value that means "I'm infinite".
- making it possible for iterators like enumerate, map and filter to delegate __length_hint__ to their underlying iterator. This seems potentially worth doing, but requires resolving the problem that Raymond noted with handling the difference in internal behaviour between enumerate("abc") and enumerate(iter("abc")). Again, it would be unfortunate to see the PEP held up over this.
- making it possible to define __length_hint__ for generator-iterator objects. While this is a nice idea, again, I don't think it's something that this particular PEP should be burdened with.
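
For readers on a later Python: operator.length_hint(obj, default=0) was added to the standard library in Python 3.4. The sketch below shows how it could be used from pure Python, including the kind of delegation discussed for enumerate-style wrappers. The Tagged class is hypothetical and written only for this example; it is not how the builtin enumerate behaves.

```python
from operator import length_hint


class Tagged:
    """Hypothetical enumerate-like wrapper that forwards the length hint."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)  # StopIteration propagates when exhausted
        index = self._index
        self._index += 1
        return index, item

    def __length_hint__(self):
        # Delegate to the underlying iterator; falls back to 0 when
        # that iterator offers no hint (for example, a generator).
        return length_hint(self._it)


# A list iterator reports how many items remain.
remaining = length_hint(iter([1, 2, 3]))

# A generator defines no __length_hint__, so the default is returned.
unknown = length_hint((x for x in ()), 10)
```

Because Tagged stores an iterator rather than the original container, its hint shrinks as items are consumed, which is one way of sidestepping the enumerate("abc") versus enumerate(iter("abc")) distinction in this toy setting.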

My main point is that the current __length_hint__ behaviour has already proven its value in the real world. The PyPy team would like that behaviour codified, so they can be reasonably sure both implementations are doing the same thing. Many of the enhancements I have listed above may be suitable material for future enhancement proposals, but I'm not seeing any requested functionality that would be actively blocked by simply codifying the status quo.

The PEP itself already has this general tone, but I think that it should be even more emphatic that it's about codifying the status quo, not about modifying it with ideas that haven't been proven useful through past experience.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
