[Python-Dev] Re: Re: [Python-checkins] python/dist/src/Objects unicodeobject.c, 2.219, 2.220

Tim Peters tim.peters at gmail.com
Fri Aug 27 16:30:27 CEST 2004


[M.-A. Lemburg]

> Hmm, you've now made PyUnicode_Join() work with iterators, whereas PyString_Join() only works for sequences.

They have both worked with iterators since the release in which iterators were introduced. Nothing changed now in this respect.

> What are the performance implications of this for PyUnicode_Join()?

None.

> Since the string and Unicode implementations have to be in sync, we'd also need to convert PyString_Join() to work on iterators.

It already does. I replied earlier this week on the same topic -- maybe you didn't see that, or maybe you misunderstand what PySequence_Fast does.

> Which brings up the second question: What are the performance implications of this for PyString_Join()?

None.

> The join operation is a widely used method, so both implementations need to be as fast as possible. It may be worthwhile making the PySequence_Fast() approach a special case in both routines and using the iterator approach as a fallback if no sequence is found.

string_join uses PySequence_Fast already; the Unicode join didn't, and still doesn't. When the argument is exactly a list or a tuple, PySequence_Fast would be quicker in Unicode join. In every other case, PySequence_Fast materializes a concrete tuple holding the full result of the iteration, so it could consume more memory. That's probably a good tradeoff, though.
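The behavior described above can be sketched in Python. This is only a rough analogue of what PySequence_Fast does, not the C implementation; the name `sequence_fast` is hypothetical and chosen purely for illustration.

```python
def sequence_fast(obj):
    """Rough Python analogue of PySequence_Fast (illustration only)."""
    # Exact lists and tuples are handed back unchanged: no copy is made.
    if type(obj) in (list, tuple):
        return obj
    # Anything else is iterated exactly once and stored in a new tuple,
    # which costs extra memory but allows repeated, indexed access.
    return tuple(obj)


items = ['a', 'b', 'c']
assert sequence_fast(items) is items      # fast path: same object back

gen = (s for s in items)
snapshot = sequence_fast(gen)
assert snapshot == ('a', 'b', 'c')        # generator was materialized
assert list(gen) == []                    # and is now exhausted
```

The fast path is where the speed advantage for exact lists and tuples comes from; the fallback is where the extra memory use comes from.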

> Note that PyString_Join() with iterator support will also have to be careful about not trying to iterate twice,

It already is. Indeed, the primary reason it uses PySequence_Fast is to guarantee that it never iterates over an iterator argument more than once. The Unicode join doesn't have that potential problem.

> so it will have to use logic similar to that applied in PyString_Format(), where the work already done up to the point where it finds a Unicode string is reused when calling PyUnicode_Format().

>>> def g():
...     for piece in 'a', 'b', u'c', 'd':  # force Unicode promotion on 3rd yield
...         yield piece
...
>>> ' '.join(g())
u'a b c d'
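The single-iteration guarantee discussed in this thread can be illustrated with a small Python 3 sketch. It assumes a join that needs two passes over the pieces (one to compute the total length, one to copy), which is why the argument is materialized first. The names `join_two_pass` and `pulls` are hypothetical, and Python 3 has no str-to-Unicode promotion, so only the "iterate once" aspect is shown.

```python
def join_two_pass(sep, iterable):
    """Illustrative two-pass join; not the CPython implementation."""
    # Materialize once, so the two passes below never touch the iterator.
    if type(iterable) in (list, tuple):
        pieces = iterable
    else:
        pieces = tuple(iterable)
    # Pass 1: compute the size of the result.
    total = sum(len(p) for p in pieces) + len(sep) * max(len(pieces) - 1, 0)
    # Pass 2: copy separators and pieces into a preallocated buffer.
    buf = [None] * total
    pos = 0
    for i, piece in enumerate(pieces):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(piece)] = piece
        pos += len(piece)
    return ''.join(buf)


pulls = []  # records every item the generator actually produces


def g():
    for piece in 'a', 'b', 'c', 'd':
        pulls.append(piece)
        yield piece


result = join_two_pass(' ', g())
assert result == 'a b c d'
assert pulls == ['a', 'b', 'c', 'd']  # each item was pulled exactly once
```

Without the up-front materialization, the second pass over a generator would see nothing, because the first pass would already have exhausted it.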


