Objects unicodeobject.c, 2.219, 2.220

M.-A. Lemburg mal at egenix.com
Fri Aug 27 16:47:49 CEST 2004

Previous message: [Python-Dev] Re: Re: [Python-checkins] python/dist/src/Objects unicodeobject.c, 2.219, 2.220
Next message: [Python-Dev] J2 proposal final

Tim Peters wrote:

> [M.-A. Lemburg]
>
>> Hmm, you've now made PyUnicode_Join() work with iterators, whereas PyString_Join() only works for sequences.
>
> They have both worked with iterators since the release in which iterators were introduced. Nothing changed now in this respect.
>
>> What are the performance implications of this for PyUnicode_Join()?
>
> None.
>
>> Since the string and Unicode implementations have to be in sync, we'd also need to convert PyString_Join() to work on iterators.
>
> It already does. I replied earlier this week on the same topic -- maybe you didn't see that, or maybe you misunderstand what PySequence_Fast does.

Indeed. At the time Fredrik added this API, it was optimized for lists and tuples and had a fallback mechanism for arbitrary sequences. Didn't know that it now also works for iterators. Nice !

>> Which brings up the second question: What are the performance implications of this for PyString_Join()?
>
> None.
>
>> The join operation is a widely used method, so both implementations need to be as fast as possible. It may be worthwhile making the PySequence_Fast() approach a special case in both routines and using the iterator approach as fallback if no sequence is found.
>
> string_join uses PySequence_Fast already; the Unicode join didn't, and still doesn't. In the cases of exact list or tuple arguments, PySequence_Fast would be quicker in Unicode join. But in any cases other than those, PySequence_Fast materializes a concrete tuple containing the full materialized iteration, so could be more memory-consuming. That's probably a good tradeoff, though.

Indeed. I'd opt for going the PySequence_Fast() way for Unicode as well.
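
A minimal Python sketch of the tradeoff described above may help: exact lists and tuples are passed through untouched, while any other iterable is consumed once and stored. The names `sequence_fast` and `pieces` are hypothetical and exist only for illustration; this models the behaviour as described in this thread and is not the C implementation. It is written for current Python 3 rather than the Python of 2004.

```python
def sequence_fast(obj):
    """Hypothetical Python model of the fast-path idea discussed above."""
    # Exact lists and tuples are returned as-is: no copy, no extra memory.
    if type(obj) is list or type(obj) is tuple:
        return obj
    # Anything else (generators, other iterables) is consumed exactly once
    # and stored, which costs memory proportional to the number of items.
    return tuple(obj)


def pieces():
    # A one-shot iterator: it cannot be rewound or indexed.
    yield "a"
    yield "b"
    yield "c"


data = ["a", "b", "c"]
fast_list = sequence_fast(data)      # the very same list object comes back
fast_gen = sequence_fast(pieces())   # a new tuple holding every yielded item
```

The memory cost only arises in the second case, which is the "more memory-consuming" situation mentioned above.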

>> Note that PyString_Join() with iterator support will also have to be careful about not trying to iterate twice,
>
> It already is. Indeed, the primary reason it uses PySequence_Fast is to guarantee that it never iterates over an iterator argument more than once. The Unicode join doesn't have that potential problem.
>
>> so it will have to use logic similar to that applied in PyString_Format(), where the work already done up to the point where it finds a Unicode string is reused when calling PyUnicode_Format().
>
> >>> def g():
> ...     for piece in 'a', 'b', u'c', 'd':  # force Unicode promotion on 3rd yield
> ...         yield piece
> ...
> >>> ' '.join(g())
> u'a b c d'

Nice :-)
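
To illustrate why the iterate-once guarantee matters, here is a rough Python 3 sketch of a join that needs two passes over its items: one to size the result and one to copy the pieces. The function `two_pass_join` is a hypothetical model built on `bytes`, not the actual C code; it assumes the argument is materialized first, as described above for PySequence_Fast.

```python
def g():
    for piece in (b"a", b"b", b"c", b"d"):
        yield piece


# A generator can only be walked once: a second pass sees nothing.
gen = g()
first_pass = list(gen)
second_pass = list(gen)   # empty, the generator is already exhausted


def two_pass_join(sep, iterable):
    """Hypothetical model: size the result first, then copy the pieces."""
    # Materialize once so both passes see the same items, even when the
    # argument is a one-shot iterator such as a generator.
    if type(iterable) in (list, tuple):
        items = iterable
    else:
        items = tuple(iterable)
    if not items:
        return b""
    # Pass 1: compute the exact size of the result.
    total = sum(len(item) for item in items) + len(sep) * (len(items) - 1)
    # Pass 2: copy separators and pieces into a preallocated buffer.
    out = bytearray(total)
    pos = 0
    for index, item in enumerate(items):
        if index:
            out[pos:pos + len(sep)] = sep
            pos += len(sep)
        out[pos:pos + len(item)] = item
        pos += len(item)
    return bytes(out)


result = two_pass_join(b" ", g())
```

Because the items are stored before the first pass, any later step that has to start over, such as a switch to a different string type partway through, can reuse the same stored items instead of asking the iterator again.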

-- Marc-Andre Lemburg eGenix.com

Professional Python Services directly from the Source (#1, Aug 27 2004)

Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
