[Python-Dev] Auto-str and auto-unicode in join

Tim Peters tim.peters at gmail.com
Sun Aug 29 03:51:48 CEST 2004


If we were to do auto-str, it would be better to rewrite str.join() as a 1-pass algorithm, using the kind of "double allocated space as needed" gimmick unicode.join uses. It would be less efficient if auto-promotion to Unicode turns out to be required, but it's hard to measure how little I care about that; it might be faster if auto-str and Unicode promotion aren't needed (as only 1 pass would be needed).
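To make the "double allocated space as needed" idea concrete, here is a minimal sketch in modern Python rather than C. It is not the actual unicode.join code. The name join_one_pass and the initial_capacity parameter are made up for illustration, and bytes stands in for the 8-bit str type. The point is only that the input is walked once, and the output buffer doubles whenever it runs out of room.

```python
def join_one_pass(sep, parts, initial_capacity=100):
    """Join byte strings in a single pass, doubling the buffer on overflow.

    Hypothetical illustration only; not the CPython implementation.
    """
    buf = bytearray(initial_capacity)  # preallocated space, unused at first
    used = 0                           # index of the first free byte

    def append(chunk):
        nonlocal used
        needed = used + len(chunk)
        if needed > len(buf):
            # Double the capacity until the new chunk fits.
            new_cap = max(len(buf), 1)
            while new_cap < needed:
                new_cap *= 2
            buf.extend(bytes(new_cap - len(buf)))
        buf[used:needed] = chunk       # same-length slice assignment
        used = needed

    first = True
    for part in parts:                 # one pass; works for iterators too
        if not first:
            append(sep)
        append(part)
        first = False

    # Cut the buffer back to the space actually used.
    return bytes(buf[:used])
```

Because nothing needs to know the total length up front, the function also accepts a generator, which a two-pass "measure, then copy" approach could not do without first materializing the sequence.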

auto-str couldn't really mean string.join(map(str, seq)) either. The problem with the latter is that if a seq element x is a unicode instance, str(x) will convert it into an encoded (8-bit) str, which would not be backward compatible. So the logic would be more (in outline):

    class string:
        def join(self, seq):
            seq = PySequence_Fast(seq)
            if seq is NULL:
                return NULL

            if len(seq) == 0:
                return ""
            elif len(seq) == 1 and type(seq[0]) is str:
                return seq[0]

            allocate a string object with (say) 100 bytes of space
            let p point to the first free byte

            for x in seq:
                if type(x) is str:
                    copy x's guts into p, getting more space if needed
                elif isinstance(x, unicode):
                    return unicode.join(self, seq)
                else:
                    x = PyObject_Str(x)
                    if x is NULL:
                        return NULL
                    copy x's guts into p, etc

                if not the last element:
                    copy the separator's guts into p, etc

            cut p back to the space actually used
            return p's string object
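For readers who want something they can run, the outline can be modelled roughly in modern Python. This is an analogy under stated assumptions, not the proposed C code: bytes plays the role of the 8-bit str, Python 3 str plays the role of unicode, str(x).encode("ascii") stands in for PyObject_Str(), and a bytearray hides the buffer-growing details. The names join_auto_str and join_as_text are invented here, and join_as_text assumes that the Unicode join would also auto-convert its elements.

```python
def join_as_text(sep, seq):
    """Hypothetical stand-in for unicode.join, assumed to auto-convert too."""
    def as_text(x):
        if isinstance(x, bytes):
            return x.decode("ascii")   # models default-encoding promotion
        return str(x)
    return as_text(sep).join(as_text(x) for x in seq)


def join_auto_str(sep, seq):
    """Model of the outline: bytes ~ 8-bit str, str ~ unicode."""
    seq = list(seq)                    # stands in for PySequence_Fast
    if len(seq) == 0:
        return b""
    if len(seq) == 1 and type(seq[0]) is bytes:
        return seq[0]

    out = bytearray()                  # grows itself as needed
    last = len(seq) - 1
    for i, x in enumerate(seq):
        if type(x) is bytes:
            out += x
        elif isinstance(x, str):
            # A text element: throw away the partial result and
            # redo the whole join on the text side.
            return join_as_text(sep, seq)
        else:
            out += str(x).encode("ascii")  # stands in for PyObject_Str
        if i != last:
            out += sep
    return bytes(out)
```

Note that the text branch restarts from the beginning of the sequence, so the work already done on earlier elements is wasted. That is the "less efficient if promotion is required" cost of the 1-pass approach.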

Note a peculiarity: if x is neither str nor unicode, but has a __str__ or __repr__ method that returns a unicode object, PyObject_Str() will convert that into an 8-bit str. That may be surprising. It would be ugly to duplicate most of the logic from PyObject_Unicode() to try to guess whether there's "a natural" Unicode spelling of x. I think I'd rather say "tough luck -- use unicode.join if that's what you want".
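The peculiarity can be imitated in modern Python with the same analogy (bytes for 8-bit str, Python 3 str for unicode). The Temperature class and the object_str_8bit helper below are hypothetical. The helper only models the conversion step, assuming the default encoding is ASCII, as it normally was at the time.

```python
class Temperature:
    """Hypothetical class whose natural spelling is text."""

    def __init__(self, degrees, unit="C"):
        self.degrees = degrees
        self.unit = unit

    def __str__(self):
        return "%d %s" % (self.degrees, self.unit)


def object_str_8bit(x, encoding="ascii"):
    """Model of the conversion step: a text result is forced into 8 bits."""
    return str(x).encode(encoding)


# The text form is silently turned into 8-bit data.
plain = object_str_8bit(Temperature(20))

# With a non-ASCII unit the forced conversion fails under this model.
try:
    object_str_8bit(Temperature(20, "\u00b0C"))
    failed = False
except UnicodeEncodeError:
    failed = True
```

Under this model, an object with a perfectly good text spelling either loses its "textness" or fails to convert at all, which is the situation where calling the Unicode join directly is the safer choice.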


