[Python-Dev] Auto-str and auto-unicode in join
Tim Peters tim.peters at gmail.com
Sun Aug 29 03:51:48 CEST 2004
If we were to do auto-str, it would be better to rewrite str.join() as a 1-pass algorithm, using the kind of "double allocated space as needed" gimmick unicode.join uses. It would be less efficient if auto-promotion to Unicode turns out to be required, but it's hard to measure how little I care about that; it might be faster if auto-str and Unicode promotion aren't needed (as only 1 pass would be needed).
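For readers who haven't seen the "double allocated space as needed" gimmick, here is a rough sketch of the idea in Python. The GrowBuffer class and its method names are made up for illustration; this is not how unicode.join is actually written in C, only the same growth strategy under the assumption that we start with a small fixed allocation.

```python
class GrowBuffer:
    """Hypothetical append-only byte buffer that doubles its space on demand."""

    def __init__(self, initial=100):
        # Pre-allocate some space; `used` counts the bytes actually filled.
        self.data = bytearray(max(initial, 1))
        self.used = 0

    def append(self, chunk):
        needed = self.used + len(chunk)
        capacity = len(self.data)
        if needed > capacity:
            # Keep doubling until the chunk fits, then grow once.
            while capacity < needed:
                capacity *= 2
            self.data.extend(bytes(capacity - len(self.data)))
        self.data[self.used:needed] = chunk
        self.used = needed

    def finish(self):
        # Cut the buffer back to the space actually used.
        return bytes(self.data[:self.used])


buf = GrowBuffer(initial=4)
for piece in (b"ab", b"cde", b"fghij"):
    buf.append(piece)
result = buf.finish()
```

Because the capacity doubles, the total copying stays proportional to the final size, which is what makes a single pass over the sequence affordable without knowing the result length up front.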
auto-str couldn't really mean string.join(map(str, seq)) either. The problem with the latter is that if a seq element x is a unicode instance, str(x) will convert it into an encoded (8-bit) str, which would not be backward compatible. So the logic would be more (in outline):
    class string:
        def join(self, seq):
            seq = PySequence_Fast(seq)
            if seq is NULL:
                return NULL
            if len(seq) == 0:
                return ""
            elif len(seq) == 1 and type(seq[0]) is str:
                return seq[0]
            allocate a string object with (say) 100 bytes of space
            let p point to the first free byte
            for x in seq:
                if type(x) is str:
                    copy x's guts into p, getting more space if needed
                elif isinstance(x, unicode):
                    return unicode.join(self, seq)
                else:
                    x = PyObject_Str(x)
                    if x is NULL:
                        return NULL
                    copy x's guts into p, etc
                if not the last element:
                    copy the separator's guts into p, etc
            cut p back to the space actually used
            return p's string object
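The outline above can be approximated in runnable Python, with some loud assumptions: the 8-bit str is modelled as bytes, unicode is modelled as Python 3's str, str(x).encode("ascii") stands in for PyObject_Str(), and a bytearray stands in for the manually grown buffer. The names join_auto and unicode_join are hypothetical, and the ASCII decoding in unicode_join only imitates the implicit promotion; treat this as a sketch of the control flow, not as the proposed C code.

```python
def unicode_join(sep, seq):
    """Hypothetical stand-in for unicode.join: promote everything to text."""
    def to_text(x):
        # 8-bit strings are promoted by decoding as ASCII (an assumption).
        return x.decode("ascii") if isinstance(x, bytes) else str(x)
    return to_text(sep).join(to_text(x) for x in seq)


def join_auto(sep, seq):
    """One-pass join over 8-bit strings (modelled as bytes) with auto-str."""
    seq = list(seq)  # plays the role of PySequence_Fast
    if len(seq) == 0:
        return b""
    if len(seq) == 1 and type(seq[0]) is bytes:
        return seq[0]  # exactly one plain string: return it unchanged
    buf = bytearray()  # grows as needed while we append
    last = len(seq) - 1
    for i, x in enumerate(seq):
        if isinstance(x, bytes):
            buf += x
        elif isinstance(x, str):
            # A Unicode element: abandon this pass, redo it all as Unicode.
            return unicode_join(sep, seq)
        else:
            buf += str(x).encode("ascii")  # models PyObject_Str(x)
        if i != last:
            buf += sep
    return bytes(buf)


plain = join_auto(b", ", [b"a", 1, 2.5])
promoted = join_auto(b"-", [b"a", "b", 3])
```

Note that the work done before hitting the Unicode element is thrown away, which is the "less efficient if auto-promotion turns out to be required" cost mentioned earlier.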
Note a peculiarity: if x is neither str nor unicode, but has a __str__ or __repr__ method that returns a unicode object, PyObject_Str() will convert that into an 8-bit str. That may be surprising. It would be ugly to duplicate most of the logic from PyObject_Unicode() to try to guess whether there's "a natural" Unicode spelling of x. I think I'd rather say "tough luck -- use unicode.join if that's what you want".