[Python-Dev] unicode and str

Tim Peters tim.peters at gmail.com
Mon Aug 30 22:41:10 CEST 2004


[Neil Schemenauer]

... The only thing I found in the NEWS file that seemed relevant is this note:

    u'%s' % obj will now try obj.__unicode__() first and fallback to
    obj.__str__() if no __unicode__ method can be found.

I don't think that describes the behavior difference. Allowing __str__ to return unicode strings seems like a pretty noteworthy change (assuming that's what actually happened).
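The lookup order in that NEWS note can be modeled roughly as below. This is a hypothetical sketch in modern Python, not CPython 2.4's C code: `format_unicode_s`, `OnlyStr` and `Both` are invented names, Python 3 `str` stands in for Python 2 `unicode`, and a plain method named `__unicode__` (which Python 3 itself ignores) stands in for the Python 2 hook.

```python
def format_unicode_s(obj):
    """Hypothetical model of u'%s' % obj: prefer __unicode__, else __str__.

    Python 3 str plays the role of Python 2 unicode here. Any later
    coercion of the result is not modeled; it is returned as-is.
    """
    hook = getattr(type(obj), "__unicode__", None)
    if hook is not None:
        # A __unicode__ method was found, so it is tried first.
        return hook(obj)
    # No __unicode__ method: fall back to __str__.
    return type(obj).__str__(obj)


class OnlyStr:
    def __str__(self):
        return "from __str__"


class Both:
    def __unicode__(self):
        return "from __unicode__"

    def __str__(self):
        return "from __str__"


print(format_unicode_s(OnlyStr()))  # from __str__
print(format_unicode_s(Both()))     # from __unicode__
```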

It's confusing. A __str__ method or tp_str type slot can return unicode, but what happens after that depends on the caller. PyObject_Str() and PyObject_Repr() try to encode it as an 8-bit string then. But unicode.__mod__ says "oh, cool -- I'm done".
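The caller-dependent step can be sketched as a hypothetical Python 3 model: `Text`, `model_pyobject_str` and `model_unicode_mod_s` are invented names, `bytes` stands in for the Python 2 8-bit `str`, and the strict ASCII encode is an assumption chosen to mirror the 'ascii' codec errors in the session in this message. Only the PyObject_Str() path is modeled; PyObject_Repr() is said to behave the same way.

```python
class Text:
    """Hypothetical stand-in for an object whose __str__ returns text."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


def model_pyobject_str(obj):
    """Model of PyObject_Str(): a text result is encoded to 8-bit."""
    result = type(obj).__str__(obj)
    if isinstance(result, str):
        # Assumed 'ascii' default encoding; raises UnicodeEncodeError
        # for characters outside range(128).
        return result.encode("ascii")
    # An 8-bit (bytes) result would be passed through unchanged.
    return result


def model_unicode_mod_s(obj):
    """Model of u'%s' % obj: a text result is used as-is ("I'm done")."""
    return type(obj).__str__(obj)


print(model_pyobject_str(Text("a")))              # b'a'
print(ascii(model_unicode_mod_s(Text("\u1234"))))  # '\u1234'
try:
    model_pyobject_str(Text("\u1234"))
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc.reason)      # ordinal not in range(128)
```

The two functions call the same `__str__`; only what they do with a text result differs, which is the point being made about the caller.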

Also, I'm a little unclear on the purpose of the __unicode__ method. If you can return unicode from __str__ then why would I want to provide a __unicode__ method?

Is the purpose clearer if you purge your mind of the belief that str() (as opposed to __str__!) can return unicode? Here w/ current CVS:

>>> class A:
...     def __str__(self): return u'a'
...
>>> print A()
a
>>> type(str(A()))
<type 'str'>
>>> class A:
...     def __str__(self): return u'\u1234'
...
>>> print A()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1234' in position 0: ordinal not in range(128)
>>> '%s' % A()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1234' in position 0: ordinal not in range(128)
>>> u'%s' % A()
u'\u1234'

So unicode.__mod__ is what's special here. But not sure that helps.


