[Python-Dev] unicode and __str__
M.-A. Lemburg mal at egenix.com
Tue Aug 31 10:23:33 CEST 2004
Neil Schemenauer wrote:
> With Python 2.4:
>
>     >>> u = u'\N{WHITE SMILING FACE}'
>     >>> class A:
>     ...     def __str__(self):
>     ...         return u
>     ...
>     >>> class B:
>     ...     def __unicode__(self):
>     ...         return u
>     ...
>     >>> u'%s' % A()
>     u'\u263a'
>     >>> u'%s' % B()
>     u'\u263a'
>
> With Python 2.3:
>
>     >>> u'%s' % A()
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>     UnicodeEncodeError: 'ascii' codec can't encode character u'\u263a' in position 0: ordinal not in range(128)
>     >>> u'%s' % B()
>     u'<__main__.B instance at 0x401f910c>'
>
> The only thing I found in the NEWS file that seemed relevant is this note:
>
>     u'%s' % obj will now try obj.__unicode__() first and fallback to obj.__str__() if no __unicode__ method can be found.
>
> I don't think that describes the behavior difference. Allowing __str__ return unicode strings seems like a pretty noteworthy change (assuming that's what actually happened).
__str__ is indeed allowed to return Unicode objects (and has been for quite a while).
The reason we added __unicode__ was to provide a hook for PyObject_Unicode() to try before falling back to __str__. The hook is needed because, even though returning Unicode objects from __str__ is allowed, in most cases it is PyObject_Str() that calls __str__, and that API always converts a Unicode result to a string using the default encoding, which can easily fail.
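The lookup order described above can be sketched in pure Python. This is only an illustration, not the C implementation: unicode_sketch() and str_sketch() are hypothetical stand-ins for PyObject_Unicode() and PyObject_Str(), the default encoding is assumed to be ASCII, and the code is written so that it also runs on Python 3, where the built-in str type plays the role of unicode.

```python
u = u'\N{WHITE SMILING FACE}'


class A(object):
    def __str__(self):
        return u


class B(object):
    def __unicode__(self):
        return u


def unicode_sketch(obj):
    # Hypothetical stand-in for PyObject_Unicode():
    # try the __unicode__ hook first, then fall back to __str__.
    hook = getattr(obj, '__unicode__', None)
    if hook is not None:
        return hook()
    return obj.__str__()


def str_sketch(obj, default_encoding='ascii'):
    # Hypothetical stand-in for PyObject_Str(): a Unicode result from
    # __str__ is encoded with the default encoding, which can fail.
    result = obj.__str__()
    if isinstance(result, type(u'')):
        result = result.encode(default_encoding)
    return result


print(repr(unicode_sketch(A())))
print(repr(unicode_sketch(B())))
try:
    str_sketch(A())
except UnicodeEncodeError as exc:
    print('str_sketch(A()) failed: %s' % exc)
```

Both classes yield the smiley through unicode_sketch(), while str_sketch(A()) fails for the same reason as the Python 2.3 traceback quoted above: the character cannot be encoded as ASCII.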
> Also, I'm a little unclear on the purpose of the __unicode__ method. If you can return unicode from __str__ then why would I want to provide a __unicode__ method? Perhaps it is meant for objects that can either return a unicode or a string representation depending on what the caller prefers. I have a hard time imagining a use for that.
That's indeed the use case. An object might want to return an approximate string representation when asked for a string, but a true representation of its content when asked for Unicode. Because of the default encoding problems you can run into with __str__, we need two slots to provide this kind of functionality.
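A minimal sketch of such a two-slot object follows. The City class and its ASCII-folding strategy are assumptions made up for illustration. The code is written to run on Python 3, where __unicode__ is no longer called implicitly, so it is invoked by hand here.

```python
import unicodedata


class City(object):
    """Hypothetical object with an approximate and a true representation."""

    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        # True content representation: the name exactly as stored.
        return self.name

    def __str__(self):
        # Approximate representation: decompose accented characters and
        # drop everything that does not fit into ASCII.
        decomposed = unicodedata.normalize('NFKD', self.name)
        return decomposed.encode('ascii', 'ignore').decode('ascii')


city = City(u'Z\u00fcrich')
print(str(city))                  # ASCII approximation
print(repr(city.__unicode__()))   # full content
```

The approximate form can always be encoded with an ASCII default encoding, so it is safe for callers that go through __str__, while callers that ask for Unicode get the unmodified text.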
In Py3k we will probably see __str__ and __unicode__ reunite.
Now back to your original question: the change you see in %-formatting was actually a bug fix. Python 2.3 should have exhibited the same behavior as 2.4 does now.
-- Marc-Andre Lemburg eGenix.com