[Python-Dev] __str__ vs. __unicode__

Walter Dörwald walter at livinglogic.de
Wed Jan 19 10:40:46 CET 2005


M.-A. Lemburg wrote:

Walter Dörwald wrote:

__str__ and __unicode__ seem to behave differently. A __str__ overwrite in a str subclass is used when calling str(); a __unicode__ overwrite in a unicode subclass is not used when calling unicode():

[...]

If you drop the base class for unicode, this already works.

That's cheating! ;)

My use case is an XML DOM API: __unicode__() should extract the character data from the DOM. For Text nodes this is the text, for comments and processing instructions this is u"" etc. To reduce memory footprint and to inherit all the unicode methods, it would be good if Text, Comment and ProcessingInstruction could be subclasses of unicode.
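A minimal sketch of the DOM idea described above, under stated assumptions: it is written for Python 3, where str is the Unicode type and __str__ plays the role that __unicode__ plays in this thread. The class bodies and the helper character_data() are hypothetical and not taken from any real DOM library. The sketch only works because the interpreter honors the subclass hook during conversion, which is exactly what unicode() is reported not to do here.

```python
# Hypothetical node classes, written for Python 3 (str is the Unicode type).

class Text(str):
    """A text node: its character data is the string itself."""


class Comment(str):
    """A comment: keeps its text but contributes no character data."""

    def __str__(self):
        return ""


class ProcessingInstruction(str):
    """A processing instruction: likewise contributes no character data."""

    def __str__(self):
        return ""


def character_data(nodes):
    # Convert each node explicitly so that the __str__ hook gets a chance to run.
    return "".join(str(node) for node in nodes)


nodes = [Text("Hello"), Comment("internal note"), Text(" world")]
print(character_data(nodes))               # Hello world
print(Comment("internal note").upper())    # inherited str method: INTERNAL NOTE
```

Each node stores its text only once, inside the string object itself, and inherits every string method for free, which is the memory and convenience argument made above.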

This code in object.c:PyObject_Unicode() is responsible for the sub-class version not doing what you'd expect:

    if (PyUnicode_Check(v)) {
        /* For a Unicode subtype that's not a Unicode object,
           return a true Unicode object with the same data. */
        return PyUnicode_FromUnicode(PyUnicode_AS_UNICODE(v),
                                     PyUnicode_GET_SIZE(v));
    }

So the question is whether conversion of a Unicode sub-type to a true Unicode object should honor __unicode__ or not.

The same question can be asked for many other types, e.g. floats (and __float__), integers (and __int__), etc.

    >>> class float2(float):
    ...     def __float__(self):
    ...         return 3.141
    ...
    >>> float(float2(1.23))
    1.23
    >>> class int2(int):
    ...     def __int__(self):
    ...         return 42
    ...
    >>> int(int2(123))
    123

I think we need general consensus on what the strategy should be: honor these special hooks in conversions to base types or not?
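To see which strategy a given interpreter follows, a small probe along these lines can help. It is only a sketch: honors_hook() and the Probe class name are made up for this illustration. It builds a subclass whose hook returns a different value from the stored one and checks which value the base-type conversion yields. The sessions in this thread correspond to False for float and int; other interpreter versions may answer differently, so no output is claimed for those two rows.

```python
def honors_hook(base, hook_name, original, hooked):
    """Return True if base(subclass_instance) calls the subclass's hook."""
    # Build a subclass of `base` whose conversion hook returns `hooked`
    # instead of the value the instance was created with.
    subclass = type("Probe", (base,), {hook_name: lambda self: hooked})
    return base(subclass(original)) == hooked


for base, hook, original, hooked in [
    (str, "__str__", "bar", "foo"),
    (float, "__float__", 1.23, 3.141),
    (int, "__int__", 123, 42),
]:
    # Prints the type name, the hook name, and whether the hook was honored.
    print(base.__name__, hook, honors_hook(base, hook, original, hooked))
```

Running the probe under each interpreter of interest gives a quick table of where the hooks are honored and where the base-type data wins.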

I'd say, these hooks should be honored, because it gives us more possibilities: If you want the original value, simply don't implement the hook.

Maybe the string case is the real problem ... :-)

At least it seems that the string case is the exception.

So if we fix __str__ this would be a bugfix for 2.4.1. If we fix the rest, this would be a new feature for 2.5.

Bye, Walter Dörwald


