[Python-Dev] unicode and __str__
"Martin v. Löwis" martin at v.loewis.de
Tue Aug 31 07:09:40 CEST 2004
Neil Schemenauer wrote:
> Forgive me if I'm being obtuse, but I'm trying to understand the overall Python unicode design. This works:
>
>     >>> sys.getdefaultencoding()
>     'utf-8'
>     >>> str(A())
>     '\xe1\x88\xb4'
Ah, ok, so you have changed sys.getdefaultencoding on your installation. Doing so means that some programs will only run on your installation, but not on others (e.g. mine). One shouldn't change the default encoding away from ASCII except to work around buggy applications which would fail because of their unicode-unawareness.
> Can you be more specific about what is incorrect with the above class?
In the default installation, it gives a UnicodeEncodeError.
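A rough sketch of why the result depends on the installation. The class A itself is not shown in this message, so the text value below is an assumption, chosen because it is consistent with the UTF-8 bytes in the quoted session. The helper to_byte_string is hypothetical; it only imitates the implicit encoding step that str() applies to a unicode result, and is written so that it also runs on current Python versions.

```python
# Assumption: A.__str__ returns this one-character text string; the class
# body itself does not appear in this message.
text = u"\u1234"


def to_byte_string(value, default_encoding):
    """Imitate the implicit encode step that str() applies to a unicode result."""
    return value.encode(default_encoding)


# Installation whose default encoding was changed to UTF-8.
utf8_bytes = to_byte_string(text, "utf-8")
print(repr(utf8_bytes))

# Stock installation, where the default encoding is ASCII.
try:
    to_byte_string(text, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ASCII default works: %s" % ascii_ok)
```

The same value that encodes cleanly under UTF-8 raises UnicodeEncodeError under ASCII, which is why a program relying on the changed default is not portable.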
>> No. In some cases, str() needs to compromise, where unicode() doesn't.
>
> Sorry, I don't understand that statement. Are you saying that we will eventually get rid of __str__ and only have __unicode__?
No. Eventually, when strings are Unicode objects, the string conversion function will return such a thing. Whether this will be called str, unicode, or string, I don't know. However, this won't happen until Python 3, and it is not clear to me how it will look. We may also need a conversion routine into byte strings.
> If only we could. :-) Seriously though, I'm trying to understand the point of __unicode__. To me it seems to make the transition to unicode string needlessly more complicated.
Why do you say that? You don't have to implement __unicode__ if you don't need it, just as you don't have to implement __len__ or __nonzero__: If your class is fine with the standard "non-None is true", implement neither. If you conceptually have a sequence type, implement __len__ for "empty is false". If you have a more unusual class, implement __nonzero__ for "I decide what false is".
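The three truth-value cases could be sketched as follows. The class names Plain, Bag and Switch are hypothetical, made up for illustration. Python 3 later renamed the hook to __bool__, so the sketch aliases it to keep the example runnable on either version.

```python
class Plain(object):
    """Implements neither hook, so every instance is true."""


class Bag(object):
    """Sequence-like: an empty bag is false, via __len__."""

    def __init__(self, items=()):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


class Switch(object):
    """Decides its own truth value."""

    def __init__(self, on):
        self.on = on

    def __nonzero__(self):   # Python 2 name of the hook
        return self.on

    __bool__ = __nonzero__   # Python 3 name of the same hook


results = [bool(Plain()), bool(Bag()), bool(Bag([1])), bool(Switch(False))]
print(results)  # [True, False, True, False]
```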
Likewise, if you are happy with the standard default representation, implement neither __str__ nor __unicode__. If your class has a canonical byte string representation, implement __str__. If this byte string representation is not meaningful ASCII, and if a more meaningful string representation using other Unicode characters would be possible, also implement __unicode__. Never rely on the default encoding being something other than ASCII, though. Eventually, when strings are Unicode objects, you won't be able to change it.
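One possible shape for a class that follows this advice, as a sketch only. The class Label and its text are hypothetical. Here __str__ makes an ASCII-only compromise by replacing unencodable characters, while __unicode__ keeps the full text; decoding the ASCII bytes again is an addition so the example also runs on Python 3, where __str__ must return a text string and __unicode__ is not called automatically.

```python
class Label(object):
    """Hypothetical class whose natural text form contains non-ASCII characters."""

    def __init__(self, text):
        self.text = text  # a text (unicode) string

    def __unicode__(self):
        # Full-fidelity form; this is what unicode(obj) would use on Python 2.
        return self.text

    def __str__(self):
        # ASCII-only compromise: characters outside ASCII become '?'.
        # The result never depends on the default encoding.
        return self.text.encode("ascii", "replace").decode("ascii")


label = Label(u"na\u00efve")
ascii_form = str(label)
full_form = label.__unicode__()
print(ascii_form)  # na?ve
```

Because __str__ only ever produces ASCII, the class behaves the same whether or not the installation's default encoding has been changed.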
Regards, Martin