[Python-Dev] unifying str and unicode
James Y Knight foom at fuhm.net
Tue Oct 4 05:44:13 CEST 2005
On Oct 3, 2005, at 3:47 PM, Fredrik Lundh wrote:
> Antoine Pitrou wrote:
>>>> If I have an unicode string containing legal characters greater than 0x7F, and I pass it to a function which converts it to str, the conversion fails.
>>> so? if it does that, it's not unicode safe. [...] what's that has to do with my argument (which is that you can safely mix ascii strings and unicode strings, because that's how things were designed).
>> If that's how things were designed, then Python's entire standard library (not to mention third-party libraries) is not "unicode safe" - to quote your own words - since many functions may return 8-bit strings containing non-ascii characters.
> huh? first you talk about functions that convert unicode strings to 8-bit strings, now you talk about functions that return raw 8-bit strings?
> and all this in response to a post that argues that it's in fact a good idea to use plain strings to hold textual data that happens to contain ASCII only, because 1) it works, by design, and 2) it's almost always more efficient.
> if you don't know what your own argument is, you cannot expect anyone to understand it.

Your point would be much easier to stomach if the "str" type could
only hold 7-bit ASCII. Perhaps that can be done when Python gets an
actual bytes type in 3.0. There indeed are a multitude of uses for
the efficient storage/processing of ASCII-only data. However,
currently, there are problems because it's so easy to screw yourself
without noticing when mixing unicode and str objects. If, on the
other hand, you have a 7-bit ASCII string type and a 16/32-bit
unicode string type, both can be used interchangeably and there is no
possibility of any encoding/decoding issues. And
asciiOnlyStringType.encode('utf-8') can become ultra efficient, as
a bonus. :)
Seems win-win to me.
James
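
Editorial illustration, not part of the original message: the sketch below tries to make the argument above concrete. It is written for modern Python 3 rather than the Python 2 under discussion, so the implicit ASCII decoding that Python 2 performed when mixing unicode and str is only emulated here. The names AsciiStr, encode_utf8 and py2_style_concat are hypothetical and exist in no Python release; they are assumptions made purely for the example.

```python
class AsciiStr(bytes):
    """Hypothetical byte string that refuses anything outside 7-bit ASCII."""

    def __new__(cls, data):
        if isinstance(data, str):
            # Raises UnicodeEncodeError (a ValueError subclass) on non-ASCII text.
            data = data.encode("ascii")
        obj = super().__new__(cls, data)
        if any(byte > 0x7F for byte in obj):
            raise ValueError("AsciiStr may only hold 7-bit ASCII")
        return obj

    def to_text(self):
        # Cannot fail: every byte was checked at construction time.
        return self.decode("ascii")

    def encode_utf8(self):
        # ASCII is a subset of UTF-8, so the stored bytes are already
        # valid UTF-8 and no transcoding step is needed.
        return bytes(self)


def py2_style_concat(text, raw):
    """Emulate Python 2's `unicode + str`: the 8-bit operand is
    implicitly decoded as ASCII before concatenation."""
    return text + raw.decode("ascii")


if __name__ == "__main__":
    # Works with ASCII-only data, so the problem goes unnoticed in testing...
    print(py2_style_concat("abc", b"def"))

    # ...and only fails once non-ASCII bytes show up.
    try:
        py2_style_concat("abc", b"caf\xc3\xa9")
    except UnicodeDecodeError as exc:
        print("late failure:", exc.reason)

    # With an ASCII-only type the bad data is rejected where it is created.
    try:
        AsciiStr(b"caf\xc3\xa9")
    except ValueError as exc:
        print("early failure:", exc)

    print(AsciiStr("hello").encode_utf8())
```

The design choice being illustrated is where the failure happens: with an unrestricted 8-bit type the error surfaces at whichever later point the string is mixed with text, while an ASCII-only type moves the check to construction, after which mixing and UTF-8 encoding cannot fail.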