[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Phillip J. Eby pje at telecommunity.com
Mon Oct 24 04:23:40 CEST 2005


At 06:06 PM 10/23/2005 -0700, Guido van Rossum wrote:

> Folks, please focus on what Python 3000 should do.
>
> I'm thinking about making all character strings Unicode (possibly with different internal representations a la NSString in Apple's Objective C) and introduce a separate mutable bytes array data type. But I could use some validation or feedback on this idea from actual practitioners.

+1. Chandler has been going through quite an upheaval to get its Unicode handling together. Having a bytes type would be great, as long as files and sockets could produce bytes instead of strings (unless an encoding was specified).
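For illustration, here is a minimal sketch of that split using the semantics Python 3 eventually adopted (this post predates them, so treat the sketch as an assumption about how the proposal could look, not as the proposal itself): a binary open yields `bytes`, a text open with an explicit encoding yields `str`, and the two types never convert implicitly. The file name and contents are arbitrary.

```python
import os
import tempfile

# Write UTF-8 encoded text to a scratch file so there is something to read back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("café".encode("utf-8"))

try:
    # Binary open: no decoding happens; the caller gets raw bytes.
    with open(path, "rb") as f:
        raw = f.read()
    # Text open with an explicit encoding: the file object decodes to str.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
finally:
    os.remove(path)

print(type(raw).__name__, len(raw))    # bytes 5  ("é" is two bytes in UTF-8)
print(type(text).__name__, len(text))  # str 4

# No implicit conversion: mixing the two types is an error, not a silent decode.
try:
    raw + text
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```

Sockets follow the same rule in Python 3: `socket.recv()` returns `bytes`, and decoding is left to the caller.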

I'm tempted to say it would be even better if there were a command-line option that could be used to force all binary opens to return bytes, and to require all text opens to specify an encoding. The Chandler i18n project lead would jump for joy if we had a way to keep "legacy" strings out of the system, apart from ASCII string constants found in code.
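The policy half of that idea can be sketched as a wrapper around `open()`. `strict_open` below is a hypothetical helper, not a real Python API or flag: binary opens pass through and yield bytes, while text opens without an explicit encoding are rejected. (Much later, Python 3.10 added `-X warn_default_encoding` from PEP 597, which only warns with `EncodingWarning` rather than failing.)

```python
import os
import tempfile


def strict_open(file, mode="r", *, encoding=None, **kwargs):
    """Hypothetical policy wrapper, not a real Python API.

    Binary opens pass straight through and yield bytes; text opens must
    name their encoding instead of relying on a platform default.
    """
    if "b" not in mode and encoding is None:
        raise ValueError(f"text-mode open of {file!r} needs an explicit encoding")
    return open(file, mode, encoding=encoding, **kwargs)


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with strict_open(path, "w", encoding="utf-8") as f:
        f.write("naïve")          # 5 characters
    with strict_open(path, "rb") as f:
        data = f.read()           # 6 bytes, since "ï" takes two in UTF-8
    try:
        strict_open(path)         # text mode with no encoding: rejected
        rejected = False
    except ValueError:
        rejected = True
finally:
    os.remove(path)

print(type(data).__name__, len(data), rejected)  # bytes 6 True
```

Raising instead of warning matches the "force" wording above; a gentler variant could call `warnings.warn` and fall through.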

It would then be okay to keep the implicit conversions; if you can't get 8-bit strings on input, then conversion's not really an issue.

Anyway, I think all of the things I'd like to see can be done without breakage in 2.5. For Chandler at least, we'd be willing to go with a stricter command-line option, in order to ensure that plugin developers can't accidentally put 8-bit strings in somewhere just by opening a file.
