[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]
James Y Knight foom at fuhm.net
Tue Feb 14 19:36:26 CET 2006
On Feb 14, 2006, at 11:25 AM, Phillip J. Eby wrote:
> At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:
>> I like it, it makes sense. Unicode strings are simply not allowed as
>> arguments to the byte constructor. Thinking about it, why would it be
>> otherwise? And if you're mixing str-strings and unicode-strings, that
>> means the str-strings you're sometimes giving are actually not byte
>> strings, but character strings anyhow, so you should be encoding
>> those too. bytes(s_or_U.encode('utf-8')) is a perfectly good spelling.
>
> Actually, I think you mean:
>
>     if isinstance(s_or_U, str):
>         s_or_U = s_or_U.decode('utf-8')
>     b = bytes(s_or_U.encode('utf-8'))
>
> Or maybe:
>
>     if isinstance(s_or_U, unicode):
>         s_or_U = s_or_U.encode('utf-8')
>     b = bytes(s_or_U)
>
> Which is why I proposed that the boilerplate logic get moved into the
> bytes constructor. I think this use case is going to be common in
> today's Python, but in truth I'm not as sure what bytes() will get
> used for in today's Python. I'm probably overprojecting based on the
> need to use str objects now, but bytes aren't going to be a
> replacement for str for a good while anyway.
I most certainly did not mean that. If you are mixing together str
and unicode instances, the str instances must be in the default
encoding (ASCII). Otherwise, you are bound for failure anyhow; e.g.
''.join(['\x95', u'1']) raises UnicodeDecodeError. Str is used for
two things right now:
  1) a byte string;
  2) a unicode string restricted to 7-bit ASCII.
These two uses are separate, and you cannot mix them without causing
disaster.
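A hedged Python 3 model of the failure James describes is sketched below. In Python 2, joining a `str` with a `unicode` promotes the result to `unicode` and implicitly decodes each `str` part with the default ASCII encoding. `py2_style_join` is a hypothetical function that emulates that coercion. It is not the real Python 2 implementation.

```python
def py2_style_join(parts):
    """Hypothetical model of Python 2's implicit str/unicode coercion.

    Byte strings are decoded as ASCII (Python 2's default encoding)
    as soon as any text element is present in the sequence.
    """
    if any(isinstance(p, str) for p in parts):
        # Python 2 would promote to unicode, implicitly decoding bytes as ASCII.
        return ''.join(p.decode('ascii') if isinstance(p, bytes) else p
                       for p in parts)
    # No text elements: stay in the byte-string world.
    return b''.join(parts)


print(py2_style_join([b'abc', '1']))  # ASCII-only bytes mix fine: abc1
try:
    py2_style_join([b'\x95', '1'])
except UnicodeDecodeError as exc:
    print('fails:', exc.reason)       # fails: ordinal not in range(128)
```

Python 3 itself does not attempt this coercion. `b''.join([b'\x95', '1'])` raises `TypeError`, so the two uses of `str` can no longer be mixed by accident.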
You've created an interface that can take either a UTF-8 byte string
or a unicode character string. But that's wrong and can only cause
problems. It should take either an encoded byte string or a unicode
character string, not both. If it takes a unicode character string,
there are two ways of spelling that in current Python: a "str" object
with only ASCII in it, or a "unicode" object with arbitrary
characters in it. bytes(s_or_U.encode('utf-8')) works correctly with
both, since an ASCII-only str is implicitly decoded as ASCII before
being encoded, and ASCII is a subset of UTF-8.
James
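The sketch below applies the rule James argues for in Python 3: an interface accepts text only and encodes it explicitly, and already-encoded bytes go through a separate path. `encode_text` is a hypothetical name, and the UTF-8 choice mirrors the thread rather than any fixed API.

```python
def encode_text(text):
    """Hypothetical interface following James's rule: accept text only.

    Callers holding already-encoded bytes should use a separate
    byte-oriented path instead of this one.
    """
    if not isinstance(text, str):
        raise TypeError('expected text (str), got %s' % type(text).__name__)
    # Mirrors bytes(s_or_U.encode('utf-8')) from the thread.
    return bytes(text.encode('utf-8'))


# ASCII is a subset of UTF-8, so ASCII-only text encodes identically either way.
assert 'abc'.encode('ascii') == 'abc'.encode('utf-8')

print(encode_text('abc'))          # b'abc'
print(encode_text('\u2022 item'))  # b'\xe2\x80\xa2 item'

# Python 3's bytes() constructor refuses text without an explicit encoding.
try:
    bytes('abc')
except TypeError as exc:
    print('TypeError:', exc)
```

Rejecting `bytes` outright, rather than guessing its encoding, keeps the "encoded byte string vs. character string" distinction explicit at the call site.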