[Python-Dev] (Not) delaying the 3.2 release

Guido van Rossum guido at python.org
Thu Sep 16 19:21:33 CEST 2010


On Thu, Sep 16, 2010 at 8:42 AM, Toshio Kuratomi <a.badger at gmail.com> wrote:
> On Thu, Sep 16, 2010 at 09:52:48AM -0400, Barry Warsaw wrote:
>> On Sep 16, 2010, at 11:28 PM, Nick Coghlan wrote:
>>> There are some APIs that should be able to handle bytes or strings,
>>> but the current use of string literals in their implementation means
>>> that bytes don't work. This turns out to be a PITA for some networking
>>> related code which really wants to be working with raw bytes (e.g.
>>> URLs coming off the wire).
>>
>> Note that email has exactly the same problem. A general solution -- even
>> if embodied in well documented best-practices and convention -- would
>> really help make the stdlib work consistently, and I bet third party
>> libraries too. I too await a solution with bated breath :-)
>
> I've been working on documenting best practices for APIs and Unicode, and
> for this type of function (take bytes or unicode and output the same
> type), knowing the encoding seems like a requirement in most cases:
>
> http://packages.python.org/kitchen/designing-unicode-apis.html#take-either-bytes-or-unicode-output-the-same-type
>
> I'd love to add another strategy there that shows how you can robustly
> operate on bytes without knowing the encoding, but from writing that, I
> think that anytime you simplify your API you have to accept limitations
> on the data you can take in. (For instance, some simplifications can
> handle anything except ASCII-incompatible encodings.)

In all cases I can imagine where such polymorphic functions make sense, the necessary and sufficient assumption should be that the encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all Latin-N variants, and AFAIK also the popular CJK encodings other than UTF-16. This is the same assumption made by Python's bytes type when you use "character-based" methods like lower().
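
To make the assumption concrete, here is a minimal sketch of such a polymorphic function. The name split_scheme and its behaviour are hypothetical, invented for illustration and not taken from the stdlib or from the thread. It picks its separator literal to match the input type, so the bytes path is only correct if the unknown encoding is an ASCII superset.

```python
def split_scheme(url):
    """Split 'scheme:rest' into (scheme, rest), returned in the input's own type.

    Hypothetical helper for illustration only.
    """
    # Choose a literal of the same type as the input. For bytes this is
    # only meaningful if the (unknown) encoding is a superset of ASCII,
    # so that the byte 0x3A always means ':'.
    if isinstance(url, bytes):
        sep = b":"
    elif isinstance(url, str):
        sep = ":"
    else:
        raise TypeError("expected str or bytes, got %s" % type(url).__name__)

    scheme, found, rest = url.partition(sep)
    if not found:
        # url[:0] is an empty value of the same type as the input.
        return url[:0], url
    # bytes.lower() only touches the ASCII letters A-Z, which is exactly
    # the "ASCII superset" assumption described above.
    return scheme.lower(), rest


text_result = split_scheme("HTTP://example.com/")
bytes_result = split_scheme(b"HTTP://ex\xc3\xa4mple.com/")

# The assumption breaks for UTF-16: every ASCII character is followed
# by a NUL byte, so splitting on b":" cuts a code unit in half.
utf16_result = split_scheme("http://x".encode("utf-16-le"))
```

With UTF-8 input the non-ASCII bytes pass through untouched, while the UTF-16 input yields a "rest" that starts with a stray NUL byte and no longer decodes to the original text.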

--Guido

(*) In my mind ASCII and 7-bit are synonymous, but unfortunately there are droves of naive users who believe that ASCII includes all 256 possible 8-bit bytes using some encoding -- typically the default encoding of their DOS or Windows box. :-(
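
As a small illustration of the footnote (a sketch; the codec names cp1252 and cp437 are picked here as typical Windows and DOS defaults and are not named in the message): a byte above 0x7F is not ASCII at all, and its meaning depends entirely on which 8-bit encoding is assumed.

```python
raw = bytes([0xE9])  # a single byte outside the 7-bit range

# The ASCII codec rejects anything above 0x7F.
try:
    raw.decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False

# The same byte maps to different characters under two common
# 8-bit code pages, so "8-bit ASCII" is not a single encoding.
as_windows = raw.decode("cp1252")
as_dos = raw.decode("cp437")

# All 128 values in the 7-bit range decode without error.
seven_bit = bytes(range(128)).decode("ascii")
```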

--
--Guido van Rossum (python.org/~guido)
