[Python-Dev] bytes / unicode

Antoine Pitrou solipsis at pitrou.net
Sun Jun 20 23:47:23 CEST 2010

On Sun, 20 Jun 2010 14:40:56 -0400 "P.J. Eby" <pje at telecommunity.com> wrote:

> Actually, I would say that it's more that (in the network protocol case) we have bytes, some of which we would like to treat as text, yet do not wish to constantly convert back and forth to full-blown unicode

Well, then why don't you just stick with a bytes object?

> While reading over this thread, I'm wondering whether at least my (WSGI-related) problems in this area would be solved by the availability of a type (say "bstr") that was simply a wrapper providing string-like behavior over an underlying bytes, byte array, or memoryview, that would produce objects of compatible type when combined with strings (by encoding them to match).

This really sounds horrible. Python 3 was designed precisely to discourage ad hoc mixing of bytes and unicode.
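
As an illustration of the point about mixing, here is a minimal sketch of how Python 3 treats a bytes object combined with a str. The helper name `mix` and the sample request line are made up for this example and are not from the thread.

```python
def mix(raw, text):
    """Try to concatenate bytes and str; return the exception if it fails."""
    try:
        return raw + text
    except TypeError as exc:
        return exc

# Implicit mixing is refused rather than silently encoded or decoded.
result = mix(b"GET ", "/index.html")

# The conversion has to be spelled out, with an explicit encoding.
explicit = b"GET " + "/index.html".encode("ascii")
```

A "bstr"-style wrapper would reintroduce the implicit conversion that the first call refuses.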

> Actually, if the Python 3 str() constructor could do O(1) conversion for the latin-1 case (i.e., just wrapped the underlying bytes), I would just put, "bstr = lambda x: str(x,'latin-1')" at the top of my programs and have roughly the same effect.

Did you do any measurements that show that latin-1 decoding (hardly a complicated task) introduces a performance regression in Web frameworks in 3.x?
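
One way such a measurement could be started is sketched below, using the `bstr` helper quoted above. The function `time_decode` and the sample header are assumptions made for illustration; no timings are claimed here, and a micro-benchmark like this says nothing by itself about a whole Web framework.

```python
import timeit

# The helper from the quoted message: decode bytes as latin-1 into a str.
bstr = lambda x: str(x, 'latin-1')

# latin-1 maps each byte 0..255 to the code point with the same value,
# so decoding and re-encoding round-trips any byte string unchanged.
all_bytes = bytes(range(256))
text = bstr(all_bytes)
assert [ord(c) for c in text] == list(range(256))
assert text.encode('latin-1') == all_bytes

def time_decode(payload, number=10000):
    """Return the seconds taken by `number` latin-1 decodes of payload."""
    return timeit.timeit(lambda: bstr(payload), number=number)

if __name__ == "__main__":
    # Made-up sample payload, roughly the size of a few HTTP headers.
    header = b"Content-Type: text/html; charset=utf-8\r\n" * 10
    print(time_decode(header))
```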

> seems so much saner than writing this everywhere:
>
>   newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1')

urljoin already returns an str object. Why do you want to decode it again?
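
A short sketch of that point, assuming `base` arrives as bytes (the URL below is a made-up example): once the base is decoded, `urljoin` returns a str, and passing that str to `str(..., 'latin-1')` again raises a TypeError instead of decoding anything.

```python
from urllib.parse import urljoin

# Hypothetical bytes value, e.g. taken from a WSGI-style environ.
base = b"http://example.com/app/"

# Decode once at the boundary; urljoin on str arguments returns a str.
newurl = urljoin(str(base, "latin-1"), "subdir")

# Decoding the result a second time is not just redundant: str() refuses
# to "decode" an object that is already a str.
try:
    str(newurl, "latin-1")
    second_decode_failed = False
except TypeError:
    second_decode_failed = True

# Encode explicitly only where bytes are needed again, e.g. on output.
wire = newurl.encode("latin-1")
```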
