[Python-Dev] email package status in 3.X

Jesse Noller jnoller at gmail.com
Sun Jun 20 22:10:00 CEST 2010


On Sun, Jun 20, 2010 at 2:40 PM, P.J. Eby <pje at telecommunity.com> wrote:

> At 10:57 AM 6/20/2010 -0700, Guido van Rossum wrote:
>
>> The problem comes exactly where you find it: when porting existing code that uses aforementioned ways to alleviate the pain, you find that the hacks no longer work and a properly layered design is needed that clearly distinguishes between which variables contain bytes and which text.
>
> Actually, I would say that it's more that (in the network protocol case) we have bytes, some of which we would like to treat as text, yet do not wish to constantly convert back and forth to full-blown unicode -- especially since the protocols themselves designate ASCII or latin-1 at the transport layer (sometimes with odder encodings above, but these already have to be explicitly dealt with by existing code).
>
> While reading over this thread, I'm wondering whether at least my (WSGI-related) problems in this area would be solved by the availability of a type (say "bstr") that was simply a wrapper providing string-like behavior over an underlying bytes, byte array, or memoryview, that would produce objects of compatible type when combined with strings (by encoding them to match). Then, I could wrap bytes with it to pass them to string operations, and then feed them back into everything else. The bstr type ideally would be directly compatible with bytes I/O, or at least have a .bytes attribute that would be.
>
> It seems like that would reduce WSGI porting issues quite a bit, since it would mostly consist of throwing extra bstr() calls in where things are breaking, and maybe grabbing the .bytes attribute for I/O. This approach would still be explicit as to what types you're working with, but would not require O(n) conversions at every interaction boundary. It would be limited, of course, to single-byte encodings with all characters (0-255) valid.
>
> OTOH, maybe there should just be a bytestrings module with bytestrings.ascii and bytestrings.latin1, and between the two that should cover the network protocol needs quite well.
>
> Actually, if the Python 3 str() constructor could do O(1) conversion for the latin-1 case (i.e., just wrapped the underlying bytes), I would just put
>
>     bstr = lambda x: str(x,'latin-1')
>
> at the top of my programs and have roughly the same effect.
>
> This idea is still a bit half-baked, but a more baked version might be just the ticket for porting stuff that used str to work with bytes in 2.x, if only because writing, e.g.:
>
>     newurl = bstr(urljoin(bstr(base), 'subdir'))
>
> seems so much saner than writing this everywhere:
>
>     newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1')
>
> It is perhaps a bit late to propose this idea, since ideally we would also want to use it in 2.x to aid porting. But I'm curious if any other people here experiencing byte/unicode woes in relation to network protocols would find this a solution to their chief frustration. (i.e., that the stdlib often insists now on strings, where effectively bytes were usable before, and thus one must do conversions both coming and going.)
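The "bstr" idea in the quoted message can be approximated in present-day Python 3 as a `str` subclass that decodes latin-1 on the way in and exposes a `.bytes` attribute on the way out. This is a minimal sketch under stated assumptions, not the proposal itself: the name `bstr` mirrors the message, but subclassing `str` is an assumption of this sketch. Because decoding still copies the data, it does not deliver the O(1) conversion the message asks for. With plain built-ins, `str(x, 'latin-1')` accepts only bytes-like objects, so a `str` returned by `urljoin` must be turned back into bytes with `.encode('latin-1')`. The sketch hides that step behind `.bytes`.

```python
from urllib.parse import urljoin


class bstr(str):
    """Hypothetical sketch: a str whose characters map 1:1 to bytes via latin-1.

    Latin-1 assigns a code point to every byte value 0-255, so decoding never
    fails and encoding round-trips exactly. Decoding still copies the data,
    so this is O(n), not the O(1) wrapper the proposal asks for.
    """

    def __new__(cls, value=b""):
        # Decode bytes, bytearray, or memoryview as latin-1; anything else
        # (e.g. a str returned by a stdlib call) is handed to str() unchanged.
        if isinstance(value, (bytes, bytearray, memoryview)):
            value = bytes(value).decode("latin-1")
        return super().__new__(cls, value)

    @property
    def bytes(self):
        # Raises UnicodeEncodeError if text above U+00FF was mixed in.
        return self.encode("latin-1")


# Hypothetical base URL as it might arrive from the wire.
base = b"http://example.com/app/"
newurl = bstr(urljoin(bstr(base), "subdir"))
print(newurl.bytes)  # b'http://example.com/app/subdir'
```

Because `bstr` is a real `str`, it can be passed straight to `urljoin`. Results of str operations come back as plain `str` (for example, `bstr('a') + 'b'`), however, so the outer `bstr(...)` call is still needed before reading `.bytes`.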

I hate to reply with a simple +1, but I've heard this pain and this proposal from a frightening number of people. Something that allowed you to use bytes with some of the string methods would go a really long way toward solving a lot of people's Python 3 pain. I don't relish the idea that, once people start moving over, there might be a billion implementations of "things like this".

jesse
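For reference, Python 3's `bytes` type already provides many string-style methods, including `split`, `strip`, `startswith`, `partition` and `lower`. It refuses to mix with `str`, though, and that refusal is where the conversions discussed in this thread pile up. The header line below is a hypothetical example chosen for illustration.

```python
# A hypothetical raw header line as it might arrive from a socket.
line = b"Content-Type: text/html; charset=latin-1\r\n"

# Many str-style methods work directly on bytes and return bytes.
name, _, value = line.rstrip(b"\r\n").partition(b":")
print(name.lower())                  # b'content-type'
print(value.strip())                 # b'text/html; charset=latin-1'
print(line.startswith(b"Content-"))  # True

# Mixing bytes and str raises TypeError, so every str literal or
# str-returning API call needs an explicit encode/decode at the boundary.
try:
    combined = name + ": "
except TypeError as exc:
    print("TypeError:", exc)
```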
