[Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)
Nick Coghlan ncoghlan at gmail.com
Tue Sep 21 15:38:08 CEST 2010
- Previous message: [Python-Dev] r84931 - in python/branches/py3k: Include/symtable.h Misc/NEWS Python/ast.c Python/compile.c Python/future.c Python/symtable.c
- Next message: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, Sep 21, 2010 at 3:03 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> On the other hand, it is dangerous to provide a polymorphic API which does that more extensive parsing, because a less-than-paranoid programmer will very likely have allowed the parsed components to escape from the context where their encodings can be reliably determined. Remember, *it is unlikely that they will ever be punished for their own lack of caution.* The person who is doomed is somebody who tries to take that code and reuse it in a different context.
Yeah, that's the original reasoning that had me leaning towards the parallel API approach. If I seem to be changing my mind a lot in this thread, it's because I'm genuinely torn between the desire to make it easier to port existing 2.x code to 3.x by making the current API polymorphic and the fear that doing so will reintroduce some of the exact same bytes/text confusion that the bytes/str split is trying to get rid of.
There's no real way for 2to3 to help with the porting issue either, since it has no way to determine the original intent of the 2.x code.
I think avoiding the quote/unquote precedent and applying the rule "bytes in -> bytes out" will help avoid the worst of any potential encoding confusion problems, though. At some point the programmer is going to have to invoke decode() if they want a string to pass to display functions and the like (or vice versa with encode()), so there are still limits to how far any poorly handled code will get before blowing up. (Basically, while the issue of programmers assuming 'latin-1', 'utf-8' or similar ASCII-friendly encodings when they shouldn't is real, I don't believe a polymorphic API here will make things any worse than a parallel API would.)
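To make the "bytes in -> bytes out" rule concrete, here is a minimal sketch of what a polymorphic parsing helper following it might look like. The function `split_scheme` is hypothetical (it is not part of urllib.parse); it simply picks a separator literal of the same type as its input, so it never encodes or decodes anything itself, and the caller has to call decode() explicitly before treating the result as text.

```python
from typing import AnyStr, Tuple


def split_scheme(url: AnyStr) -> Tuple[AnyStr, AnyStr]:
    """Hypothetical helper: split 'scheme:rest' into (scheme, rest).

    bytes in -> bytes out, str in -> str out; no implicit encoding step.
    """
    # Pick a separator of the same type as the input, so the parsing
    # itself never needs to know (or guess) an encoding.
    if isinstance(url, bytes):
        sep = b":"
    elif isinstance(url, str):
        sep = ":"
    else:
        raise TypeError(f"expected str or bytes, got {type(url).__name__}")
    scheme, found, rest = url.partition(sep)
    if not found:
        # No scheme: url[:0] is an empty value of the same type as url.
        return url[:0], url
    return scheme, rest


if __name__ == "__main__":
    scheme, rest = split_scheme(b"http://example.com/")
    # The results are still bytes; turning them into text for display
    # requires an explicit decode with a deliberately chosen encoding.
    print(scheme.decode("ascii"), rest.decode("ascii"))
```

The point of the design is the one made above: the bytes never silently turn into text, so any encoding assumption has to be written out by the caller at the decode() call.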
And if this turns out to be a disaster in practice: a) on my head be it; and b) we still have the option of the DeprecationWarning dance for bytes inputs to the existing functions and moving to a parallel API.
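For readers unfamiliar with the "DeprecationWarning dance", a rough sketch under stated assumptions: the existing function keeps accepting bytes for a while but emits a DeprecationWarning pointing callers at a separate bytes-only function. Both `split_scheme` and `split_scheme_bytes` here are hypothetical names used purely for illustration; nothing in this sketch is an actual urllib.parse API.

```python
import warnings
from typing import Tuple, Union


def split_scheme_bytes(url: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical parallel API: accepts bytes only."""
    if not isinstance(url, bytes):
        raise TypeError("split_scheme_bytes() requires bytes")
    scheme, found, rest = url.partition(b":")
    return (scheme, rest) if found else (b"", url)


def split_scheme(url: Union[str, bytes]):
    """Hypothetical existing API: str is the supported input type."""
    if isinstance(url, bytes):
        # Keep accepting bytes during the deprecation period, but warn,
        # attributing the warning to the caller's line (stacklevel=2).
        warnings.warn(
            "passing bytes to split_scheme() is deprecated; "
            "use split_scheme_bytes() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return split_scheme_bytes(url)
    scheme, found, rest = url.partition(":")
    return (scheme, rest) if found else ("", url)
```

Once the deprecation period ends, the bytes branch in `split_scheme` would be replaced by a TypeError, leaving the two functions as a strict parallel pair.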
Still-trying-to-figure-out-what-moment-of-insanity-prompted-me-to-volunteer-to-tackle-this'ly, Nick.