[Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)
Nick Coghlan ncoghlan at gmail.com
Mon Sep 20 23:12:13 CEST 2010
On Tue, Sep 21, 2010 at 4:30 AM, Chris McDonough <chrism at plope.com> wrote:
> Existing APIs save for "quote" don't really need to deal with charset encodings at all, at least on any level that Python needs to care about. The potential already exists to emit garbage which will turn into mojibake from almost all existing APIs. The only remaining issue seems to be fear of making a design mistake while designing APIs.
>
> IMO, having a separate module for all urllib.parse APIs, each designed for only bytes input is a design mistake greater than any mistake that could be made by allowing for both bytes and str input to existing APIs and returning whatever type was passed. The existence of such a module will make it more difficult to maintain a codebase which straddles Python 2 and Python 3.
Failure to use quote/unquote correctly is a completely different problem from using bytes with an ASCII-incompatible encoding, or mixing bytes with different encodings. Yes, if you don't quote your URLs you may end up with mojibake. That's not a justification for creating a new way to accidentally create mojibake.
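A rough sketch of the two separate problems, using made-up sample strings (nothing here comes from the thread itself). The first half shows quote/unquote round-tripping and how unquoting with a mismatched encoding silently yields mojibake; the second half shows bytes from two different encodings being joined into something no single codec decodes correctly.

```python
from urllib.parse import quote, unquote

text = "café"  # hypothetical sample value

# Problem 1: quoting. quote() percent-encodes the UTF-8 bytes by default,
# so the result is pure ASCII and round-trips cleanly.
quoted = quote(text)
roundtrip = unquote(quoted)

# Unquoting with a different encoding does not fail; it silently
# produces mojibake.
mojibake = unquote(quoted, encoding="latin-1")

# Problem 2: mixing bytes that were encoded differently. Concatenation
# succeeds, but the result is no longer valid UTF-8.
mixed = "é".encode("utf-8") + "é".encode("latin-1")
try:
    mixed.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(quoted, roundtrip, mojibake, mixed, decodes_as_utf8)
```

Neither failure raises at the point where the mistake is made, which is why they are easy to introduce by accident.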
Separating the APIs means that application programmers will be expected to know whether they are working with data formatted for display to the user (i.e. Unicode text) or transfer over the wire (i.e. ASCII-compatible bytes).
Can you give me a concrete use case where the application programmer won't know which format they're working with? Py3k made the conscious decision to stop allowing careless mixing of encoded and unencoded text. This is just taking that philosophy and propagating it further up the API stack (as has already been done with several OS-facing APIs for 3.2).
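To illustrate what "no careless mixing" looks like in practice, here is a minimal sketch with hypothetical values (the host and path are invented for the example). In Python 3, combining bytes and str directly raises TypeError, so the programmer has to pick the display or wire form explicitly at the boundary.

```python
from urllib.parse import quote

path_text = "/wiki/Köln"              # Unicode text, meant for display
prefix_wire = b"http://example.com"   # ASCII-compatible bytes, meant for the wire

# Mixing the two forms directly is rejected rather than guessed at.
try:
    prefix_wire + path_text
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

# Crossing the boundary explicitly: percent-encode the text, then encode
# the now pure-ASCII result to bytes.
url_wire = prefix_wire + quote(path_text).encode("ascii")

print(type(mixed_error).__name__, url_wire)
```

The explicit conversion step is the point at which the programmer states which format they are holding.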
Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia