[Python-Dev] urllib.quote and unquote - Unicode issues

Matt Giuca matt.giuca at gmail.com
Sat Jul 12 19:27:16 CEST 2008


Hi all,

This is my first post to the list. In fact, I'm a first-time Python hacker, though a long-time Python user (Melbourne, Australia).

Some of you may have seen my bug report on Roundup over the past week or so: http://bugs.python.org/issue3300

I've spent a heap of effort on this patch now so I'd really like to get some more opinions and have this patch considered for Python 3.0.

Basically, urllib.quote and unquote seem not to have been updated since Python 2.5, and because of this they implicitly perform Latin-1 encoding and decoding (with respect to percent-encoded characters). I think they should default to UTF-8 for a number of reasons, including that it is what other software, such as web browsers, uses.
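To make the difference concrete, here is a minimal sketch of how one non-ASCII character percent-encodes under each encoding. The helper name percent_encode is hypothetical and is not part of urllib or of the patch; it naively escapes every byte, which is enough to show the byte-level difference.

```python
def percent_encode(text, encoding):
    """Hypothetical helper: percent-encode every byte of text under the given encoding."""
    return "".join("%{:02X}".format(byte) for byte in text.encode(encoding))


# U+00FC (u with diaeresis) is a single byte in Latin-1 but two bytes in UTF-8.
latin1_form = percent_encode("\u00fc", "latin-1")
utf8_form = percent_encode("\u00fc", "utf-8")

print(latin1_form)  # %FC
print(utf8_form)    # %C3%BC
```

A receiver that assumes the other encoding will decode these bytes to the wrong character, which is why the default matters.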

I've submitted a patch which fixes quote and unquote to use UTF-8 by default. I also added extra arguments allowing the caller to choose the encoding (after discussion, there was some consensus that this would be beneficial). I have now completed updating the documentation, writing extensive test cases, and testing the rest of the standard library for code breakage. There wasn't really any; everything seems to just work nicely with UTF-8. You can read the sordid details of my investigation in the tracker.
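As an illustration of a UTF-8 default with a caller-selectable encoding, the sketch below uses quote and unquote as they exist in released Python 3, where they live in urllib.parse and accept an encoding keyword. This is assumed to match the spirit of the proposal; the exact signatures in the patch under discussion may differ.

```python
from urllib.parse import quote, unquote

name = "Zo\u00eb"  # "Zoe" with a diaeresis on the final letter

# With no arguments, non-ASCII characters are encoded as UTF-8 before escaping.
utf8_quoted = quote(name)

# The caller can still ask for Latin-1 explicitly.
latin1_quoted = quote(name, encoding="latin-1")

# Decoding with the matching encoding round-trips cleanly.
round_trip = unquote(utf8_quoted)

# Decoding UTF-8 escapes as Latin-1 silently yields mojibake.
mismatched = unquote(utf8_quoted, encoding="latin-1")

print(utf8_quoted)    # Zo%C3%AB
print(latin1_quoted)  # Zo%EB
```

The mismatched case shows why both sides need to agree on the encoding: nothing fails, the text is just wrong.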

Firstly, it'd be nice to hear whether people think this is desirable behaviour. Secondly, I'd like to know whether it's feasible to get this patch into Python 3.0. (I think that if it were delayed to Python 3.1, the code breakage wouldn't justify it.) And thirdly, if the first two answers are positive, would anyone like to review this patch and check it in?

I have extensively tested it, and am now pretty confident that it won't cause any grief if it's checked in.

Thanks very much,
Matt Giuca


