[Python-Dev] urllib.quote and unicode bug resuscitation attempt

John J Lee jjl at pobox.com
Tue Jul 11 20:43:22 CEST 2006


On Tue, 11 Jul 2006, Stefan Rank wrote:

> urllib.quote fails on unicode strings and in an unhelpful way::
>
>     [...]
>     >>> urllib.quote(u'a\xf1a')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>       File "C:\Python24\lib\urllib.py", line 1117, in quote
>         res = map(safe_map.__getitem__, s)
>     KeyError: u'\xf1'

More helpful than silently producing the wrong answer.

[...]

> I suggest adding (after 2.5, I assume) one of the following to the beginning of urllib.quote, either to fail early and consistently on unicode arguments with a better error message::

>     if isinstance(s, unicode):
>         raise TypeError("quote needs a byte string argument, not unicode,"
>                         " use argument.encode('utf-8') first.")
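For readers on current Python, here is a minimal sketch of the "fail early" option, written for Python 3, where `str` plays the role of Python 2's `unicode` and `bytes` that of a byte string. The helper name `quote_bytes_strict` and the `ALWAYS_SAFE` set are illustrative assumptions (the set is modelled on the letters, digits and `_.-` that the 2.4 `urllib` never escapes); this is not the actual `urllib` code.

```python
import string

# Bytes that are never escaped: ASCII letters, digits and "_.-" (an
# assumption modelled on the always_safe set of the Python 2.4 urllib).
ALWAYS_SAFE = frozenset((string.ascii_letters + string.digits + '_.-').encode('ascii'))


def quote_bytes_strict(s, safe='/'):
    """Percent-encode a byte string, rejecting text arguments up front.

    Hypothetical helper illustrating the "fail early" proposal.
    """
    if isinstance(s, str):
        # Raise a clear TypeError before any table lookup can fail with KeyError.
        raise TypeError("quote needs a byte string argument, not text;"
                        " use argument.encode('utf-8') first.")
    allowed = ALWAYS_SAFE | frozenset(safe.encode('ascii'))
    # Iterating over bytes yields ints; escape every byte not in the safe set.
    return ''.join(chr(b) if b in allowed else '%%%02X' % b for b in s)
```

With this sketch, `quote_bytes_strict(b'a b/c')` returns `'a%20b/c'`, while `quote_bytes_strict('a\xf1a')` raises `TypeError` immediately instead of a `KeyError` from inside the lookup.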

Won't this break existing code that catches the KeyError, for no big benefit? If nobody is yet sure what the Right Thing is (see below), I think we should not change this yet.

> or to do The Right Thing (tm), which is UTF-8 encoding::

>     if isinstance(s, unicode):
>         s = s.encode('utf-8')
>
> as suggested in http://www.w3.org/International/O-URL-code.html and RFC 3986.
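The UTF-8 option can be sketched the same way, again in Python 3 and under the same assumptions: `quote_utf8` and `ALWAYS_SAFE` are hypothetical names, and the safe set is only modelled on the 2.4 module. Text is encoded to UTF-8 first, and then each byte outside the safe set becomes `%XX`.

```python
import string

# Bytes that are never escaped: ASCII letters, digits and "_.-" (an
# assumption modelled on the always_safe set of the Python 2.4 urllib).
ALWAYS_SAFE = frozenset((string.ascii_letters + string.digits + '_.-').encode('ascii'))


def quote_utf8(s, safe='/'):
    """Percent-encode s, encoding text to UTF-8 first (hypothetical helper)."""
    if isinstance(s, str):
        # The proposed "Right Thing": turn text into UTF-8 bytes.
        s = s.encode('utf-8')
    allowed = ALWAYS_SAFE | frozenset(safe.encode('ascii'))
    # Each non-safe byte becomes %XX with uppercase hex digits.
    return ''.join(chr(b) if b in allowed else '%%%02X' % b for b in s)
```

Here `quote_utf8('a\xf1a')` returns `'a%C3%B1a'`, because U+00F1 encodes to the two bytes `C3 B1` in UTF-8. For comparison, Python 3's `urllib.parse.quote` eventually took this route: it accepts `str` and encodes it as UTF-8 by default, returning the same result for this input.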

You seem quite confident of that. You may be correct, but have you read all of the following? (I'm not trying to claim superior knowledge by asking that; I just dunno what the right thing is yet. I haven't yet read RFC 2617, or got my head around what the unicode issues are or how they should apply to the Python stdlib.)

http://www.ietf.org/rfc/rfc2617.txt

http://www.ietf.org/rfc/rfc2616.txt

http://en.wikipedia.org/wiki/Percent-encoding

http://mail.python.org/pipermail/python-dev/2004-September/048944.html

Also note the recent discussions here about a module named "uriparse" or "urischemes", which fits into this somewhere. It would be good to make all the following changes in a single Python release (2.6, with luck):

In summary, I agree that your suggested fix (and all of the rest I refer to above) should wait for 2.6, unless somebody (Martin?) who understands all these issues is quite confident your suggested change is OK. Presumably the release managers wouldn't allow it in 2.5 anyway.

John


