[Python-Dev] urllib.quote and unicode bug resuscitation attempt
Stefan Rank stefan.rank at ofai.at
Tue Jul 11 15:55:46 CEST 2006
Hi,
urllib.quote fails on unicode strings that contain non-ASCII characters, and it does so in an unhelpful way::
    Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import urllib
    >>> urllib.quote('a\xf1a')
    'a%F1a'
    >>> urllib.quote(u'ana')
    'ana'
    >>> urllib.quote(u'a\xf1a')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "C:\Python24\lib\urllib.py", line 1117, in quote
        res = map(safe_map.__getitem__, s)
    KeyError: u'\xf1'
There is a (closed) tracker item, dated 2000-10-12, http://sourceforge.net/tracker/?group_id=5470&atid=105470&aid=216716&func=detail and there was a note added to PEP-42 by Guido.
According to a message I found on quixote-users, http://mail.mems-exchange.org/durusmail/quixote-users/5363/ it might have worked prior to 2.4.2. (I guess that this changed because of ascii now being the default encoding?)
BTW, a patch by rhettinger from 8 months or so ago allows urllib.unquote to operate transparently on unicode strings::
    >>> urllib.unquote('a%F1a')
    'a\xf1a'
    >>> urllib.unquote(u'a%F1a')
    u'a\xf1a'
I suggest adding (after 2.5, I assume) one of the following to the beginning of urllib.quote, to either fail early and consistently on unicode arguments, with an improved error message::
    if isinstance(s, unicode):
        raise TypeError("quote needs a byte string argument, not unicode,"
                        " use argument.encode('utf-8') first.")
or to do The Right Thing (tm), which is utf-8 encoding::
    if isinstance(s, unicode):
        s = s.encode('utf-8')
as suggested in http://www.w3.org/International/O-URL-code.html and rfc3986.
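To make the second option concrete, here is a minimal, self-contained sketch of the "encode as UTF-8, then percent-encode" behaviour. It is an illustration only and not the urllib implementation: the helper name quote_utf8 and the ALWAYS_SAFE constant are hypothetical, and it is written in Python 3 terms (text is str, byte strings are bytes) so that it can be run today, whereas the snippets above are Python 2.

```python
# Hypothetical helper, NOT part of urllib: a sketch of the proposed
# "UTF-8 encode first" behaviour, written for Python 3.

# Bytes that are never escaped (letters, digits and "_.-"); this set is an
# assumption modelled on the safe characters of the quoting function.
ALWAYS_SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789_.-"
)


def quote_utf8(s, safe="/"):
    """Percent-encode s; text input is encoded as UTF-8 first."""
    if isinstance(s, str):
        # The proposed change: turn text into bytes before quoting.
        s = s.encode("utf-8")
    allowed = ALWAYS_SAFE | frozenset(safe.encode("ascii"))
    # Iterating over bytes yields integers in Python 3.
    return "".join(chr(b) if b in allowed else "%%%02X" % b for b in s)
```

With this sketch, quote_utf8("a\xf1a") gives 'a%C3%B1a' (the two UTF-8 bytes of the character), while the byte string b"a\xf1a" still gives 'a%F1a', matching the byte-string result in the session above.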
cheers,
stefan