[Python-Dev] urllib.quote and unicode bug resuscitation attempt
"Martin v. Löwis" martin at v.loewis.de
Tue Jul 11 23:16:21 CEST 2006
Stefan Rank wrote:
I suggest to add (after 2.5 I assume) one of the following to the beginning of urllib.quote to either fail early and consistently on unicode arguments and improve the error message::

    if isinstance(s, unicode):
        raise TypeError("quote needs a byte string argument, not unicode,"
                        " use argument.encode('utf-8') first.")

or to do The Right Thing (tm), which is utf-8 encoding::
The right thing to do is IRIs. This is more complicated than encoding the Unicode string as UTF-8, though: for the host part of the URL, you have to encode it with IDNA (and there are additional complicated rules in place, e.g. when the Unicode string already contains %).
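To illustrate why this is more than a single UTF-8 encode, here is a minimal sketch of an IRI-to-URI conversion. It is an illustration only, not the fix being asked for: the helper name iri_to_uri is hypothetical, and the code assumes Python 3, where quote lives in urllib.parse, rather than the Python 2 urllib.quote discussed in this thread. It uses the standard library's "idna" codec for the host and UTF-8 percent-encoding for the rest, and it treats an existing "%" as already escaped, which is only a rough approximation of the rules in the relevant RFCs.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri):
    """Convert an IRI to an ASCII-only URI (simplified, hypothetical helper)."""
    parts = urlsplit(iri)

    # Host part: encode with IDNA, not with UTF-8 percent-encoding.
    # Assumption: any userinfo ("user:password@") is dropped for brevity.
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc = "%s:%d" % (netloc, parts.port)

    # Remaining parts: UTF-8 percent-encoding. "%" is listed as safe so that
    # escapes already present in the input are not encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))
```

For example, iri_to_uri("http://bücher.example/naïve") yields a URL whose host is Punycode-encoded while the path is percent-encoded UTF-8. The sketch does not validate existing "%" sequences, does not handle userinfo, and covers only this one helper, so it falls well short of handling all URL-processing code.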
Contributions are welcome, as long as they fix this entire issue "for good" (i.e. in all URL-processing code, and considering all relevant RFCs).
Regards, Martin