[Python-Dev] urllib unicode handling
Tom Pinckney thomaspinckney3 at gmail.com
Wed May 7 22:04:29 CEST 2008
I was assuming urllib.quote/unquote would only be called on text
intended for the non-hostname portions of a URI. I'm not sure whether
that is the actual intent of urllib.quote; perhaps the documentation
should be updated to specify precisely what it does, and then people
can decide which parts of a URI it is appropriate to quote/unquote.
I don't believe quote/unquote does anything sensible today with
hostnames that contain characters outside printable ASCII, so this is
no loss of existing functionality.
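A minimal sketch of that split, written against Python 3's
urllib.parse rather than the 2008 Python 2 urllib: only the path goes
through quote(), while the hostname is converted with Python's built-in
"idna" codec. The helper name build_url and the example host and path
are hypothetical.

```python
from urllib.parse import quote, urlunsplit


def build_url(host: str, path: str) -> str:
    """Hypothetical helper: assemble an http URL from Unicode parts."""
    # Hostname: IDNA (punycode) via the stdlib "idna" codec, never %-escapes.
    ascii_host = host.encode("idna").decode("ascii")
    # Path: quote() encodes str as UTF-8 by default, then %-escapes the bytes;
    # "/" is in the default safe set, so path separators survive.
    ascii_path = quote(path)
    return urlunsplit(("http", ascii_host, ascii_path, "", ""))


print(build_url("bücher.example", "/straße"))
# http://xn--bcher-kva.example/stra%C3%9Fe
```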
Re your suggestion that IRIs should be handled by a separate module: my
thought is that urllib should, out of the box, work with the way
websites on the web actually work today. Thus, we should make urllib
do the UTF-8 encode/decode rather than make users switch to one module
for certain URLs and another library for other URLs.
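Concretely, "doing the UTF-8 encode/decode" amounts to two explicit
steps around the percent-escaping. The sketch below spells them out
with Python 3's urllib.parse (whose quote() now performs the encoding
step itself by default); quote_text and unquote_text are hypothetical
names.

```python
from urllib.parse import quote, unquote_to_bytes


def quote_text(text: str) -> str:
    """Hypothetical helper: Unicode text -> percent-encoded ASCII."""
    # Step 1: encode the text as UTF-8 bytes.
    raw = text.encode("utf-8")
    # Step 2: percent-escape every byte outside the safe set ("/" is kept).
    return quote(raw)


def unquote_text(escaped: str) -> str:
    """Hypothetical helper: percent-encoded ASCII -> Unicode text."""
    # Reverse order: undo the %XX escapes to get bytes, then decode as UTF-8.
    return unquote_to_bytes(escaped).decode("utf-8")


escaped = quote_text("/wiki/café")
print(escaped)                # /wiki/caf%C3%A9
print(unquote_text(escaped))  # /wiki/café
```

Because the decode step hard-codes UTF-8, unquote_text raises
UnicodeDecodeError on escapes produced from a different encoding, which
is the case the next paragraph is concerned with.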
Re the specific question of how urllib.unquote should work: perhaps
there could be an optional second argument specifying the character
encoding to use when decoding escaped characters? I would propose that
this parameter default to UTF-8, since that is what most websites seem
to use, but if the author knew that the website they were using
encoded URLs in ISO-8859, they could unquote using that encoding
instead.
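For comparison, Python 3's urllib.parse.unquote ended up with
essentially this shape: unquote(string, encoding='utf-8',
errors='replace'). A short sketch of the default and of a caller
overriding it for an ISO-8859-1 site; the escaped strings are made-up
examples.

```python
from urllib.parse import unquote

utf8_escaped = "caf%C3%A9"  # "café" escaped from UTF-8 bytes
latin1_escaped = "caf%E9"   # "café" escaped from ISO-8859-1 bytes

# Default: escapes are decoded as UTF-8.
print(unquote(utf8_escaped))                        # café
# The caller knows the site uses ISO-8859-1 and says so.
print(unquote(latin1_escaped, encoding="latin-1"))  # café
# Wrong guess: a lone 0xE9 byte is not valid UTF-8, and the default
# errors="replace" substitutes U+FFFD instead of raising.
print(unquote(latin1_escaped) == "caf\ufffd")       # True
```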
On May 7, 2008, at 3:10 PM, Martin v. Löwis wrote:
>> If this is indeed the case, it sounds perfectly legal (according to
>> the RFC) and perfectly practical (as required by numerous popular
>> websites) to have urllib.quote and urllib.quote_plus do an automatic
>> UTF-8 encoding of unicode strings before percent encoding them.
>
> It's probably legal, but I don't understand why you think it's
> practical. The DNS lookup then will certainly fail, no?
>
> Regards,
> Martin
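Martin's objection can be seen directly with Python 3's
urllib.parse.quote_plus, which encodes str input as UTF-8 before
percent-encoding. That is what a query-string value needs, but the same
treatment applied to a hostname yields a name DNS cannot resolve. The
example strings are made up.

```python
from urllib.parse import quote_plus, unquote_plus

# Query-string value: UTF-8 bytes are %-escaped and spaces become "+".
value = quote_plus("café au lait")
print(value)                # caf%C3%A9+au+lait
print(unquote_plus(value))  # café au lait

# Same treatment on a hostname: "%" is not allowed in a DNS label,
# so a resolver cannot look this name up.
print(quote_plus("bücher.example"))  # b%C3%BCcher.example
```

Internationalized hostnames instead go through IDNA (punycode), which
is why percent-encoding only makes sense for the non-hostname parts.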