[Python-Dev] urllib unicode handling
"Martin v. Löwis" martin at v.loewis.de
Wed May 7 21:06:00 CEST 2008
> Maybe I didn't understand the RFC quite right, but it seemed like how to handle hostnames was left as a choice between IDNA encoding the hostname or replacing the non-ascii characters with dashes? I guess in practice IDNA is the right decision.
I haven't fully understood it, either, but I think that's the right conclusion. People want to fetch the resource, then, and encoding the host name in UTF-8 won't do much good.
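To illustrate the difference, here is a minimal sketch in present-day Python 3 (an assumption, since this thread predates it). It relies on the built-in "idna" codec, which implements the IDNA 2003 ToASCII conversion; the helper name to_ascii_host and the sample host name are made up for the example.

```python
def to_ascii_host(host):
    """Return the ASCII (Punycode) form of a possibly non-ASCII host name.

    Hypothetical helper for illustration only; not part of urllib.
    """
    # The built-in "idna" codec converts each dot-separated label on its own.
    return host.encode("idna").decode("ascii")


host = "b\u00fccher.example"        # "bücher.example", a made-up host name

print(to_ascii_host(host))          # xn--bcher-kva.example
print(host.encode("utf-8"))         # b'b\xc3\xbccher.example'
```

The first form is plain ASCII and can be handed to a resolver as is; the second is the raw UTF-8 octets, which is the variant that "won't do much good" for actually fetching the resource.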
> Seems like the other somewhat under-specified part of all of this is how urllib.unquote() should work. If after percent decoding it sees non-ascii octets, should it try to decode them as utf-8 and if that fails then leave them as is?
That's why I think that using IRIs should be a separate feature, perhaps a separate module entirely.
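For concreteness, the fallback asked about above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python 3's urllib.parse.unquote_to_bytes (which did not exist when this was written), and unquote_lenient is a hypothetical name, not an existing or proposed urllib function.

```python
from urllib.parse import unquote_to_bytes


def unquote_lenient(s):
    """Percent-decode s, then try UTF-8; fall back to the raw octets.

    Hypothetical helper for illustration only; not part of urllib.
    """
    raw = unquote_to_bytes(s)
    try:
        # Octets that form valid UTF-8 come back as text.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Anything else is left as undecoded bytes.
        return raw


print(unquote_lenient("caf%C3%A9"))   # café
print(unquote_lenient("caf%E9"))      # b'caf\xe9'
```

Note that the return type depends on the input data (text in one case, bytes in the other), which is one sign of how under-specified this behaviour is and why it may be better kept out of plain unquote().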
Regards, Martin