[Python-Dev] urllib.quote and unquote - Unicode issues

Antoine Pitrou solipsis at pitrou.net
Wed Aug 6 18:55:51 CEST 2008


Martin v. Löwis <martin at v.loewis.de> writes:

> URLs are just not made for non-ASCII characters.

Perhaps they are not, but every non-English wiki (just to take a simple, generic example) potentially contains non-ASCII URLs, e.g.:

http://fr.wikipedia.org/wiki/%C3%89l%C3%A9phant
http://wiki.python.org/moin/J%C3%BCrgenHermann

(notice the UTF-8 encoding in both)
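To make the encoding point concrete, here is a minimal sketch of how the path segments of those two URLs decode. It assumes the urllib.parse module as it exists in current Python 3, which may differ from the API under discussion in this thread; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

# Percent-encoded path segments taken from the two wiki URLs above.
paths = ["%C3%89l%C3%A9phant", "J%C3%BCrgenHermann"]

# Decoding the escaped bytes as UTF-8 recovers the intended page titles.
decoded = [unquote(p, encoding="utf-8") for p in paths]

# Re-quoting with UTF-8 round-trips to the original escaped form.
reencoded = [quote(s, encoding="utf-8") for s in decoded]

# Interpreting the same bytes as Latin-1 gives a different, garbled string,
# which is why the choice of encoding matters.
mojibake = unquote(paths[0], encoding="latin-1")

for original, text in zip(paths, decoded):
    print(original, "->", text)
```

The escaped form carries only bytes, so the decoder has to be told (or has to assume) which character encoding those bytes were written in.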

> Implement IRIs if you want non-ASCII characters; the rules are much clearer for these.

I think most people would expect something which works with the current World Wide Web rather than a rigorous implementation of a specific RFC. Implementing RFCs is fine but it does not magically eliminate all problems, especially when the RFCs themselves are not in sync with real-world usage.

Regards

Antoine.


