[Python-Dev] urllib.quote and unquote - Unicode issues

Bill Janssen janssen at parc.com
Thu Jul 31 09:39:29 CEST 2008


Guido says:

> Actually, we'd need to look at the various other APIs in Py3k before we can
> decide whether these should be considered taking or returning bytes or text.
> It looks like all other APIs in the Py3k version of urllib treat URLs as
> text.

Yes, as I said in the bug tracker, I've groveled over the entire stdlib to see how my patch affects the behaviour of dependent code. Aside from a few minor bits that assumed octets and did their own encoding/decoding (which I fixed), all the code assumes strings and is very happy to go on assuming this, as long as the URIs are encoded with UTF-8, which they almost certainly are.

I'm not sure that's sufficient review, though I agree it's necessary. The major consumers of quote/unquote are not in the Python standard library.

(quote will accept either type, while unquote will output a str; there will be a new function unquote_to_bytes which outputs bytes - is everyone happy with that?)

No, so don't ask.
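
For reference, the split described in the parenthetical above can be sketched with the functions as they exist in today's Python 3 urllib.parse. This is only an illustration under that assumption; the patch under discussion in this thread may have behaved differently, and the sample string and variable names are made up for the example.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# Hypothetical sample value containing one non-ASCII character.
text = "caf\u00e9"

# quote accepts either str or bytes; a str is encoded as UTF-8 by default.
quoted_from_str = quote(text)
quoted_from_bytes = quote(text.encode("utf-8"))

# unquote returns a str, decoding the percent-escapes as UTF-8 by default.
decoded_text = unquote(quoted_from_str)

# unquote_to_bytes returns the raw octets and does no character decoding.
decoded_bytes = unquote_to_bytes(quoted_from_str)

print(quoted_from_str, quoted_from_bytes, decoded_text, decoded_bytes)
```

Both quote calls yield the same percent-encoded text here, because the bytes passed in are the UTF-8 encoding of the str.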

Bill


