[Python-Dev] urllib.quote and unquote - Unicode issues
Bill Janssen janssen at parc.com
Wed Jul 30 19:33:51 CEST 2008
> It looks like all other APIs in the Py3k version of urllib treat URLs as text.
The URL is text, a string of ASCII characters. We're just talking about urllib.quote() and urllib.unquote(), which are there to support the text-ization of binary values, and the de-text-ization.
> I think that would break too much code, without a good way to automatically fix it.
You'd rather break Python? Somehow I don't think so.
Here's the signature I'm proposing:
quote() -- takes string or bytes, and produces string.
  If input is a string, looks to optional "encoding" parameter to
  determine character set encoding to use to transform it to bytes before
  quoting it. If "encoding" is not specified, defaults to UTF-8.

unquote() -- takes string, produces bytes or string.
  If optional "encoding" parameter is specified, decodes bytes with
  that encoding and returns string. Otherwise, returns bytes.
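
A minimal sketch of the semantics proposed above, for illustration only. The names proposed_quote and proposed_unquote are hypothetical and do not exist in any library; the sketch assumes Python 3 and leans on urllib.parse.quote_from_bytes and urllib.parse.unquote_to_bytes for the actual percent-encoding. It shows the behaviour described in this message, not necessarily what any released urllib does.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def proposed_quote(value, encoding="utf-8"):
    """Hypothetical quote(): accepts str or bytes, always returns str."""
    if isinstance(value, str):
        # A string is first turned into bytes using the chosen encoding
        # (UTF-8 when none is given), then percent-encoded.
        value = value.encode(encoding)
    return quote_from_bytes(value)


def proposed_unquote(text, encoding=None):
    """Hypothetical unquote(): accepts str, returns bytes or str."""
    raw = unquote_to_bytes(text)
    if encoding is None:
        # No encoding requested: hand back the raw bytes.
        return raw
    # Encoding requested: decode the bytes and return a string.
    return raw.decode(encoding)
```

Under these assumptions, proposed_quote("é") and proposed_quote(b"\xc3\xa9") yield the same quoted string, and proposed_unquote() on that string returns bytes unless the caller names an encoding.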
Bill