Bill Janssen <janssen@parc.com> wrote:

Dan Mahn <dan.mahn@digidescorp.com> wrote:

3) Regarding the following code fragment in urlencode():
    k = quote_plus(str(k))
    if isinstance(v, str):
        v = quote_plus(v)
        l.append(k + '=' + v)
    elif isinstance(v, str):
        # is there a reasonable way to convert to ASCII?
        # encode generates a string, but "replace" or "ignore"
        # lose information and "strict" can raise UnicodeError
        v = quote_plus(v.encode("ASCII","replace"))
        l.append(k + '=' + v)
I don't understand how the "elif" section is invoked, as it uses the
same condition as the "if" section.

This looks like a 2->3 bug; clearly only the second branch should be
used in Py3K. And that "replace" is also a bug; it should signal an
error on encoding failures. It should probably catch UnicodeError and
explain the problem, which is that only Latin-1 values can be passed in
the query string. So the encode() to "ASCII" is also a mistake; it
should be "ISO-8859-1", and the "replace" should be a "strict", I think.
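
For illustration only, the strict-encoding idea in the paragraph above might look roughly like the sketch below. The helper name quote_latin1_value and the wording of the error message are hypothetical and are not part of urllib; the sketch assumes Python 3, where quote_plus() accepts bytes.

```python
from urllib.parse import quote_plus


def quote_latin1_value(v):
    """Hypothetical helper: quote a str query value via strict ISO-8859-1."""
    try:
        # "strict" raises instead of silently replacing characters
        raw = v.encode("ISO-8859-1", "strict")
    except UnicodeError as exc:
        # Explain the problem instead of losing information
        raise ValueError(
            "query value %r cannot be encoded as ISO-8859-1" % (v,)
        ) from exc
    # quote_plus() accepts bytes and percent-encodes them directly
    return quote_plus(raw)
```

With this sketch, a value containing a character outside Latin-1 (for example the euro sign) raises ValueError rather than being turned into "?".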

Sorry! In 3.0.1, this whole thing boils down to
 l.append(quote_plus(k) + '=' + quote_plus(v))
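
As a rough, self-contained sketch of what that one-liner amounts to when applied to a list of pairs (the function name encode_pairs is made up for this example, and the str() calls are an assumption so that non-string keys and values are accepted; this is not the actual urlencode() source):

```python
from urllib.parse import quote_plus


def encode_pairs(pairs):
    """Hypothetical stand-in for the simplified loop body in urlencode()."""
    l = []
    for k, v in pairs:
        # Same shape as the line above; str() is an added assumption
        l.append(quote_plus(str(k)) + '=' + quote_plus(str(v)))
    return '&'.join(l)
```

For example, encode_pairs([("a b", "c&d")]) gives "a+b=c%26d".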
Bill