Bill Janssen wrote:
Bill Janssen <janssen@parc.com> wrote:



Dan Mahn <dan.mahn@digidescorp.com> wrote:



3) Regarding the following code fragment in urlencode():

    k = quote_plus(str(k))
    if isinstance(v, str):
        v = quote_plus(v)
        l.append(k + '=' + v)
    elif isinstance(v, str):
        # is there a reasonable way to convert to ASCII?
        # encode generates a string, but "replace" or "ignore"
        # lose information and "strict" can raise UnicodeError
        v = quote_plus(v.encode("ASCII","replace"))
        l.append(k + '=' + v)

I don't see how the "elif" branch can ever be reached, since it tests the
same condition as the "if" branch.



This looks like a 2->3 bug; clearly only the second branch should be
used in Py3K. And that "replace" is also a bug; it should signal an
error on encoding failures. It should probably catch UnicodeError and
explain the problem, which is that only Latin-1 values can be passed in
the query string. So the encode() to "ASCII" is also a mistake; it
should be "ISO-8859-1", and the "replace" should be a "strict", I think.
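A small sketch makes the difference between the two error handlers concrete. It contrasts the quoted branch's ASCII/"replace" encoding with a strict ISO-8859-1 encoding as suggested above. The helper names and the ValueError message are illustrative assumptions, not part of urllib.

```python
from urllib.parse import quote_plus


def encode_ascii_replace(v):
    # What the quoted elif branch does: characters outside ASCII become "?",
    # so information is silently lost.
    return quote_plus(v.encode("ASCII", "replace"))


def encode_latin1_strict(v):
    # Hypothetical helper following the suggestion above: encode strictly
    # as ISO-8859-1 and explain the failure instead of hiding it.
    try:
        return quote_plus(v.encode("ISO-8859-1", "strict"))
    except UnicodeError as exc:
        raise ValueError(
            "query value %r cannot be encoded as ISO-8859-1" % (v,)
        ) from exc


print(encode_ascii_replace("caf\u00e9"))  # caf%3F  (the accented e is gone)
print(encode_latin1_strict("caf\u00e9"))  # caf%E9
try:
    encode_latin1_strict("\u20ac")  # EURO SIGN is not in Latin-1
except ValueError as err:
    print(err)
```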



Sorry! In 3.0.1, this whole thing boils down to:

    l.append(quote_plus(k) + '=' + quote_plus(v))

Bill
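For reference, the one-liner works for string keys and values, but Python 3's quote_plus() accepts only str or bytes, so a numeric value raises TypeError unless it is converted with str() first. The two helper functions below are hypothetical, written only to show that difference.

```python
from urllib.parse import quote_plus


def pair_without_str(k, v):
    # The simplified form: quote_plus() only accepts str or bytes.
    return quote_plus(k) + '=' + quote_plus(v)


def pair_with_str(k, v):
    # Coercing with str() first, as the older code did for the key.
    return quote_plus(str(k)) + '=' + quote_plus(str(v))


print(pair_without_str("q", "a b"))  # q=a+b
print(pair_with_str("page", 2))      # page=2
try:
    pair_without_str("page", 2)      # an int value is rejected
except TypeError as err:
    print("TypeError:", err)
```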


Thanks. Generally, I would tend to agree. I actually tried something like that, but found that I had inadvertently been passing numeric values, and in that case the str() call was saving me. Considering that, I would rather see something like this:

    k = quote_plus(k) if isinstance(k, (str, bytes)) else quote_plus(str(k))
    if isinstance(v, (str, bytes)):
        l.append(k + "=" + quote_plus(v))
    else:
        # just keep what's in the existing else branch
        ...

I think this would be more compatible with existing code that calls urlencode(). It would also be nice to allow selecting the string encoding parameters that quote_plus() accepts, but that was one of the other points I listed.
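Putting the proposal together, here is a minimal sketch of the doseq branch with optional encoding/errors arguments forwarded to quote_plus(). The function names are hypothetical, and the sequence handling in the else branch is an assumed stand-in for the existing code, not a copy of it. Because quote_plus() raises TypeError when encoding or errors is supplied together with a bytes argument, the sketch forwards them only for str values.

```python
from urllib.parse import quote_plus


def _quote(x, encoding=None, errors=None):
    # encoding/errors only apply to str; quote_plus() rejects them for bytes.
    if isinstance(x, str):
        return quote_plus(x, encoding=encoding, errors=errors)
    if isinstance(x, bytes):
        return quote_plus(x)
    # Anything else (e.g. numbers) is converted with str() first.
    return quote_plus(str(x), encoding=encoding, errors=errors)


def urlencode_doseq_sketch(query, encoding=None, errors=None):
    """Hypothetical doseq=True encoder following the proposal above."""
    parts = []
    for k, v in query:
        k = _quote(k, encoding, errors)
        if isinstance(v, (str, bytes)):
            parts.append(k + "=" + _quote(v, encoding, errors))
        else:
            # Assumed stand-in for the existing else branch: objects with a
            # length are treated as sequences, everything else as a scalar.
            try:
                len(v)
            except TypeError:
                parts.append(k + "=" + _quote(v, encoding, errors))
            else:
                for elt in v:
                    parts.append(k + "=" + _quote(elt, encoding, errors))
    return "&".join(parts)


print(urlencode_doseq_sketch([("q", "a b"), ("n", 5), ("tag", ["x", "y"])]))
# q=a+b&n=5&tag=x&tag=y
print(urlencode_doseq_sketch([("name", "caf\u00e9")], encoding="latin-1"))
# name=caf%E9
```

For comparison, urlencode() in current Python 3 releases accepts encoding and errors arguments directly.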

A similar change could be made for the "not doseq" case, except that "v" would be handled exactly like "k".
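For the "not doseq" case, a sketch in which each value goes through the same rule as each key might look like the following. The names are hypothetical, and the input is assumed to be a sequence of (key, value) pairs rather than a mapping.

```python
from urllib.parse import quote_plus


def quote_any(x):
    # Same rule as for keys: str/bytes go straight to quote_plus(),
    # anything else (e.g. numbers) is converted with str() first.
    return quote_plus(x) if isinstance(x, (str, bytes)) else quote_plus(str(x))


def urlencode_no_doseq_sketch(query):
    """Hypothetical doseq=False encoder: values are treated like keys."""
    return "&".join(quote_any(k) + "=" + quote_any(v) for k, v in query)


print(urlencode_no_doseq_sketch([("q", "a b"), (b"raw", b"x&y"), ("n", 3)]))
# q=a+b&raw=x%26y&n=3
```

Under this rule a list value is not expanded; it is converted with str() like any other non-string value.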

- Dan