Bill Janssen wrote:
Bill Janssen <janssen@parc.com> wrote:
Dan Mahn <dan.mahn@digidescorp.com> wrote:
3) Regarding the following code fragment in urlencode():

    k = quote_plus(str(k))
    if isinstance(v, str):
        v = quote_plus(v)
        l.append(k + '=' + v)
    elif isinstance(v, str):
        # is there a reasonable way to convert to ASCII?
        # encode generates a string, but "replace" or "ignore"
        # lose information and "strict" can raise UnicodeError
        v = quote_plus(v.encode("ASCII","replace"))
        l.append(k + '=' + v)

I don't understand how the "elif" section is invoked, as it uses the
same condition as the "if" section.
This looks like a 2->3 bug; clearly only the second branch should be
used in Py3K. And that "replace" is also a bug; it should signal an
error on encoding failures. It should probably catch UnicodeError and
explain the problem, which is that only Latin-1 values can be passed in
the query string. So the encode() to "ASCII" is also a mistake; it
should be "ISO-8859-1", and the "replace" should be a "strict", I think.
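A minimal sketch of the strict-encoding behaviour suggested above, not the actual urllib code. It assumes a current Python 3 in which quote_plus() accepts bytes; the helper name quote_latin1 is hypothetical and used only for illustration.

```python
from urllib.parse import quote_plus


def quote_latin1(value):
    """Hypothetical helper: quote a str value as ISO-8859-1, failing loudly.

    Uses "strict" instead of "replace", so characters outside Latin-1
    raise an error instead of being silently turned into "?".
    """
    try:
        raw = value.encode("ISO-8859-1", "strict")
    except UnicodeError as exc:
        # Explain the problem instead of surfacing a bare codec error.
        raise ValueError(
            "query value %r cannot be encoded as ISO-8859-1" % (value,)
        ) from exc
    # quote_plus() percent-encodes the raw bytes and turns spaces into "+".
    return quote_plus(raw)
```

With this sketch a Latin-1 value is percent-encoded byte by byte, while a value such as "\u20ac" (the euro sign, which has no Latin-1 code point) raises ValueError rather than losing information.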
Sorry! In 3.0.1, this whole thing boils down to

    l.append(quote_plus(k) + '=' + quote_plus(v))

Bill
Thanks. Generally, I would tend to agree. I actually tried something like that, but I found that I had inadvertently been sending numeric values, in which case the str() was saving me. Considering that, I would rather just see something like ...

    k = quote_plus(k) if isinstance(k,(str,bytes)) else quote_plus(str(k))
    if isinstance(v,(str,bytes)):
        l.append(k + "=" + quote_plus(v))
    else:
        # just keep what's in the else

I think it would be more compatible with existing code calling urlencode(). Additionally, I think it would be nice to allow selection of the quote_plus() string encoding parameters, but that was one of the other points I listed.
A similar thing could be done when "not doseq", but the handling of "v" would be exactly like "k".
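As a rough, self-contained illustration of the "not doseq" case, here is a sketch in which keys and values get identical treatment. The names quote_item and urlencode_sketch are hypothetical; this is a simplified stand-in written against a current Python 3, not the real urlencode().

```python
from urllib.parse import quote_plus


def quote_item(x):
    # str and bytes go to quote_plus() unchanged; anything else
    # (for example an int) is converted with str() first.
    return quote_plus(x) if isinstance(x, (str, bytes)) else quote_plus(str(x))


def urlencode_sketch(query):
    """Hypothetical, simplified version of the 'not doseq' path."""
    if hasattr(query, "items"):
        # Accept a mapping as well as a sequence of (key, value) pairs.
        query = query.items()
    # Keys and values are handled exactly alike.
    return "&".join(quote_item(k) + "=" + quote_item(v) for k, v in query)
```

For example, urlencode_sketch([("a", 1), ("b c", "d&e")]) gives "a=1&b+c=d%26e", so callers that pass numeric values keep working.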
- Dan