Issue 32498: urllib.parse.unquote raises incorrect errormessage when string parameter is bytes (original) (raw)

Created on 2018-01-05 19:15 by stein-k, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
urllib_parse_unquote_handle_bytes.patch stein-k,2018-01-18 19:09 Patch adding support for bytes as input to unquote
urllib_parse_unquote_handle_bytes_test.patch stein-k,2018-01-21 12:32 Added test for unquote with bytes input
Pull Requests
URL Status Linked Edit
PR 7768 merged stein-k,2018-06-17 19:16
PR 22746 merged iritkatriel,2020-10-18 14:48
Messages (19)
msg309517 - (view) Author: stein-k (stein-k) * Date: 2018-01-05 19:15
urllib.parse.unquote(b'abc%20def') ... TypeError: a bytes-like object is required, not 'str'
msg309524 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-05 20:52
If you read the traceback the message is "correct" for some definition of correct: the right hand side controls the type of the expression, so it is objecting to trying to look for the string '%' in a bytes object. There are probably ways this could be improved, but I'm not sure it is worth it, since this is just a general behavior of the 'in' operator.
msg309566 - (view) Author: stein-k (stein-k) * Date: 2018-01-06 18:39
The message is incorrect for the input to the function, and confusing because the caller did supply a bytes-like object. The exception is from an implementation detail, and if the implementation changes the exception could as well. I suggest the implementation is changed to follow the example of urllib.parse.unquote_to_bytes, which encodes its string parameter to bytes if it receives a str object. Alternatively, a check for required type should be implemented and raise a TypeError with the correct error-message.
msg309625 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-07 15:59
We generally don't do type checking (see discussions of "duck typing"). We generally do just let the implementation detail bubble up. I don't think the encoding suggestion works, since we can't know what encoding the byte string is in, or even if it is a valid one. I wonder if unquote should be polymorphic: accept both bytes and strings and produce the same type of output as its input. I think there are other urllib methods that are, but I haven't checked.
msg309633 - (view) Author: stein-k (stein-k) * Date: 2018-01-07 19:18
The unquote function does take an encoding argument, but I am more worried about unquote_to_to_bytes which just assumes utf-8. (But this might be an implementation detail of its intended usage.) I must agree that a polymorphic implementation would be better. Do you think this tracker is the correct place for such a change, or would an enhancement proposal be more suited?
msg309641 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-01-07 22:37
I think Nick was the last one who touched the byte/string issues in urllib, so I've nosied him. We'll see what he thinks (but this tracker accepts enhancement proposals as long as they are smallish ones).
msg309657 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2018-01-08 02:39
As David noted, we updated all the URL parsing functions to be polymorphic back in 3.2: https://docs.python.org/3/library/urllib.parse.html#parsing-ascii-encoded-bytes We left the quoting functions alone, because they already had their own way of dealing with the bytes-vs-str distinction (quote_from_bytes, unquote_to_bytes, etc) that meant the polymorphic approach we adopted for the parsing functions didn't make sense. That said, I think it would be reasonable to enhance unquote() to accept a bytes object, processing it as follows: unquote_to_bytes(string).decode(encoding, errors)
msg310252 - (view) Author: stein-k (stein-k) * Date: 2018-01-18 19:09
Added a patch containing the fix, is this the proper way or should I create a pull request?
msg310370 - (view) Author: stein-k (stein-k) * Date: 2018-01-21 12:32
Patch for tests. Added test for calling unquote with bytes input and removed Exception raised in original test.
msg319814 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-06-17 13:28
@stein-k Thanks for the patch and tests. The patches apply cleanly on master branch. The project accepts pull requests and it will be helpful if you can make a PR for this on GitHub to get this merged. Since this is a bug fix I think it would require a News entry. Please find the below links for reference PR workflow : https://devguide.python.org/pullrequest/ News entry reference : https://devguide.python.org/committing/?highlight=blurb#what-s-new-and-news-entries Thanks
msg321286 - (view) Author: stein-k (stein-k) * Date: 2018-07-08 19:41
I have made the News Entry and created the Pull request as described here: https://devguide.python.org/pullrequest/. Should I update the Status and Resolution of the issue here, or wait for some kind of confirmation?
msg354468 - (view) Author: Joannah Nanjekye (nanjekyejoannah) * (Python committer) Date: 2019-10-11 17:03
> Should I update the Status and Resolution of the issue here, or wait for some kind of confirmation? The status is changed after the patch is merged. The person that merges will usually change the status of the issue or If he/she forgets, anyone with developer role can update the status too.
msg354624 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2019-10-14 10:27
So urllib.parse.unquote() will accept bytes in addition to str starting with 3.9. The unclear exception remains in prior versions. Would someone like to add a better exception for 3.7 and 3.8? (I'm marking this as "newcomer friendly", referring only to improving the exception for 3.7 and 3.8.)
msg354625 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2019-10-14 10:36
New changeset aad2ee01561f260c69af1951c0d6fcaf75c4d41b by Tal Einat (Stein Karlsen) in branch 'master': bpo-32498: urllib.parse.unquote also accepts bytes (GH-7768) https://github.com/python/cpython/commit/aad2ee01561f260c69af1951c0d6fcaf75c4d41b
msg378784 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2020-10-16 23:07
Can this be closed? (It's intended as a 3.9-only change).
msg378858 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2020-10-18 10:26
Thanks for the reminder Irit! No, this should not be closed yet, as there is still the possibility of improving the exception in version 3.8.
msg378870 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2020-10-18 14:49
Ah yes, I missed that. I've pushed a PR for 3.8 that does this: >>> urllib.parse.unquote(b'abc%20def') Traceback (most recent call last): File "", line 1, in File "C:\Users\User\src\cpython\lib\urllib\parse.py", line 635, in unquote raise TypeError('Expected str, got bytes') TypeError: Expected str, got bytes
msg378898 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2020-10-18 21:06
New changeset 1a3f7c042a32fb813835243bd7f96e47c665bfdc by Irit Katriel in branch '3.8': [3.8] bpo-32498: Improve exception message on passing bytes to urllib.parse.unquote (GH-22746) https://github.com/python/cpython/commit/1a3f7c042a32fb813835243bd7f96e47c665bfdc
msg378899 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2020-10-18 21:07
Thanks for the report and the PR, Stein! And thanks for the PR for 3.8, Irit!
History
Date User Action Args
2022-04-11 14:58:56 admin set github: 76679
2020-10-18 21:07:57 taleinat set status: open -> closedresolution: fixedmessages: + stage: patch review -> resolved
2020-10-18 21:06:41 taleinat set messages: +
2020-10-18 14:49:41 iritkatriel set messages: +
2020-10-18 14:48:14 iritkatriel set pull_requests: + <pull%5Frequest21709>
2020-10-18 10:26:11 taleinat set messages: + versions: + Python 3.8
2020-10-16 23:07:56 iritkatriel set nosy: + iritkatrielmessages: +
2019-10-14 10:36:33 taleinat set messages: +
2019-10-14 10:27:18 taleinat set keywords: + newcomer friendlynosy: + taleinatmessages: +
2019-10-14 10:13:57 taleinat set versions: + Python 3.9, - Python 3.4, Python 3.5, Python 3.6, Python 3.7
2019-10-11 17:03:13 nanjekyejoannah set nosy: + nanjekyejoannahmessages: +
2018-07-08 19:41:07 stein-k set messages: +
2018-06-17 19:16:00 stein-k set stage: patch reviewpull_requests: + <pull%5Frequest7376>
2018-06-17 13:28:12 xtreak set nosy: + xtreakmessages: +
2018-06-17 13:03:59 r.david.murray link issue33846 superseder
2018-01-21 12:32:20 stein-k set files: + urllib_parse_unquote_handle_bytes_test.patchmessages: +
2018-01-18 19:09:20 stein-k set files: + urllib_parse_unquote_handle_bytes.patchkeywords: + patchmessages: +
2018-01-08 02:39:19 ncoghlan set messages: +
2018-01-07 22:37:02 r.david.murray set nosy: + ncoghlanmessages: +
2018-01-07 19🔞36 stein-k set messages: +
2018-01-07 15:59:32 r.david.murray set messages: +
2018-01-06 18:39:07 stein-k set messages: +
2018-01-05 20:52:50 r.david.murray set nosy: + r.david.murraymessages: +
2018-01-05 19:15:10 stein-k create