Issue 13359: urllib2 doesn't escape spaces in http requests

Status: open
Resolution: duplicate
Dependencies:
Superseder: urlopen URL with unescaped space (Issue 14826)
Assigned To:
Nosy List: Ramchandra Apte, davide.rizzo, ezio.melotti, karlcow, kiilerix, krisys, maker, martin.panter, orsenthil, sandro.tosi, senko
Priority: normal
Keywords: patch

Created on 2011-11-06 20:13 by davide.rizzo, last changed 2022-04-11 14:57 by admin.

Files
File name, Uploaded, Description
issue13359.patch, krisys, 2011-11-09 11:26, percent encoding of urls to fix the issue reported.
issue13359.patch, maker, 2012-01-12 15:04
issue13359_py2.patch, maker, 2012-01-12 15:30
urllib-request-space-encode.diff, senko, 2013-07-06 10:08
Messages (10)
msg147180 - (view) Author: Davide Rizzo (davide.rizzo) * Date: 2011-11-06 20:13
urllib2.urlopen('http://foo/url and spaces') will send an HTTP request line like this to the server: "GET /url and spaces HTTP/1.1", which the server obviously does not understand. This contrasts with urllib's behaviour, which replaces the spaces (' ') in the URL with '%20'. Related: #918368 #1153027
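For illustration, a caller can avoid the malformed request line by percent-encoding the path before calling urlopen. The following is a minimal sketch using Python 3's urllib.parse; the helper name escape_path is hypothetical and does not come from any of the attached patches. It assumes that '%' should be left untouched so that URLs which are already encoded are not encoded a second time.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_path(url):
    """Percent-encode unsafe characters in the path of `url` (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep '/' and '%' so path separators and existing escapes survive unchanged.
    safe_path = quote(parts.path, safe="/%")
    return urlunsplit(parts._replace(path=safe_path))


print(escape_path("http://foo/url and spaces"))
# prints: http://foo/url%20and%20spaces
```

Whether urlopen should do this automatically is exactly what the rest of the thread debates; the sketch only shows what the caller-side workaround looks like.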
msg147349 - (view) Author: Krishna Bharadwaj (krisys) Date: 2011-11-09 11:26
I have used the quote method to percent encode the url for spaces and similar characters. This is my first patch. Please let me know if there is anything wrong. I will correct and re-submit it. I ran the test_urllib2.py which gave an OK for 34 tests. Changes are made in two instances: 1. in the open method. 2. in the __init__ of Request class to ensure that the same issue is addressed at the time of creating Request objects.
msg149441 - (view) Author: Ramchandra Apte (Ramchandra Apte) * Date: 2011-12-14 12:08
Seems good.
msg151126 - (view) Author: Michele Orrù (maker) * Date: 2012-01-12 15:04
Patch attached for python3, with unit tests.
msg151127 - (view) Author: Mads Kiilerich (kiilerix) * Date: 2012-01-12 15:10
FWIW, I don't think it is a good idea to escape automatically. It will change the behaviour in a non-backward compatible way for existing applications that pass encoded urls to this function. I think the existing behaviour is better. The documentation and the failure mode for passing URLs with spaces could however be improved.
msg151129 - (view) Author: Michele Orrù (maker) * Date: 2012-01-12 15:30
Here is the patch for Python 2. kiilerix, RFC 1738 explicitly says that the space character shall not be used.
msg151131 - (view) Author: Mads Kiilerich (kiilerix) * Date: 2012-01-12 15:35
Yes, the URL sent by urllib2 must not contain spaces. In my opinion the only way to handle that correctly is to not pass URLs with spaces to urlopen. Escaping the URLs is not a good solution, even if the API were to be designed from scratch. It would be better to raise an exception if it is passed an invalid URL. Note for example that '/' and the %-encoding of '/' are different, and it must thus be possible to pass a URL containing both to urlopen. That is not possible if it automatically escapes.
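The point about '/' and its %-encoding can be illustrated with a short sketch. It assumes Python 3's urllib.parse and made-up example paths: applying quote() with its default settings to a path that is already encoded re-encodes the '%' and changes what the server receives.

```python
from urllib.parse import quote, unquote

pre_encoded = "/files/a%2Fb"   # one path segment whose name contains a slash
plain = "/files/a/b"           # two path segments, "a" and "b"

# Both decode to the same text, yet they are different request paths.
print(unquote(pre_encoded) == plain)

# Blanket quoting re-encodes the '%' and corrupts the pre-encoded path.
print(quote(pre_encoded))
```

The second line printed is /files/a%252Fb, which no longer refers to the resource the caller asked for.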
msg183576 - (view) Author: karl (karlcow) * Date: 2013-03-06 03:20
The issue with the current patch is that it escapes more than just the spaces, with possible indirect side effects. Anne van Kesteren is in the process of creating a parsing/writing specification for URLs. It is not finished, but I am putting it here for future reference. http://url.spec.whatwg.org/
msg192400 - (view) Author: Senko Rasic (senko) * Date: 2013-07-06 10:08
I vote for the parse method converting the spaces (and only the spaces) explicitly, for the following reasons:
* the spaces must be encoded for the server to accept them
* no user-encoded url will ever have spaces in them
* space quoting is idempotent: quote(quote(' ')) == quote(' ')
* if the user did get an exception from Request in case of invalid url containing the spaces, the only thing he or she can do is to quote the url string
Here's a patch implementing this. The change allows for any whitespace character in the selector part of the url (and in particular, '\n'), not only ' '.
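The idempotency argument can be checked with a small sketch. It assumes Python 3's urllib.parse, and quote_spaces is a hypothetical stand-in for a space-only escape rather than the code in the attached diff. The property holds for an escape that touches only spaces; the general-purpose quote() is not idempotent, because it re-encodes '%' on a second pass.

```python
from urllib.parse import quote


def quote_spaces(url):
    """Replace only literal spaces with '%20' (hypothetical helper)."""
    return url.replace(" ", "%20")


once = quote_spaces("/url and spaces")
twice = quote_spaces(once)
print(once == twice)   # a space-only escape is idempotent

# The general-purpose quote() re-encodes '%' on the second pass.
print(quote(" "), quote(quote(" ")))
```

This is why restricting the conversion to spaces, as proposed above, sidesteps the double-encoding concern raised earlier in the thread.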
msg295066 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-06-03 05:54
I think this could be merged with Issue 14826. Maybe it is sensible to handle all control characters the same way.
History
Date User Action Args
2022-04-11 14:57:23 admin set github: 57568
2017-06-03 05:54:20 martin.panter set nosy: + martin.panter; messages: +; resolution: duplicate; superseder: urlopen URL with unescaped space
2013-07-06 10:08:12 senko set files: + urllib-request-space-encode.diff; nosy: + senko; messages: +
2013-03-06 03:20:50 karlcow set nosy: + karlcow; messages: +
2012-01-12 15:35:57 kiilerix set messages: +
2012-01-12 15:30:04 maker set files: + issue13359_py2.patch; messages: +
2012-01-12 15:10:58 kiilerix set nosy: + kiilerix; messages: +
2012-01-12 15:04:33 maker set files: + issue13359.patch; nosy: + maker; messages: +
2011-12-14 12:08:23 Ramchandra Apte set nosy: + Ramchandra Apte; messages: +
2011-12-14 10:55:00 sandro.tosi set nosy: + sandro.tosi
2011-11-09 11:26:12 krisys set files: + issue13359.patch; nosy: + krisys; messages: +; keywords: + patch
2011-11-06 20:14:48 ezio.melotti set nosy: + ezio.melotti; stage: test needed
2011-11-06 20:13:46 davide.rizzo create