msg66213 - (view) |
Author: david reid (zathras) |
Date: 2008-05-04 15:11 |
In urllib2 when using reusing a Request calling add_header doesn't work when an unredirected_header has been added. A good example (and the one that caught me out) is content-type. When making a POST request with no content-type set the current code sets the content-type as an unredirected_header to 'application/x-www-form-urlencoded' (line 1034 of urllib2.py) but in a subsequent request, setting the content type via add_header will see this ignored as unredirected_headers are appended after the headers. A possible solution is to check whether the header being added already exists in the requests undredirected_headers and remove it if it does. |
|
|
msg66263 - (view) |
Author: Martin McNickle (BitTorment) |
Date: 2008-05-05 09:45 |
I can see that there will be a problem in this case. However, it may be that the authors found it more intuitive to always set the Content-Type to application/x-www-form-urlencoded when data is set. Their implementation is inconsistent though: #------------------------------------------------------- request = urllib2.Request('http://www.somesite.com/') request.add_data(data) f = urllib2.urlopen(request) # 'Content-Type: application/x-www-form-urlencoded' is sent in the header #------------------------------------------------------- whereas: #------------------------------------------------------- request = urllib2.Request('http://www.somesite.com/') request.add_header('Content-Type', 'application/xml') request.add_data(data) # 'Content-Type: application/xml' is sent in the header #------------------------------------------------------- We need to decide what is the best way to handle this case. Should the normal headers be given preference, or should the unredirected headers be given preference? --- Martin |
|
|
msg66264 - (view) |
Author: Martin McNickle (BitTorment) |
Date: 2008-05-05 09:50 |
Sorry, the first example should read: #------------------------------------------------------- request = urllib2.Request('http://www.whompbox.com/headertest.php') request.add_data(data) f = urllib2.urlopen(request) request.add_header('Content-Type', 'application/xml') f = urllib2.urlopen(request) # 'Content-Type: application/x-www-form-urlencoded' is sent in both headers. Second header (should?) be 'application/xml' #------------------------------------------------------- |
|
|
msg66309 - (view) |
Author: Martin McNickle (BitTorment) |
Date: 2008-05-06 11:05 |
I decided that the user should have full control over the headers sent. Any call to add_header() or add_redirected_header() will now check if a header with that same name has been set in the Request before (in both the header and the unredirected header dictionaries), and if so overwrite it. The tests run fine with the patch installed, and also included is a patch for the urllib2 documentation. |
|
|
msg69212 - (view) |
Author: Facundo Batista (facundobatista) *  |
Date: 2008-07-03 17:39 |
BitTorment, what would increase the possibility of accepting this is a patch also for the test suite (test_urllib2.py), checking this behaviour, for us to be sure that does not break again in the future. Could you please submit also this? |
|
|
msg71111 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2008-08-14 07:29 |
The submitted patch has problems. It does not correctly solve this issue (which I want to confirm, if there is issue at all after understanding the logic behind unredirected_headers). My explanation of this issue and comments on the patch is here: http://urllib-gsoc.blogspot.com/2008/08/issue2756-urllib2-addheader-fails-with.html Now, coming back to the current issue. We see that addition of unredirected_hdrs takes place in the do_request_ call of AbstractHTTPHandler and it adds the unredirected_hdrs based on certain conditions, like when Content-Type is not there in header add the unredirected header ('Content-Type','application/x-www-form-urlencoded') The value of Content-Type is hardcoded here, but other header values are not hardcoded and got from header request only. Question here is: When the request contains the Content-Type header and has a updated value, why is it not supposed to change the unredirected_header to the updated value? (Same for other request header items). John J Lee, can perhaps help us understand more. If it is supposed to change, then the following snippet (rough) at do_request_ for key, value in request.headers: request.add_unredirected_header(key,request.get_header(key)) should update it, there is no need to change the add_header and add_unredirected_header method as proposed by the patch. On our conclusion, I shall provide the updated patch (if required). Thanks, Senthil |
|
|
msg71112 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2008-08-14 07:35 |
The problem with the patch was: The attached patch modifies the add_header() and add_unredirected_header() method to remove the existing headers of the same name alternately in the headers and unredirected_hdrs. What we observe is unredirected_hdrs item is removed during add_header() calland it is never added back/updated in teh undirected_hdrs. Let us discuss on the points mentioned in my previous post. |
|
|
msg71227 - (view) |
Author: Facundo Batista (facundobatista) *  |
Date: 2008-08-16 16:47 |
What I think is that the AbstractHTTPHandler is grabbing the headers, in the do_open() method, in an incorrect way. Check the get_header() method from Request: def get_header(self, header_name, default=None): return self.headers.get( header_name, self.unredirected_hdrs.get(header_name, default)) What it's doing there is to grab the header from self.header, and if not there use the one in self.unredirected_hdrs, and if not there return the default. So, to emulate this behaviour, in do_open() I just grabbed first the unredirected headers, and then updated it with the normal ones. See my simple patch I attach here, which solves the issue, and passes all the tests also. |
|
|
msg75207 - (view) |
Author: John J Lee (jjlee) |
Date: 2008-10-25 12:40 |
I agree there is a bug here: clearly the methods on the Request class are inconsistent with AbstractHTTPHandler.do_open() . I think Facundo's patch is good, though it needs a test. The general principle when fixing earlier bugs has been that the .add_*header() methods override default headers, though there are some reasonable violations this where the implementation relies on specific headers being sent -- e.g. "Connection: close". |
|
|
msg132246 - (view) |
Author: Facundo Batista (facundobatista) *  |
Date: 2011-03-26 15:48 |
Senthil, I'm assigning this issue to you because you know more about this subject than me. Feel free to unassign if will not be working in it. Thanks! |
|
|
msg186188 - (view) |
Author: Volodymyr Antonevych (volodyaa) * |
Date: 2013-04-07 10:41 |
Test urllib2 file |
|
|
msg186189 - (view) |
Author: Volodymyr Antonevych (volodyaa) * |
Date: 2013-04-07 10:45 |
Test HTTP server |
|
|
msg186191 - (view) |
Author: Volodymyr Antonevych (volodyaa) * |
Date: 2013-04-07 10:54 |
Tested on 2.7. The same bug exists: 127.0.0.1 - - [07/Apr/2013 13:17:39] "POST / HTTP/1.1" 200 - ('content-length', '11') ('accept-encoding', 'identity') ('connection', 'close') ('user-agent', 'Python-urllib/2.7') ('host', '127.0.0.1:8000') ('content-type', 'application/x-www-form-urlencoded') Also, the workaround exists for that issue: instead of using request.add_header have to use request.add_unredirected_header for the second call. BTW, the same issue with 'Content-length' header. If use request.add_data the second time when the content length value leaves the same as for first request. See steps to reproduce in attachments. |
|
|
msg409156 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2021-12-24 20:42 |
Bug needs to be verified on 3.10 or 3.11. |
|
|
msg409266 - (view) |
Author: Carlos Damazio (carlosdamazio) * |
Date: 2021-12-28 19:07 |
This is the server for testing in 3.10. |
|
|
msg409267 - (view) |
Author: Carlos Damazio (carlosdamazio) * |
Date: 2021-12-28 19:07 |
And here's the code to reproduce the bug in 3.10. |
|
|