msg169293 - (view) |
Author: karl (karlcow) * |
Date: 2012-08-28 20:31 |
The current parsing of HTTP status line seems strange with regards to its definition in HTTP. http://hg.python.org/cpython/file/3.2/Lib/http/client.py#l307 Currently the code is version, status, reason = line.split(None, 2) >>> status1 = "HTTP/1.1 200 OK" >>> status2 = "HTTP/1.1 200 " >>> status3 = "HTTP/1.1 200" >>> status1.split(None, 2) ['HTTP/1.1', '200', 'OK'] >>> status2.split(None, 2) ['HTTP/1.1', '200'] >>> status3.split(None, 2) ['HTTP/1.1', '200'] According to the production rules of HTTP/1.1 bis only status1 and status2 are valid. status-line = HTTP-version SP status-code SP reason-phrase CRLF — http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-20#section-3.1.2 with reason-phrase = *( HTAB / SP / VCHAR / obs-text ) aka 0 or more characters. I'm also not sure what are the expected ValueError with additional parsing rules which seems even more bogus. First modification should be >>> status1.split(' ', 2) ['HTTP/1.1', '200', 'OK'] >>> status2.split(' ', 2) ['HTTP/1.1', '200', ''] Which would be correct for the first two, with an empty reason-phrase The third one is still no good. >>> status3.split(' ', 2) ['HTTP/1.1', '200'] An additional check could be done with len(status.split(' ', 2)) == 3 Will return False in the third case. Do you want me to create a patch and a test for it? |
|
|
msg169294 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2012-08-28 20:39 |
Could you explain the error you are seeing in more detail first? You are talk about parsing and fixes here, but I'm not clear on what the actual bug is you are reporting. |
|
|
msg169297 - (view) |
Author: karl (karlcow) * |
Date: 2012-08-28 20:59 |
ok. status lines 1 and 2 are valid. the third one is invalid and should trigger a raise BadStatusLine(line) The code at line 318 is bogus as it will parse happily the third line without raising an exception. http://hg.python.org/cpython/file/3.2/Lib/http/client.py#l318 |
|
|
msg169298 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2012-08-28 21:01 |
Why should it raise an error? The postel principle suggests that we should treat it as equivalent to the second line. |
|
|
msg169299 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-08-28 21:03 |
I would point out that the goal of the http module is to provide interoperability with real-life servers, rather than be a strict implementation of RFCs. So, starting to raise errors for "HTTP/1.1 200" while "HTTP/1.1 200 " remains ok might not be the best idea. |
|
|
msg169305 - (view) |
Author: karl (karlcow) * |
Date: 2012-08-28 21:28 |
Fair enough, it could be a warning when * more than one space in between http version and status code * if there is a missing space after the status code I'm not advocating for being strict only. I'm advocating for giving the tools to developer to assess that things are right and choose or not to ignore and having to avoid to patch the libraries or rewrite modules when you create code which needs to be strict for specifically validating responses and requests. :) ps: I haven't checked yet if the server counter part of httplib was strict in the production rule. |
|
|
msg169453 - (view) |
Author: karl (karlcow) * |
Date: 2012-08-30 13:59 |
So what do we do with it? Do I created a patch or do we close that bug? :) No hard feelings about it. |
|
|
msg169454 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-08-30 14:02 |
Well, I'd like to hear Senthil's opinion, since he's the main http / urllib maintainer these days. |
|
|
msg169455 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2012-08-30 14:06 |
I'm inclined to think it isn't worth the effort, myself. I think a "validating" client would be a valuable tool, but that that isn't what the stdlib is focused on. But yes, let's hear Senthil's opinion. (That said, I am in fact adding a lot of validation capabilities to the email package...but in general those are only a matter of exposing the information about parsing failures that the code then has to recover from, rather than adding extra parsing to detect non-compliance.) |
|
|
msg227504 - (view) |
Author: karl (karlcow) * |
Date: 2014-09-25 04:12 |
Let's close this. >>> "HTTP/1.1 301 ".split(None, 2) ['HTTP/1.1', '301'] >>> "HTTP/1.1 301 ".split(' ', 2) ['HTTP/1.1', '', ' 301 '] I think it would be nice to have a way to warn without stopping, but the last comment from r.david.murray makes sense too. :) |
|
|
msg227587 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2014-09-26 07:04 |
Sorry that I did not get involved earlier. It is difficult to prove any problem with the current behavior and it is rightly closed. The issue which was originally raised seems to me a cosmetic one, which won't get exhibited as well. Here is simple test case and the output with the current behavior. # testcases.py testcases = ["HTTP/1.1 200", "HTTP/1.1 200 OK", "HTTP/1.1 200 ", "HTTP/1.1 200"] for tc in testcases: try: version, status, reason = tc.split(None, 2) print "%s (%s,%s,%s)" % (tc, version, status, reason) except ValueError: version, status = tc.split(None, 1) print "%s (%s, %s)" % (tc, version, status) $ python testcases.py HTTP/1.1 200 (HTTP/1.1, 200) HTTP/1.1 200 OK (HTTP/1.1,200,OK) HTTP/1.1 200 (HTTP/1.1, 200 ) HTTP/1.1 200 (HTTP/1.1, 200) The problem is the status code (str at the moment) has a trailing spaces. And now, the status code is not used as string. Just after the parsing, the status is converted to integer, Line 337: status = int(status) and rest of urllib and http/client's behavior use status as int rather than as string. Thanks! |
|
|