Issue 6791: httplib read status memory usage (original) (raw)

Created on 2009-08-28 08:32 by m.sucajtys, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
httplib.patch m.sucajtys,2009-08-28 08:32 patch
i6791_unittest.patch rosslagerwall,2010-12-15 09:46 py3k unit test for the issue
i6791_py3k.patch rosslagerwall,2010-12-15 09:47 py3k patch
httplinelength.patch pitrou,2010-12-17 19:41
Messages (18)
msg92027 - (view) Author: sumar (m.sucajtys) Date: 2009-08-28 08:32
During writing some code I discovered some behaviour of httplib. When we connect to host, which doesn’t respond with status line, but it just sending data, httplib may consume more and more memory, becouce when we execute h = httplib.HTTPConnection(‘host’) h.conect() h.request(‘GET’, ‘/’) r = h.getresponse() httplib tries to read one line from host. If host doesn’t send new line character (‘\n’), httplib reads more and more data. On my tests httplib could consume all of 4GB of memory and the python process was killed by oom_killer. The resolution is to limit maximum amount of data read on getting response. I have performed some test: I received 3438293 from hosts located in the network. The longest valid response line is HTTP/1.1 500 ( The specified Secure Sockets Layer (SSL) port is not allowed. ISA Server is not configured to allow SSL requests from this port. Most Web browsers use port 443 for SSL requests. )\r\n and it has 197 characters. In RFC2616 in section 6.1 we have: “The first line of a Response message is the Status-Line, consisting of the protocol version followed by a numeric status code and its associated textual phrase, with each element separated by SP characters. No CR or LF is allowed except in the final CRLF sequence. Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF (..)The Reason-Phrase is intended to give a short textual description of the Status-Code.” So limiting maximum status line length to 256 characters is a solution of this problem. It doesn’t break compatibility withc RFC 2616. My patch was written originally on python2.4, but I’ve tested it on python2.6: [ms@host python2.6]$ patch --dry-run -i /home/ms/httplib.patch patching file httplib.py Hunk #1 succeeded at 209 (offset 54 lines).
msg92030 - (view) Author: sumar (m.sucajtys) Date: 2009-08-28 09:43
I've also check patch against code in svn tree: wget http://svn.python.org/projects/python/trunk/Lib/httplib.py patch -p0 -i httplib.patch --dry-run patching file httplib.py Hunk #1 succeeded at 209 (offset 54 lines). Hunk #2 succeeded at 303 (offset 10 lines).
msg110923 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-20 17:04
Sumar, to get this moved forward could you please provide a unit test.
msg124010 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2010-12-15 09:46
Attached is a unit test which tests the issue. Unfortunately, since it uses the resource module to limit memory to a workable size, it will only work on Unix. The given patch appears to fix the issue well. I think this should be taken as a security issue (even if a rather odd one) since a malicious http server could be set up in place of the normal one and crash any http python clients that connect to it. Eg: Run: dd if=/dev/zero bs=10M count=1000 | nc -l 8888 And then: import httplib h = httplib.HTTPConnection('localhost', 8888) h.connect() h.request('GET', '/') r = h.getresponse() This should cause python to use up all the memory available.
msg124011 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2010-12-15 09:47
A py3k patch against revision 87228.
msg124021 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-15 15:07
First, I don't think the resource module needs to be used here. Second, I don't see why getcode() would return 200. If no valid response was received then some kind of error should certainly be raised, shouldn't it?
msg124029 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-15 16:55
By the way, looking at the code, readline() without any parameter is used all over http.client, so fixing only this one use case doesn't really make sense.
msg124030 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2010-12-15 17:00
That's true. Near the bottom of the code, it says: # The status-line parsing code calls readline(), which normally # get the HTTP status line. For a 0.9 response, however, this is # actually the first line of the body! Limiting the length of the status line would break 0.9 responses so maybe this issue should be closed?
msg124031 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-15 17:02
> That's true. Near the bottom of the code, it says: > > # The status-line parsing code calls readline(), which normally > # get the HTTP status line. For a 0.9 response, however, this is > # actually the first line of the body! > > Limiting the length of the status line would break 0.9 responses so maybe this issue should be closed? Well, the HTTP 1.0 RFC was filed in 1996 and HTTP 1.1 is most commonly used today. I don't think we need to support 0.9 anymore. I'll open a separate issue for ripping off 0.9 support, though.
msg124111 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-12-16 07:59
I just read the whole discussion and it seems that code was in place so that client can tolerant of a BAD HTTP 0.9 Server response. http://www.w3.org/Protocols/HTTP/OldServers.html Given that talks about removing HTTP/0.9 support (+1 to that), this issue will become obsolete. I too support removing HTTP/0.9. There are hardly any advantages in keeping it around.
msg124126 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-16 13:18
Well, removing 0.9 support doesn't make this obsolete, does it?
msg124130 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-12-16 13:54
On Thu, Dec 16, 2010 at 01🔞30PM +0000, Antoine Pitrou wrote: > Well, removing 0.9 support doesn't make this obsolete, does it? It does. Doesn't it? Because I saw in your patch that you fall back on HTTP 1.0 behaviour when the server does not return a status line and in which case a Exception will be raise and this issue won't be observed.
msg124132 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-16 14:02
> It does. Doesn't it? Because I saw in your patch that you fall back on > HTTP 1.0 behaviour when the server does not return a status line and > in which case a Exception will be raise and this issue won't be > observed. I don't think you understood the issue here. Calling readline() without a maximum length means the process memory potentially explodes, if the server sends gigabytes of data without a single "\n".
msg124138 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-12-16 17:06
On Thu, Dec 16, 2010 at 02:02:10PM +0000, Antoine Pitrou wrote: > I don't think you understood the issue here. Calling readline() without > a maximum length means the process memory potentially explodes, if the > server sends gigabytes of data without a single "\n". Yeah, I seem to have misunderstood the issue. Even if the response wa s an *invalid* one but it was huge data without \n, the readline call would just explode. - reading chunked response is doing a readline call too. Both this need to be addressed by having a limit on reading. I thought readline() is being called only when parsing headers which should almost always have CRLF (or at least LF) and thought valid responses always start with headers.
msg124247 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-17 18:18
Now that 0.9 client support has been removed, this can proceed (at least for 3.2).
msg124253 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-17 19:41
Here is a patch limiting line length everywhere in http.client, + tests (it also affects http.server since the header parsing routine is shared).
msg124295 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-12-18 17:00
In the morning, I had a comment on the patch wondering why read _MAXLENGH + 1 and then check for len of header > _MAXLENGH. Instead of just reading _MAXLENGH (and if the length matched rejecting). ( Looks like it did not go through). I think that either way is okay. I am taking the privilege of committing the patch. Fixed for py3k in 87373. So it is be available in the next beta. Shall merge the changes to other codelines.
msg124302 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-18 18:21
Partially backported in r87382 (3.1) and r87383 (2.7). Not everything could be merged in because of HTTP 0.9 support and (in 2.7) a slightly different architecture. Thank you.
History
Date User Action Args
2022-04-11 14:56:52 admin set github: 51040
2010-12-18 18:21:23 pitrou set status: open -> closedmessages: + nosy:orsenthil, pitrou, vstinner, m.sucajtys, rosslagerwallstage: patch review -> resolved
2010-12-18 17:00:09 orsenthil set resolution: fixedmessages: + nosy:orsenthil, pitrou, vstinner, m.sucajtys, rosslagerwall
2010-12-17 19:41:30 pitrou set files: + httplinelength.patchmessages: + nosy:orsenthil, pitrou, vstinner, m.sucajtys, rosslagerwallstage: needs patch -> patch review
2010-12-17 18🔞58 pitrou set nosy:orsenthil, pitrou, vstinner, m.sucajtys, rosslagerwallmessages: +
2010-12-16 17:06:15 orsenthil set nosy:orsenthil, pitrou, vstinner, m.sucajtys, rosslagerwallmessages: +
2010-12-16 14:02:08 pitrou set nosy:orsenthil, pitrou, vstinner, m.sucajtys, rosslagerwallmessages: +
2010-12-16 13:54:35 orsenthil set nosy:orsenthil, pitrou, vstinner, m.sucajtys, rosslagerwallmessages: +
2010-12-16 13🔞27 pitrou set status: pending -> opennosy:orsenthil, pitrou, vstinner, m.sucajtys, rosslagerwallmessages: +
2010-12-16 07:59:30 orsenthil set status: open -> pendingnosy: - BreamoreBoymessages: +
2010-12-15 17:02:37 pitrou set nosy:orsenthil, pitrou, vstinner, m.sucajtys, BreamoreBoy, rosslagerwallmessages: +
2010-12-15 17:00:59 rosslagerwall set nosy:orsenthil, pitrou, vstinner, m.sucajtys, BreamoreBoy, rosslagerwallmessages: +
2010-12-15 16:55:31 pitrou set nosy:orsenthil, pitrou, vstinner, m.sucajtys, BreamoreBoy, rosslagerwallmessages: + stage: test needed -> needs patch
2010-12-15 15:07:44 pitrou set nosy: + pitroumessages: +
2010-12-15 09:47:48 rosslagerwall set files: + i6791_py3k.patchnosy:orsenthil, vstinner, m.sucajtys, BreamoreBoy, rosslagerwallmessages: +
2010-12-15 09:46:16 rosslagerwall set files: + i6791_unittest.patchnosy: + rosslagerwallmessages: +
2010-07-20 17:28:44 orsenthil set assignee: orsenthilnosy: + orsenthil
2010-07-20 17:04:49 BreamoreBoy set versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6, Python 2.5, Python 2.4nosy: + BreamoreBoymessages: + stage: test needed
2009-09-14 10:58:54 vstinner set nosy: + vstinner
2009-08-28 09:43:02 m.sucajtys set messages: +
2009-08-28 08:32:35 m.sucajtys create