Issue 18144: FD leak in urllib2

Other issues about this problem already exist, but this particular case differs from them, so I didn't think it a good idea to merge it with those.

Under some very specific circumstances (sending a POST request with more data than an unknown threshold), at least one socket remains unclosed after calling close() on urllib2.urlopen's returned file object.

I marked only the versions I could confirm exhibit the issue, but I believe it affects all versions.

This came up in PyPy [0], but it applies to CPython as well (although with CPython's reference-counting GC the FD is unlikely to stay open as long as it does in PyPy).

I'm attaching the same server I used to trigger this issue in PyPy; it works the same with CPython.

To trigger the leak, open an interpreter and run the following (copy-pasted from a PyPy session: CPython does not show the leak because the decref immediately closes the socket, but it will issue a warning if run with -Wall). See PyPy's issue tracker [0] for details.

    import os, urllib2
    req = """{"imp": [{"h": 50, "battr": ["9", "10", "12"], "api": 3, "w": 320, "instl": 0, "impid": "5d6dedf3-17bb-11e2-b5c0-1040f38b83e0"}]""" * 10
    r = urllib2.Request("http://localhost:8000/bogus?src=1", req)  # POST, since data is given
    u = urllib2.urlopen(r)
    v = u.read()
    u.close()
    # List this process's open FDs (Linux only); fd 3 is the socket still open after u.close()
    os.system("ls -alh /proc/%d/fd/*" % os.getpid())
    lrwx------ 1 claudiofreire users 64 Jun 4 15:08 /proc/26203/fd/0 -> /dev/pts/5
    lrwx------ 1 claudiofreire users 64 Jun 4 15:08 /proc/26203/fd/1 -> /dev/pts/5
    lrwx------ 1 claudiofreire users 64 Jun 4 15:08 /proc/26203/fd/2 -> /dev/pts/5
    lrwx------ 1 claudiofreire users 64 Jun 4 15:08 /proc/26203/fd/3 -> socket:[2086998]
    lrwx------ 1 claudiofreire users 64 Jun 4 15:08 /proc/26203/fd/5 -> /dev/pts/5
    lrwx------ 1 claudiofreire users 64 Jun 4 15:08 /proc/26203/fd/6 -> /dev/pts/5
    0
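On CPython 3, a socket object that is garbage-collected without close() triggers a ResourceWarning, which is hidden by default and shown with -Wall; this is presumably the warning meant above (Python 2 has no ResourceWarning). A minimal sketch, assuming Python 3, of recording that warning programmatically instead of relying on -Wall. The helper names `collect_resource_warnings`, `drop_socket_without_close` and `drop_socket_with_close` are hypothetical:

```python
import gc
import socket
import warnings


def collect_resource_warnings(action):
    """Run action() and return the ResourceWarnings it triggers.

    Hypothetical helper: enables all warnings for the duration of the call
    (similar to running with -Wall) and records them instead of printing.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        action()
        gc.collect()  # also finalize objects that only a GC pass would free
    return [w for w in caught if issubclass(w.category, ResourceWarning)]


def drop_socket_without_close():
    s = socket.socket()
    del s  # last reference dropped without close(): the finalizer warns


def drop_socket_with_close():
    s = socket.socket()
    s.close()  # explicitly closed, so the finalizer has nothing to report
    del s
```

Wrapping the demo's urlopen/read/close steps in a function and passing it to `collect_resource_warnings` would then show whether any socket was left for the GC to close.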

[0] https://bugs.pypy.org/issue867

Confirmed that this happens when the server sends a chunked response or a Content-Length header, but not when the server just sends “Connection: close”. So this looks like the same problem as Issue 19524, and my patch for that seems to fix the issue here.
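The server attached to the original report is not reproduced here. As a stand-in, a minimal sketch of a local test server, assuming Python 3's http.server module, that answers the POST either with a Content-Length header or with only “Connection: close”, the two cases compared above. `BogusHandler` and `start_server` are hypothetical names, and the chunked case is not covered:

```python
import http.server
import threading


class BogusHandler(http.server.BaseHTTPRequestHandler):
    # Hypothetical stand-in for the attached server.
    # send_length=True:  reply with a Content-Length header (leak reported)
    # send_length=False: reply with only "Connection: close" (no leak reported)
    send_length = True

    def do_POST(self):
        # Drain the request body so the client is not left blocked mid-send.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        if self.send_length:
            self.send_header("Content-Length", str(len(body)))
        else:
            # No length: the client must read until the server closes.
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the interpreter session quiet


def start_server(send_length=True, port=0):
    """Serve BogusHandler on localhost in a daemon thread; port=0 picks a free port."""
    handler = type("Handler", (BogusHandler,), {"send_length": send_length})
    server = http.server.HTTPServer(("localhost", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To match the demo URL, start it with `start_server(send_length=True, port=8000)`. Running it in a separate interpreter from the client keeps the server's own sockets out of the client's /proc listing.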

Python 3 version of the demo code:

    import os, urllib.request
    data = b"""{"imp": [{"h": 50, "battr": ["9", "10", "12"], "api": 3, "w": 320, "instl": 0, "impid": "5d6dedf3-17bb-11e2-b5c0-1040f38b83e0"}]""" * 10
    req = urllib.request.Request("http://localhost:8000/bogus?src=1", data)
    resp = urllib.request.urlopen(req)
    v = resp.read()
    resp.close()
    os.system("ls -alh /proc/%d/fd/*" % os.getpid())
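Instead of shelling out to ls, the open sockets can be listed from Python. A minimal, Linux-only sketch that relies on /proc/self/fd, the same directory the demo lists; the helper name `open_socket_fds` is hypothetical:

```python
import os


def open_socket_fds():
    """Return {fd: link target} for this process's FDs that are sockets.

    Hypothetical, Linux-only helper: reads the symlinks in /proc/self/fd,
    where sockets appear as targets like "socket:[2086998]".
    """
    fd_dir = "/proc/self/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # e.g. the FD listdir used internally is already closed
        if target.startswith("socket:"):
            result[int(name)] = target
    return result
```

Calling `print(open_socket_fds())` right after `resp.close()` shows any socket still open at that point. Other sockets in the same process (for example a test server running in the same interpreter) will also be listed.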

I can confirm the issue is in urllib's open: it never calls close() on the HTTP connection, leaving that to the GC.

If addinfourl (and friends) are changed to carry a reference to the HTTP connection and close it in their close(), the leak is fixed.
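The approach described above can be sketched as follows. This is neither the actual patch (not reproduced here) nor urllib's real addinfourl class: `ConnectionClosingResponse` is a hypothetical wrapper, assuming only that the response body and the connection each expose close():

```python
class ConnectionClosingResponse:
    """Hypothetical wrapper: keeps a reference to the HTTP connection that
    produced a response and closes it together with the response body,
    instead of leaving the connection to the GC."""

    def __init__(self, fp, conn):
        self.fp = fp      # file-like body, e.g. an http.client.HTTPResponse
        self.conn = conn  # anything with close(), e.g. an http.client.HTTPConnection

    def read(self, *args):
        return self.fp.read(*args)

    def close(self):
        try:
            self.fp.close()
        finally:
            # Release the connection even if closing the body raised.
            if self.conn is not None:
                self.conn.close()
                self.conn = None  # makes a second close() a no-op

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

The try/finally ensures the connection is released even if closing the body fails, and dropping the reference afterwards makes repeated close() calls harmless.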

I have a patch, but it is incomplete (just a POC): it only handles the common case I use.