[Python-Dev] mysterious hangs in socket code
Jeremy Hylton jeremy@alum.mit.edu
Tue, 3 Sep 2002 17:53:46 -0400
I've been running a small, multi-threaded program to retrieve web pages today. The entire program appears to hang when I perform a slow DNS operation, even though there is no application-level coordination between the threads.
The motivation comes from http://www.python.org/sf/591349, but I ended up writing a similar small test script, which I've attached.
When I run this program with Python 2.1, it produces a steady stream of output -- urls and the time it took to load them. Most of the pages take less than a second, but some take a very long time.
If I run this program with Python 2.2 or 2.3, it produces little bursts of output, then pauses for a long time, then repeats.
I believe that the problem relates to DNS lookups, but not in a way I fully understand. If I attach gdb to any of the threads while the program is hung, it is always inside getaddrinfo(). My first realization was that the socketmodule stopped wrapping DNS lookups in Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS calls when the IPv6 changes were integrated. But if I restore these calls -- see http://www.python.org/sf/604210 -- I don't see any change in behavior. The program still hangs periodically.
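A rough way to check whether the lookups themselves are being serialized is to time them directly, with httplib out of the picture. The sketch below is only an illustration: it is written for Python 3 (the attached script targets Python 2), and the names `time_lookups` and `resolve` are made up here. It runs one lookup per thread and reports the wall-clock time alongside the time each call spent blocked. If the calls really overlap, the wall-clock time should be close to the slowest single lookup; if they are serialized, it should approach the sum.

```python
import socket
import threading
import time


def time_lookups(hosts, resolve=None):
    """Resolve each host in its own thread.

    Returns (wall_clock_seconds, {host: seconds_blocked_in_lookup}).
    `resolve` is a hypothetical hook so a fake resolver can be swapped in.
    """
    if resolve is None:
        # Default to the real resolver; port 80 matches plain HTTP fetches.
        def resolve(host):
            return socket.getaddrinfo(host, 80)

    elapsed = {}

    def worker(host):
        t0 = time.monotonic()
        try:
            resolve(host)
        except OSError:
            # socket.gaierror is an OSError; a failed lookup still tells us
            # how long the call blocked.
            pass
        elapsed[host] = time.monotonic() - t0

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start, elapsed
```

Calling `time_lookups()` with a handful of real host names from the URL list, and comparing the first value with `sum()` of the second, gives a quick read on whether the resolver calls overlap on a given platform.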
One possibility is that the Linux getaddrinfo() is thread-safe, but only by way of a lock that allows a single request to be outstanding at a time.
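To make that hypothesis concrete, here is a small model of it in Python 3. It does not describe what the C library actually does. `_resolver_lock`, `slow_lookup`, `serialized_lookup` and `run_in_threads` are all hypothetical names, and `time.sleep()` stands in for a slow DNS query. With a single lock around the lookup, the threads queue up behind one another even though each of them releases the interpreter lock while it waits.

```python
import threading
import time

# Hypothetical stand-in for a lock held inside the resolver library.
_resolver_lock = threading.Lock()


def slow_lookup(delay):
    """Pretend DNS lookup with no shared lock: callers can overlap."""
    time.sleep(delay)


def serialized_lookup(delay):
    """Same lookup, but only one caller may be inside it at a time."""
    with _resolver_lock:
        time.sleep(delay)


def run_in_threads(func, delay, count):
    """Run func(delay) in `count` threads; return wall-clock seconds."""
    threads = [threading.Thread(target=func, args=(delay,))
               for _ in range(count)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start
```

Under this model, `run_in_threads(serialized_lookup, d, n)` takes roughly `n * d` seconds, while `run_in_threads(slow_lookup, d, n)` takes roughly `d`. That difference would look like bursts of output separated by long pauses whenever one lookup is slow.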
Not sure what the other possibilities are, but the current behavior is awful.
Jeremy
import httplib
import Queue
import random
import sys
import threading
import time
import traceback
import urlparse

headers = {"Accept": "text/plain, text/html, image/jpeg, image/jpg, "
                     "image/gif, image/png, */*"}

class URLThread(threading.Thread):

    def __init__(self, queue):
        threading.Thread.__init__(self)
        self._queue = queue
        self._stopevent = threading.Event()

    def stop(self):
        self._stopevent.set()

    def run(self):
        while not self._stopevent.isSet():
            self.fetch()

    def fetch(self):
        url = self._queue.get()
        t0 = time.time()
        try:
            self._fetch(url)
        except:
            etype, value, tb = sys.exc_info()
            L = ["Error occurred fetching %s\n" % url,
                 "%s: %s\n" % (etype, value),
                 ]
            L += traceback.format_tb(tb)
            sys.stderr.write("".join(L))
        t1 = time.time()
        print url, round(t1 - t0, 2)

    def _fetch(self, url):
        parts = urlparse.urlparse(url)
        host = parts[1]
        path = parts[2]
        h = httplib.HTTPConnection(host)
        h.connect()
        h.request("GET", path, headers=headers)
        r = h.getresponse()
        r.read()
        h.close()

urls = """
http://www.andersen.com/
http://www.google.com/
http://www.google.com/images/logo.gif
http://www.microsoft.com/
http://www.microsoft.com/homepage/gif/bnr-microsoft.gif
http://www.microsoft.com/homepage/gif/1ptrans.gif
http://www.microsoft.com/library/toolbar/images/curve.gif
http://www.yahoo.com/
http://www.sourceforge.net/
http://www.slashdot.org/
http://www.kuro5hin.org/
http://www.intel.com/
http://www.aol.com/
http://www.amazon.com/
http://www.cnn.com/
http://money.cnn.com/
http://www.expedia.com/
http://www.tripod.com/
http://www.hotmail.com/
http://www.angelfire.com/
http://www.excite.com/
http://www.verisign.com/
http://www.riaa.com/
http://www.enron.com/
http://www.securityspace.com/
http://www.directv.com/
http://www.att.com/
http://www.qwest.com/
http://www.covad.com/
http://www.sprint.com/
http://www.mci.com/
http://www.worldcom.com/
"""
urls = [u for u in urls.split("\n") if u]

REPEAT = 10
THREADS = 8

class RandomQueue:

    def __init__(self, L):
        self.list = L

    def get(self):
        return random.choice(self.list)

if __name__ == "__main__":
    urlq = RandomQueue(urls)

    sys.setcheckinterval(10)

    threads = []
    for i in range(THREADS):
        t = URLThread(urlq)
        t.start()
        threads.append(t)

    while 1:
        try:
            time.sleep(30)
        except:
            break

    print "Shutting down threads..."
    for t in threads:
        t.stop()
    for t in threads:
        t.join()