[Python-Dev] _socket efficiencies ideas

Guido van Rossum guido@python.org
Tue, 08 Apr 2003 10:50:50 -0400


I have been in discussion recently with Martin v. Loewis about an idea I have been thinking about for a while to improve the efficiency of the connect method in the socket module. I posted the original suggestion to the python suggestions tracker on sourceforge as item 706392.

A bit of history and justification: I am doing a lot of work using python to develop almost-real-time distributed data acquisition and control systems for running laboratory apparatus. In this environment, I do a lot of sun-rpc calls as part of the vxi-11 protocol to allow TCP/IP access to gpib-like devices. As a part of this, I do a lot of socket.connect() calls, often with the connections being quite transient. The problem is that the current python socket module makes a DNS call to try to resolve each address before connect is called, which, if I am connecting/disconnecting many times a second, results in pathological and gratuitous network activity. Incidentally, I am in the process of creating a sourceforge project, pythonlabtools (just approved this morning), in which I will start maintaining a repository of the tools I have been working on.

Are you sure that it tries to make a DNS call even when the address is pure numeric? That seems a mistake, and if that's really happening, I think that is the part that should be fixed. Maybe in the _socket module, maybe in getaddrinfo().

My first solution to this, for which I submitted a patch to the tracker system (with guidance from Martin), was to create a wrapper for the sockaddr object, which one can create in advance, and when socket.connect() is called (actually when getsockaddrarg() is called by connect), results in an immediate connection without any DNS activity.
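The wrapper idea can be illustrated in pure Python. This is only a rough sketch of the concept, not the C patch submitted to the tracker: the address is resolved once, the resulting socket address is kept, and every later connect reuses it. The class name `ResolvedAddress` and its methods are hypothetical, invented here for illustration.

```python
import socket


class ResolvedAddress:
    """Hypothetical wrapper: resolve (host, port) once, reuse it for many connects."""

    def __init__(self, host, port):
        # The only name lookup happens here, at construction time.
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
        self.family, self.type, self.proto, _, self.sockaddr = infos[0]

    def connect(self, timeout=None):
        # self.sockaddr already holds a numeric address, so this step is
        # expected to proceed without any further name resolution.
        sock = socket.socket(self.family, self.type, self.proto)
        sock.settimeout(timeout)
        try:
            sock.connect(self.sockaddr)
        except OSError:
            sock.close()
            raise
        return sock
```

Because the port is fixed when the wrapper is built, a new wrapper is needed whenever the port changes.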

This solution solves part of the problem, but may not be the right final one. After writing this patch and verifying its functionality, I tried it in the real world. Then I realized that for sun-rpc work it wasn't quite what I needed, since the socket number may be changing each time the rpc request is made, resulting in a new address wrapper being needed, and thus DNS activity again. After thinking about what I have done with this patch, I would also like to suggest another change (for which I am also willing to submit the patch, which is quite simple): Consistent with some of the already extant glue in socket to handle addresses like <broadcast>, would there be any reason not to modify setipaddr() and getaddrinfo() so that if an address is prefixed with (e.g. 127.0.0.1) that the PASSIVE and NUMERIC flags are always set so these routines reject any non-numeric address, but handle numeric ones very efficiently? I have already implemented a predecessor to this which I am experimentally running at home in python 2.2.2, in which I made it so that prefixing the address with an exclamation point provided this functionality. Given the somewhat more legible approach the team has already chosen for special addresses, I see no reason why using a (or some such) prefix isn't reasonable. Do any members of the development team have commentary on this? Would such a change be likely to be accepted into the system? Are there any reasons why it might break something? The actual patch would be only about 10 lines of code (plus some documentation), a few in each of the routines mentioned above.
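The numeric-only behaviour proposed here can be approximated from Python itself with the standard `AI_NUMERICHOST` flag of `socket.getaddrinfo()`. The following is a minimal sketch under that assumption, not the proposed change to setipaddr(); the helper name `resolve_numeric` is hypothetical, and `AI_PASSIVE` is left out because it only matters for addresses that will be passed to bind().

```python
import socket


def resolve_numeric(host, port):
    """Resolve a numeric IPv4 address only; raise socket.gaierror for hostnames."""
    infos = socket.getaddrinfo(
        host,
        port,
        family=socket.AF_INET,
        type=socket.SOCK_STREAM,
        flags=socket.AI_NUMERICHOST,  # host must be a numeric address string
    )
    # Each entry is (family, type, proto, canonname, sockaddr).
    return infos[0][4]
```

A caller would pass the returned tuple straight to `sock.connect()`, and treat `socket.gaierror` as a sign that a hostname slipped in where a numeric address was expected.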

I don't see why we would have to add the flag to the address when the form of the address itself is already a perfect clue that the address is purely numeric. I'd be happy to see a patch that intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those without calling getaddrinfo().
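In Python terms, such an intercept might look like the sketch below. It is illustrative only: the real change would live in the C code of the _socket module, and the helper name `resolve_host` and the range check on each octet are assumptions made for this example.

```python
import re
import socket

# Four dot-separated runs of digits, e.g. "127.0.0.1".
_DOTTED_QUAD = re.compile(r"\d+\.\d+\.\d+\.\d+")


def resolve_host(host):
    """Return a dotted-quad IPv4 string, skipping getaddrinfo() for numeric input."""
    if _DOTTED_QUAD.fullmatch(host):
        octets = [int(part) for part in host.split(".")]
        if all(octet <= 255 for octet in octets):
            # Parsed locally; no resolver call is made on this path.
            return ".".join(str(octet) for octet in octets)
        raise ValueError("octet out of range in %r" % host)
    # Anything else goes through the normal resolver, which may use DNS.
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    return infos[0][4][0]
```

With this shape, a numeric address never reaches the resolver, while ordinary hostnames behave as before.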

--Guido van Rossum (home page: http://www.python.org/~guido/)