[Python-Dev] _socket efficiencies ideas
Marcus Mendenhall marcus.h.mendenhall@vanderbilt.edu
Tue, 8 Apr 2003 10:59:27 -0500
Thanks for your prompt reply!
On Tuesday, April 8, 2003, at 09:50 AM, Guido van Rossum wrote:
>> I have been in discussion recently with Martin v. Loewis about an idea I have been thinking about for a while to improve the efficiency of the connect method in the socket module. I posted the original suggestion to the python suggestions tracker on sourceforge as item 706392.
>> A bit of history and justification: I am doing a lot of work using python to develop almost-real-time distributed data acquisition and control systems for running laboratory apparatus. In this environment, I do a lot of sun-rpc calls as part of the vxi-11 protocol to allow TCP/IP access to gpib-like devices. As a part of this, I do a lot of socket.connect() calls, often with the connections being quite transient. The problem is that the current python socket module makes a DNS call to try to resolve each address before connect is called, which, if I am connecting/disconnecting many times a second, results in pathological and gratuitous network activity. Incidentally, I am in the process of creating a sourceforge project, pythonlabtools (just approved this morning), in which I will start maintaining a repository of the tools I have been working on.

> Are you sure that it tries to make a DNS call even when the address is pure numeric? That seems a mistake, and if that's really happening, I think that is the part that should be fixed. Maybe in the socket module, maybe in getaddrinfo().

Yes, it seems to do this. It sets the PASSIVE flag, but that doesn't seem to be quite enough to prevent DNS activity, although the NUMERIC flag does the job. This is true, at least, in 2.3.x on MacOSX, and since the socket stuff is all the same, I suspect it is true on many Unixes. Note that this doesn't happen on the MacOS9 version, which provides its own socket interface through GUSI, which apparently is smart enough to handle it.

>> My first solution to this, for which I submitted a patch to the tracker system (with guidance from Martin), was to create a wrapper for the sockaddr object, which one can create in advance, and which, when socket.connect() is called (actually when getsockaddrarg() is called by connect), results in an immediate connection without any DNS activity. This solution solves part of the problem, but may not be the right final one.
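The numeric-only behavior discussed above can be sketched from the Python side. This is only an illustration, assuming the "NUMERIC flag" refers to the standard AI_NUMERICHOST flag of getaddrinfo(); the helper name resolve_numeric is hypothetical and is not part of the socket module or of the patch under discussion.

```python
import socket


def resolve_numeric(host, port):
    """Return (ip, port) for a numeric IPv4 host.

    Hypothetical helper: AI_NUMERICHOST asks getaddrinfo() to accept only
    numeric address strings, so a host name is rejected with socket.gaierror
    instead of being sent to a resolver.
    """
    infos = socket.getaddrinfo(
        host, port,
        socket.AF_INET, socket.SOCK_STREAM, 0,
        socket.AI_NUMERICHOST,
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (ip, port) pair.
    return infos[0][4]
```

Calling resolve_numeric("127.0.0.1", 111) should hand back the address pair unchanged, while a name such as "example.invalid" raises socket.gaierror rather than triggering a lookup.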
>> After writing this patch and verifying its functionality, I tried it in the real world. Then, I realized that for sun-rpc work, it wasn't quite what I needed, since the socket number may be changing each time the rpc request is made, resulting in a new address wrapper being needed, and thus DNS activity again. After thinking about what I have done with this patch, I would also like to suggest another change (for which I am also willing to submit the patch, which is quite simple): Consistent with some of the already extant glue in socket to handle special addresses like <broadcast>, would there be any reason not to modify setipaddr() and getaddrinfo() so that if an address is prefixed with a special tag (placed directly before, e.g., 127.0.0.1), the PASSIVE and NUMERIC flags are always set, so these routines reject any non-numeric address but handle numeric ones very efficiently? I have already implemented a predecessor to this which I am experimentally running at home in python 2.2.2, in which I made it so that prefixing the address with an exclamation point provided this functionality. Given the somewhat more legible approach the team has already chosen for special addresses, I see no reason why using a tag of that style (or some such) as a prefix isn't reasonable. Do any members of the development team have commentary on this? Would such a change be likely to be accepted into the system? Any reasons why it might break something? The actual patch would be only about 10 lines of code (plus some documentation), a few in each of the routines mentioned above.

> I don't see why we would have to add the flag to the address when the form of the address itself is already a perfect clue that the address is purely numeric. I'd be happy to see a patch that intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those without calling getaddrinfo().

Do we want this? The parser would then also have to be modified to handle numeric INET6 addresses, when they become popular. I actually did implement one of my trial versions this way, and it worked fine.
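The two alternatives in this exchange (an explicit prefix tag versus recognizing the dotted-quad form) can be compared in a small sketch. It is written in Python rather than in the C of the socket module, and everything named here is an assumption for illustration: the tag string "<numeric>", the helper names, and the use of socket.inet_pton() as the parser that bypasses getaddrinfo().

```python
import re
import socket

# Hypothetical tag; the socket module does not define such a prefix.
NUMERIC_PREFIX = "<numeric>"

# The address form suggested in the reply, anchored at the end of the string.
DOTTED_QUAD = re.compile(r"\d+\.\d+\.\d+\.\d+\Z")


def parse_tagged(host):
    """Prefix approach: a tagged host must be numeric.

    Returns the packed 4-byte address, or None if the host is untagged
    (the caller would then fall back to normal resolution). A tagged but
    non-numeric host raises OSError instead of being looked up.
    """
    if not host.startswith(NUMERIC_PREFIX):
        return None
    return socket.inet_pton(socket.AF_INET, host[len(NUMERIC_PREFIX):])


def parse_by_pattern(host):
    """Pattern approach: parse dotted quads directly, no tag needed.

    Returns the packed address, or None if the host does not look numeric.
    Out-of-range octets (e.g. 999.1.1.1) still raise OSError.
    """
    if DOTTED_QUAD.match(host) is None:
        return None
    return socket.inet_pton(socket.AF_INET, host)
```

One difference the sketch makes visible: with the tag, a non-numeric string is an error, whereas with pattern matching it silently falls through to whatever resolution the caller does next.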
There is one minor issue, too. In urllib, there are some calls to getaddrinfo to get (for maybe no good reason) CNAMEs of addresses. I would like some way to tag an address with a very strong comment that it is what it is, and I would like all further processing disabled.
Also, a 'trial' parsing of an address to match an a.b.c.d pattern each time is a lot more processor intensive than checking for a fixed prefix at the beginning.
I am perfectly happy to implement it either way.
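The relative cost of the two checks can be measured rather than guessed. The following is a rough sketch using timeit; the tag "<numeric>" and the function name time_checks are hypothetical, and timings taken in Python say little about the C code in the socket module, so the numbers are indicative at best.

```python
import re
import timeit

PREFIX = "<numeric>"  # hypothetical tag, not an actual socket-module feature
DOTTED_QUAD = re.compile(r"\d+\.\d+\.\d+\.\d+\Z")


def time_checks(number=100_000):
    """Return (prefix_seconds, pattern_seconds) for `number` repetitions."""
    tagged = PREFIX + "192.168.1.20"
    plain = "192.168.1.20"
    # Cost of testing for the fixed prefix at the start of the string.
    t_prefix = timeit.timeit(lambda: tagged.startswith(PREFIX), number=number)
    # Cost of a trial match against the a.b.c.d pattern.
    t_pattern = timeit.timeit(lambda: DOTTED_QUAD.match(plain), number=number)
    return t_prefix, t_pattern
```

Either cost would also need to be weighed against the cost of the connect() call itself before drawing conclusions.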
> --Guido van Rossum (home page: http://www.python.org/~guido/)