[Python-3000] socket GC worries

Guido van Rossum guido at python.org
Mon Oct 29 19:45:55 CET 2007


2007/10/28, Bill Janssen <janssen at parc.com>:

> > Bill Janssen wrote:
> > > that whole mess of code is a good argument for not exposing the
> > > fileno in Python
> >
> > Seems to me that a socket should already be a file,
> > so it shouldn't need a makefile() method and you
> > shouldn't have to mess around with filenos.

That model fits TCP/IP streams just fine, but doesn't work so well for UDP and other odd socket types. The assumption that "s.write(a); s.write(b) is equivalent to s.write(a+b)", which is fundamental for any "stream" abstraction, just doesn't work for UDP. Ditto for reading: AFAIK recv() discards whatever part of a UDP packet doesn't fit in the requested buffer size.
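To make the stream/datagram distinction concrete, here is a small sketch against today's Python 3 socket module (the function names are invented for this illustration). On a stream socket, two writes come back as one undivided byte sequence. On UDP, two sendto() calls stay two datagrams, and a recv() shorter than a datagram returns only its head. The truncation result shown is what Linux and the BSDs do; other platforms may report an error instead.

```python
import socket


def stream_concatenates() -> bytes:
    """Two writes on a stream socket are read back as one byte stream."""
    a, b = socket.socketpair()  # SOCK_STREAM by default
    with a, b:
        a.sendall(b"ab")
        a.sendall(b"cd")
        a.shutdown(socket.SHUT_WR)  # send EOF so the reader knows when to stop
        chunks = []
        while True:
            data = b.recv(1024)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)


def udp_pair():
    """Return (receiver, sender, receiver_address) on the loopback interface."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))  # let the OS pick a free port
    rx.settimeout(2.0)
    return rx, tx, rx.getsockname()


def datagrams_stay_separate() -> list:
    """Two sendto() calls produce two datagrams, not one concatenated message."""
    rx, tx, addr = udp_pair()
    with rx, tx:
        tx.sendto(b"ab", addr)
        tx.sendto(b"cd", addr)
        return [rx.recv(1024), rx.recv(1024)]


def short_recv_drops_tail() -> list:
    """A recv() smaller than the datagram returns its head; the tail is gone."""
    rx, tx, addr = udp_pair()
    with rx, tx:
        tx.sendto(b"hello", addr)
        tx.sendto(b"world", addr)
        head = rx.recv(2)          # only b"he"; "llo" is discarded (Linux/BSD)
        following = rx.recv(1024)  # the next datagram, not the rest of "hello"
        return [head, following]
```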

> I like that model, too. I also wish the classes in io.py were sort of inverted; that is, I'd like to have an IOStream base class with read() and write() methods (and maybe close()), which things like Socket could inherit from. FileIO would inherit from IOStream and from Seekable, and add a fileno() method and "name" property. And so forth. But apparently that's out; maybe in Python 4000.
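A rough sketch of the "inverted" hierarchy described in that paragraph, under stated assumptions: IOStream and Seekable are the names used there, while SocketStream and FileStream are hypothetical stand-ins (not the real io.FileIO) built directly on os and socket primitives. The socket class gets read()/write()/close() and nothing else; the disk-file class additionally mixes in Seekable and exposes fileno() and a name property.

```python
import os
import socket
from abc import ABC, abstractmethod


class IOStream(ABC):
    """Hypothetical base class: only read(), write() and close()."""

    @abstractmethod
    def read(self, n: int) -> bytes: ...

    @abstractmethod
    def write(self, data: bytes) -> int: ...

    def close(self) -> None:
        """Default no-op; subclasses release their resource here."""


class Seekable(ABC):
    """Hypothetical mix-in for streams with random access."""

    @abstractmethod
    def seek(self, pos: int, whence: int = os.SEEK_SET) -> int: ...


class SocketStream(IOStream):
    """A connected socket seen purely as a stream: no seek(), no fileno()."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock

    def read(self, n: int) -> bytes:
        return self._sock.recv(n)

    def write(self, data: bytes) -> int:
        self._sock.sendall(data)
        return len(data)

    def close(self) -> None:
        self._sock.close()


class FileStream(IOStream, Seekable):
    """A disk file: IOStream plus Seekable, adding fileno() and a name property."""

    def __init__(self, path: str) -> None:
        self._name = path
        flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
        self._fd = os.open(path, flags, 0o600)

    @property
    def name(self) -> str:
        return self._name

    def fileno(self) -> int:
        return self._fd

    def read(self, n: int) -> bytes:
        return os.read(self._fd, n)

    def write(self, data: bytes) -> int:
        return os.write(self._fd, data)

    def seek(self, pos: int, whence: int = os.SEEK_SET) -> int:
        return os.lseek(self._fd, pos, whence)

    def close(self) -> None:
        os.close(self._fd)
```

The point of this layout is that code needing only read() and write() can accept any IOStream, while seek() and fileno() appear only on classes where they make sense.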

Actually, I'm still up for tweaks to the I/O model if it solves a real problem, as long as most of the high-level APIs stay the same (there simply is too much code that expects those to behave a certain way).

I don't quite understand what you mean by inverted though.

> Right now the socket is very much like an OS socket; with "send" and "recv" being the star players, not "read" and "write". socket.makefile wraps a buffered file-like interface around it.
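In the socket module as it eventually shipped in Python 3, that layering looks roughly like this (a minimal sketch; the helper name is made up). The socket itself only moves raw byte chunks with sendall() and recv(), and makefile() puts a buffered file object with readline()/readlines() on top of it.

```python
import socket


def read_lines_through_makefile() -> list:
    """Send raw bytes on one end, read them back as lines via makefile()."""
    a, b = socket.socketpair()
    with a, b:
        # The socket layer only deals in byte chunks: no notion of "lines".
        a.sendall(b"first line\nsecond line\n")
        a.shutdown(socket.SHUT_WR)  # EOF for the reader
        # makefile() wraps a buffered, file-like reader around recv().
        with b.makefile("rb") as f:
            return f.readlines()
```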

I was going to say "we can just replace SocketIO with a non-seekable _fileio.FileIO instance" until I realized that on Windows, socket fds and filesystem fds live in different spaces and are managed using different calls. That may also explain why the inversion you're looking for doesn't quite work (IIUC what you meant).
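On POSIX systems the substitution does work in principle, because a socket descriptor is an ordinary file descriptor there. The sketch below (hypothetical helper, POSIX only, using the modern io.FileIO rather than the 2007-era _fileio module) wraps a socket's fd in a FileIO with closefd=False, reads from it, and checks that it reports itself as non-seekable. On Windows the same call cannot work, since socket handles are not C-runtime file descriptors.

```python
import io
import os
import socket


def read_socket_via_fileio(payload: bytes):
    """POSIX-only: read from a socket through io.FileIO instead of recv()."""
    if os.name != "posix":
        # On Windows a socket handle is not a C-runtime file descriptor,
        # so FileIO/os.read() cannot operate on it.
        raise OSError("socket fds are not file fds on this platform")
    a, b = socket.socketpair()
    with a, b:
        a.sendall(payload)
        # closefd=False: the socket object keeps ownership of the descriptor,
        # so closing the FileIO wrapper does not close the socket.
        with io.FileIO(b.fileno(), "r", closefd=False) as raw:
            return raw.read(len(payload)), raw.seekable()
```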

The real issue seems to be file descriptor GC. Maybe we haven't written down the rules clearly enough for when the fd is supposed to be GC'ed, when there are both a socket and a SocketIO (or more) referencing it; and whether a close() call means something beyond dropping the last reference to the object. Or maybe we haven't implemented the rules right? ISTM that the SocketCloser class is intended to solve these issues. Back to your initial mail (which is more relevant than Greg Ewing's snipe!):

> I think that the SocketCloser (new in Py3K) was developed to address another issue, which is that there's a lot of library code which assumes that the Python socket instance is just window dressing over an underlying system file descriptor, and isn't important. In fact, that whole mess of code is a good argument for not exposing the fileno in Python (perhaps only for special cases, like "select"). Take httplib and urllib, for instance. HTTPConnection creates a "file" from the socket, by calling socket.makefile(), then in some cases closes the socket (thereby reasonably rendering the socket dead), then returns the "file" to the caller as part of the response. urllib then takes the response, pulls the "file" out of it, and discards the rest, returning the "file" as part of an instance of addinfourl. Somewhere along the way some code should call "close()" on that HTTPConnection socket, but not till the caller is finished using the bytes of the response (and those bytes are kept queued up in the real OS socket). Ideally, GC of the response instance should call close() on the socket instance, which means that the instance should be passed along as part of the response, IMO.
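For comparison, the Python 3 socket module that eventually shipped handles this httplib pattern by reference-counting makefile() objects: per the socket.close() documentation, the underlying file descriptor is released only once every file object obtained from makefile() has been closed too. A minimal sketch (helper name invented) that closes the socket first and keeps reading from the file object afterwards:

```python
import socket


def read_after_socket_close():
    """Mimic the httplib pattern: makefile(), close the socket, keep reading."""
    server, client = socket.socketpair()
    server.sendall(b"HTTP/1.0 200 OK\r\n\r\nbody")
    server.close()                # peer finished; client sees EOF after the data
    f = client.makefile("rb")     # like HTTPConnection calling sock.makefile()
    client.close()                # marks the socket closed, but f still holds the fd
    data = f.read()               # still works: reads everything up to EOF
    f.close()                     # last user gone: the fd is really closed now
    return data, client.fileno()  # fileno() reports -1 once the fd is released
```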

Hm, I think you're right. The SocketCloser class wasn't written with the SSL use case in mind. :-( I wonder if one key to solving the problem isn't to make the socket wrap a low-level _socket instance instead of being one (i.e. containment instead of subclassing). Then the SSL code could be passed the low-level _socket instance and the high(er)-level socket class could wrap either a _socket or an SSL instance. The SocketCloser would then be responsible for closing whatever the socket instance wraps, i.e. either the _socket or the SSL instance. Then we could have any number of SocketIO instances plus at most one socket instance, and the wrapped thing would be closed when the last of the higher-level things was either GC'ed or explicitly closed. If you wanted to reuse the _socket after closing the SSL instance, you'd have to wrap it in a fresh socket instance.
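The containment proposal can be sketched as a shared closer that counts its holders. Everything here is hypothetical and only illustrative (SharedCloser, HighLevelSocket and Stream are not real library classes). The high-level socket contains, rather than subclasses, a low-level object: a plain socket in the demo, though it could equally be an SSL-wrapped one. Any number of Stream objects share that object, and it is closed when the last holder is either closed explicitly or garbage-collected. The demo relies on CPython's reference counting to run __del__ promptly after del.

```python
import socket


class SharedCloser:
    """Hypothetical helper: closes `wrapped` once every holder has released it."""

    def __init__(self, wrapped) -> None:
        self.wrapped = wrapped
        self._holders = 0

    def acquire(self) -> None:
        self._holders += 1

    def release(self) -> None:
        self._holders -= 1
        if self._holders == 0:
            self.wrapped.close()


class _Holder:
    """Anything that keeps the wrapped object alive until closed or GC'ed."""

    def __init__(self, closer: SharedCloser) -> None:
        self._closer = closer
        self._closed = False
        closer.acquire()

    def close(self) -> None:
        if not self._closed:
            self._closed = True
            self._closer.release()

    def __del__(self) -> None:
        # Dropping the last reference counts the same as an explicit close().
        self.close()


class HighLevelSocket(_Holder):
    """Hypothetical socket class that contains a low-level object."""

    def __init__(self, wrapped) -> None:
        super().__init__(SharedCloser(wrapped))

    def makefile(self) -> "Stream":
        return Stream(self._closer)


class Stream(_Holder):
    """Hypothetical SocketIO-like reader sharing the same wrapped object."""

    def read(self, n: int) -> bytes:
        return self._closer.wrapped.recv(n)


def demo():
    peer, low_level = socket.socketpair()
    with peer:
        sock = HighLevelSocket(low_level)
        stream = sock.makefile()
        sock.close()                      # socket closed, stream still alive
        peer.sendall(b"ok")
        data = stream.read(2)
        closed_before = low_level.fileno() == -1
        del stream                        # GC of the last holder closes it
        closed_after = low_level.fileno() == -1
    return data, closed_before, closed_after
```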

Does that make sense? (Please do note the difference throughout between _socket and socket, the former being defined in socketmodule.c and the latter in socket.py.)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


