[Python-3000] suggestion for a new socket io stack

Guido van Rossum guido at python.org
Sat Apr 29 00:36:49 CEST 2006


Good ideas. Sockets follow the C API to the letter; this was a good idea when I first created the socket extension module, but nowadays it doesn't feel right. Also, the socket.py module contains a lot of code (a whole buffering file implementation!) that ought to be subsumed into the new I/O stack.

The resolver APIs (gethostname() etc.) also need to be rethought.

And then there's the select module, which has a very close relationship with sockets (e.g. on Windows, only sockets are acceptable to select) -- and the new BSD kqueue API which ought to be provided when available. (Preferably there should be a way to wait for multiple sockets without having to decide whether to use select, poll or kqueue.)
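For illustration only: later Python 3 releases ship a standard `selectors` module that offers this kind of "wait on several sockets without choosing the mechanism" interface. The sketch below is not the design discussed in this thread; it uses a local `socket.socketpair()` as a stand-in for real peers, and the `"peer b"` label is an arbitrary value chosen for the example.

```python
import selectors
import socket

# DefaultSelector picks the most efficient mechanism the platform offers
# (for example epoll, kqueue, poll, or select).
sel = selectors.DefaultSelector()

# A connected pair of local sockets stands in for real network peers.
a, b = socket.socketpair()
sel.register(b, selectors.EVENT_READ, data="peer b")

a.sendall(b"ping")

# Wait (up to one second) for any registered socket to become readable.
ready = sel.select(timeout=1.0)
received = [(key.data, key.fileobj.recv(16)) for key, _events in ready]

sel.unregister(b)
sel.close()
a.close()
b.close()
```

The calling code never names select, poll or kqueue; it only registers sockets and asks which ones are ready.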

And then there's the exception model. It's a pain that we have separate OSError, IOError and socket.error classes, and the latter doesn't even inherit from EnvironmentError. I propose that these all be unified (and the new class called IOError).
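As a point of reference, and not something decided in this message: from Python 3.3 onward the classes were in fact merged, although the unified class ended up being called OSError, with IOError, EnvironmentError and socket.error kept as aliases. A small sketch of what that means for calling code (the file path is a made-up name assumed not to exist):

```python
import socket

# In Python 3.3+ all of these names refer to the same class.
aliases_unified = (
    IOError is OSError
    and EnvironmentError is OSError
    and socket.error is OSError
)

caught = []

# A failing file operation...
try:
    open("no-such-directory/no-such-file")
except OSError as exc:
    caught.append(type(exc))

# ...and a failing socket operation are handled by the same except clause.
s = socket.socket()
s.close()
try:
    s.send(b"x")  # operating on a closed socket
except OSError as exc:
    caught.append(type(exc))
```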

Perhaps a few people could get together and design a truly minimal socket API along the lines suggested here? The goal ought to be to support the same set of use cases as supported by the current socket and select modules but using more appropriate APIs.

Here's how it could interface to the new I/O stack: at the very bottom of the new I/O stack will be an object supporting unbuffered read and write operations. o.read(n) returns a bytes object of length at most n. o.write(b) takes a bytes object and writes some bytes, returning the number of bytes actually written. (There are also seek/tell/truncate ops but these aren't relevant here.) On top of this we can build buffering, encoding, and line ending translation (covering all the functionality we have in 2.x) but we could also build alternative functionality on top of buffered byte streams, for example record-based I/O.
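A rough sketch of that layering, using the `io` module as it exists in present-day Python 3 (which may differ in detail from what is being proposed here). A pipe stands in for a file, `io.FileIO` plays the unbuffered bottom layer, and buffering plus decoding and line-ending translation are stacked on top:

```python
import io
import os

read_fd, write_fd = os.pipe()

# Bottom layer: unbuffered objects whose read/write map onto system calls.
raw_writer = io.FileIO(write_fd, "w")
raw_reader = io.FileIO(read_fd, "r")

# write() takes bytes and reports how many bytes were actually written.
written = raw_writer.write(b"first line\r\nsecond line\r\n")
raw_writer.close()

# read(n) returns a bytes object of length at most n.
chunk = raw_reader.read(5)

# Upper layers: buffering, then decoding and line-ending translation.
buffered = io.BufferedReader(raw_reader)
text = io.TextIOWrapper(buffered, encoding="utf-8", newline=None)
lines = text.readlines()
text.close()  # also closes the buffered and raw layers underneath
```

Here `chunk` holds the first five raw bytes, and `lines` holds the remainder decoded to text with `\r\n` translated to `\n`.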

The bottom of the stack maps directly onto Unix read/write system calls but also onto recv/send socket system calls (and on Windows, the file descriptor spaces are different so a file-descriptor-based common case can't work there). I wouldn't bother mapping datagram sockets to the I/O stack; datagrams ought to be handled on a packet basis.
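To make the recv/send mapping concrete, here is a minimal sketch under stated assumptions: `SocketRaw` is a hypothetical name invented for this example, it builds on today's `io.RawIOBase`, and it only handles a blocking, connected stream socket. The point is that a generic buffering layer can sit on top of it unchanged.

```python
import io
import socket


class SocketRaw(io.RawIOBase):
    """Hypothetical unbuffered stream over a connected stream socket."""

    def __init__(self, sock):
        super().__init__()
        self._sock = sock

    def readable(self):
        return True

    def writable(self):
        return True

    def readinto(self, buf):
        # recv_into fills buf with at most len(buf) bytes; 0 means EOF.
        return self._sock.recv_into(buf)

    def write(self, data):
        # send may accept only part of data; return the count actually sent.
        return self._sock.send(data)


# A local socket pair stands in for a real connection.
left, right = socket.socketpair()
writer = SocketRaw(left)
reader = io.BufferedReader(SocketRaw(right))  # generic buffering on top

sent = writer.write(b"hello\nworld\n")
left.shutdown(socket.SHUT_WR)  # signal EOF to the reading side

first = reader.readline()
rest = reader.read()

left.close()
right.close()
```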

--Guido

On 4/28/06, tomer filiba <tomerfiliba at gmail.com> wrote:

i've seen discussion about improving the IO stack of python, to rely less on the low-level lib-c implementation. so, i wanted to share my ideas in that niche.

i feel today's sockets are way outdated and overloaded. python's sockets are basically a wrapper for the low-level BSD sockets... but IMHO it would be much nicer to alleviate this dependency: expose a more high-level interface to socket programming. the BSD-socket methodology does not sit well with pythonic paradigms.

let's start with {set/get}sockopt... that's one of the ugliest things in python, i believe most would agree. it's basically C programming in python. so, indeed, it's a way to overcome differences between platforms and protocols, but i believe it's not the way python should handle it.

my suggestion is nothing "revolutionary". it's basically taking the existing socket module and extending it for most common use cases. there are two types of sockets, streaming and datagram. the underlying protocols don't matter. and these two types of sockets have different semantics to them: send/recv vs. sendto/recvfrom.

so why not introduce StreamSocket and DgramSocket types? and of course RawSocket should be introduced to complement them.

you can argue that recvfrom and sendto can be used on streaming sockets as well, but did anyone ever use it? i never saw such code, and i can't think why you would want to use it.

next, all the socket options would become properties or methods (i prefer properties). each protocol would subclass {Stream/Dgram/Raw}Socket and add its protocol-specific options.

here's an example for a hierarchy:

    Socket
        RawSocket
        DgramSocket
            UDPSocket
        StreamSocket
            TCPSocket
                SSLSocket

the above tree is only partial of course. but it needn't be complete, either. less used protocols, like X25 or ICMP, could be constructed directly with the Socket class, in the old fashion of passing parameters. after all, the suggested class hierarchy only wraps the existing socket constructor and adds a more pythonic API to its options.

here's an example:

    s = TCPSocket(AF_INET6)
    s.reuseaddress = True    # this option is inherited from Socket
    s.nodelay = True         # this is a TCP-level option
    s.bind(("", 12345))
    s.listen(1)
    s2 = s.accept()
    s2.send("hello")

or

    s = UDPSocket()
    s.allowbroadcast = True
    s.sendto("hello everybody", ("255.255.255.255", 12345))

perhaps we should consider adding an "options" namespace, in order to keep the root level of the instance simpler. for example:

    s.options.reuseaddress = True

it clarifies that reuseaddress is an option. is it necessary? donno.

and since we can override bind(), perhaps we should override it to provide a more specific interface, i.e.

    def bind(self, addr, port):
        super(self, ...).bind((addr, port))

because we know it's a tcp socket, so we don't need to retain support for all addressing forms: it's an IP address and a port.

---

i would also want to replace the current BSD semantics for client sockets, of first creating a socket and then connecting it, i.e.,

    s = socket()
    s.connect(("localhost", 80))

i would prefer

    s = ConnectedSocket(("localhost", 80))

because connecting the socket is part of initializing it, hence it should be part of the class' constructor, and not a separate phase of the socket's life.

perhaps the syntax should be

    s = TCPSocket.connect(("localhost", 80))
    # or
    s = TCPSocket.connect("localhost", 80)    # if we override connect()

where .connect would be a classmethod, which returns a new instance of the class, connected to the server. of course DgramSockets don't need such a mechanism.

i would like to suggest the same about connection-oriented server sockets, but the case with those is a little more complicated, and possibly asynchronous (select()ing before accept()ing), so i would retain the existing semantics.

---

another thing i find quite silly is the way sockets behave on shutdown and in non-blocking mode. when the connection breaks, i would expect recv() to raise EOFError, or some sort of socket.error, instead of returning "".

moreover, when i'm using a non-blocking recv(), and there's no data to return, i would expect "", not a socket.timeout exception.

to sum it up:

* no data = ""
* connection breaks = EOFError

the situation, however, is exactly the opposite, which is not intuitive or logical, and i remember having to write this code:

    def recv(s):
        try:
            data = s.recv(1000)
            if not data:    # socket closed
                raise EOFError
        except socket.timeout:
            data = ""    # timeout
        return data

to accumulate data from non-blocking sockets, in a friendly way.

so yeah, the libsocket version of recv returns 0 on EOF and -1 with some errno when there's no data, but the pythonic version shouldn't just copy this behavior -- it should translate it to pythonic standards.

you have to remember that libsocket and the rest were written in the 80's, and are very platform-dependent. plus, C doesn't allow multiple return values or exceptions, so they had to do it this way.

the question that should guide you is, "if you were to write pythonic sockets, how would they look?" rather than "how do i write a 1:1 wrapper for libsocket?"

---

by the way, a little cleanup:

* why does accept return a tuple? instead of

      newsock, sockname = sock.accept()

  why not do

      newsock = sock.accept()
      sockname = newsock.getsockname()

  i'm always having strange bugs because i forget accept gives me a tuple rather than just a socket... and you don't generally need the sockname, especially since you can get it later with getsockname.

* the host-to-network functions, are they needed? can't you just use struct.pack and unpack? why not throw them away?

what do you say?
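The options-as-properties and connect-in-one-step ideas from the message above could be sketched roughly as follows. Everything here is hypothetical: `TCPSocket`, its `reuseaddress` and `nodelay` properties, and the `connected_to` classmethod are illustrative names layered over the existing `socket.socket`, not an existing or agreed API. A differently named classmethod is used because overriding `connect()` itself would clash with the instance method.

```python
import socket


class TCPSocket(socket.socket):
    """Hypothetical wrapper exposing common options as properties."""

    def __init__(self, family=socket.AF_INET):
        super().__init__(family, socket.SOCK_STREAM)

    @property
    def reuseaddress(self):
        return bool(self.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR))

    @reuseaddress.setter
    def reuseaddress(self, value):
        self.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, int(value))

    @property
    def nodelay(self):
        return bool(self.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))

    @nodelay.setter
    def nodelay(self, value):
        self.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, int(value))

    @classmethod
    def connected_to(cls, host, port):
        """Create a socket and connect it in one step."""
        sock = cls()
        try:
            sock.connect((host, port))
        except OSError:
            sock.close()
            raise
        return sock


# Loopback demo: port 0 asks the OS for any free port.
server = TCPSocket()
server.reuseaddress = True
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

client = TCPSocket.connected_to("127.0.0.1", port)
client.nodelay = True
flags = (server.reuseaddress, client.nodelay)

conn, _addr = server.accept()  # inherited accept() still returns a tuple
client.sendall(b"hello")
data = conn.recv(5)

for s in (conn, client, server):
    s.close()
```

Note that the inherited `accept()` still returns a plain `socket.socket` in a tuple; changing that, as the message suggests, would need a further override.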

-tomer



-- --Guido van Rossum (home page: http://www.python.org/~guido/)


