[Python-3000] Draft PEP for New IO system

Adam Olsen rhamph at gmail.com
Sun Mar 4 23:42:17 CET 2007


On 3/4/07, Daniel Stutzbach <daniel.stutzbach at gmail.com> wrote:

On 3/1/07, Adam Olsen <rhamph at gmail.com> wrote:
> Why do non-blocking operations need to use the same methods when
> they're clearly not the same semantics? Although long,
> .nonblockflush() would be explicit and allow .flush() to still block.

.nonblockflush() would be fine with me, but I don't think .flush() should block on a non-blocking object. To accomplish that, it would either have to be smart enough to switch the object into blocking mode, or internally use select().

Either would work, if we decide to support it.

How about having .flush() write as much as it can, and throw an exception if not all of the bytes can be written to the device? (again, this would only come up when a user has set the underlying file descriptor to non-blocking mode)

I see little point in having an interface that randomly fails depending on the stars, phase of the moon, etc. If the application is using the interface wrong then we should fail every time.

Errors should never pass silently.
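
To make the split concrete, here is a rough sketch of a toy write buffer over a socket already in non-blocking mode. The class and the nonblockflush name are placeholders for illustration, not a proposed API:

    import errno

    class BufferedWriter:
        """Toy write buffer over a socket already in non-blocking mode."""

        def __init__(self, sock):
            self.sock = sock
            self.buffer = b""

        def write(self, data):
            self.buffer += data

        def nonblockflush(self):
            """Send as much as the kernel will take; return the bytes left over."""
            while self.buffer:
                try:
                    sent = self.sock.send(self.buffer)
                except BlockingIOError:
                    break  # kernel buffer is full; caller retries when writable
                self.buffer = self.buffer[sent:]
            return len(self.buffer)

        def flush(self):
            """Keep blocking semantics: fail loudly instead of half-flushing."""
            if self.nonblockflush():
                raise BlockingIOError(errno.EAGAIN, "flush would block")

The point of the sketch is that .flush() never pretends to succeed: either everything is written or the caller gets an exception, while .nonblockflush() is the explicitly partial operation.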

> I'm especially wary of infinite buffers. They allow a malicious peer
> to consume all your memory, DoSing the process or even the whole box
> if Linux's OOM killer doesn't kick in fast enough.

For a write-buffer, you start eating up memory only if an application is buffer-ignorant and tries to dump a massive amount of data to the socket all at once. A naive HTTP server implementation might do this by calling something like s.write(open(filename).read()). This isn't a DoS by a peer, though; it's a local implementation problem.

Any application expecting a blocking file and getting a non-blocking one is buffer-ignorant. How is this odd way of failing useful?

For a read-buffer, you start eating up all of your memory only if you call .read() with no arguments and the peer dumps a few gig on you. If you call read(n) to get as much data as you need, then the buffer class will only grab reasonably sized chunks from the network. Network applications don't normally call .read() without arguments, since they need to communicate both ways. If the object is an ordinary file, then DoS isn't so much of an issue and reading the whole file seems very reasonable.

I suppose for a Text object wrapped around a socket, .readline() could be dangerous if a malicious peer sends a few gig all on one line. That's a problem for the encoding layer to sort out, though, not the buffering layer.

A networked application should never read an unlimited amount from a socket; it should always use fixed-size blocks or fixed-size lines.
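
In code, those two safe patterns look roughly like this (the 64 KiB and 8 KiB limits are arbitrary choices for illustration, not recommendations):

    CHUNK = 64 * 1024      # arbitrary upper bound per read
    MAX_LINE = 8 * 1024    # arbitrary cap on a single line

    def read_block(f):
        """Read at most CHUNK bytes; an empty result means EOF (or no data yet)."""
        return f.read(CHUNK)

    def read_line(f):
        """Read one line, refusing to buffer more than MAX_LINE bytes of it."""
        line = f.readline(MAX_LINE)
        if len(line) >= MAX_LINE and not line.endswith(b"\n"):
            raise ValueError("peer sent an overlong line; giving up")
        return line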

The rare application that requires processing all of the contents at once should first write them to disk (which has a much larger capacity), then read back in only a limited amount at a time.
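
Spelled out, that might be as simple as the following, where tempfile.TemporaryFile is just one way to do the spill-to-disk step:

    import tempfile

    CHUNK = 64 * 1024  # copy in bounded pieces so memory use stays flat

    def spool_to_disk(stream):
        """Copy an unbounded stream to a temp file chunk by chunk, then rewind it."""
        spool = tempfile.TemporaryFile()
        while True:
            data = stream.read(CHUNK)
            if not data:
                break
            spool.write(data)
        spool.seek(0)
        return spool  # caller now reads it back a limited amount at a time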

I can see three different behaviours when reading from a file or socket:

* blocking until all of the requested data (or EOF) has been read
* blocking until at least some data is available, returning whatever that is
* non-blocking, returning whatever is immediately available, possibly nothing

Both blocking modes are very similar, differing only in their default (read all vs read whatever) and their handling of the end of the file. I'm not convinced combining them (as Python has traditionally done) is optimal, but it's not a big deal.

The non-blocking, chunked mode is very different, however. It can return a short read at any point. Applications expecting blocking mode may get empty strings (or exceptions indicating such, which is perhaps the only saving grace).

Using .nonblockingread() is long, but I expect it to be wrapped by the event loop anyway.
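
As a rough illustration of that wrapping, assuming a hypothetical nonblockingread() method and with select() standing in for whatever readiness mechanism the loop actually uses:

    import select

    def read_when_ready(obj, fd, nbytes, timeout=None):
        """Wait until fd is readable, then do a single short, non-blocking read."""
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            return None  # timed out; the loop decides what to do next
        return obj.nonblockingread(nbytes)  # may still return less than nbytes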

-- Adam Olsen, aka Rhamphoryncus


