[Python-3000] revamping the io stack, part 2

Brett Cannon brett at python.org
Sat Apr 29 22:50:39 CEST 2006


On 4/29/06, tomer filiba <tomerfiliba at gmail.com> wrote:

> I first thought of focusing on the socket module, because it's the part that bothers me most, but since people have expressed their thoughts on completely revamping the I/O stack, perhaps we should be open to adopting new ideas, mainly from the Java/.NET world (keeping the momentum from the previous post).

> There is an inevitable performance issue here, since this basically splits what used to be a single "file" or "socket" object into many layers, each adding overhead, so many parts should be implemented in C. If we look at Java/.NET for guidance, they have come up with two concepts:

I am a little wary of taking too much from Java/.NET since I have always found their I/O systems way too heavy for the common case. I can't remember what it takes to get a reader in Java in order to read by lines. In Python, I love that I don't have to think about that; just pass a file object to 'for' and I am done.

While I am all for allowing more powerful I/O through stacking a stream within various readers (which feels rather functional to me, but that must just be because of my latest reading material), I want the 90% case to require hardly any memorizing of which readers I need in which order.

> * stream - an arbitrary, usually sequential, byte data source
> * readers and writers - the way data is encoded into/decoded from the stream. We'll use the term "codec" for these readers and writers in general.
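The stream/codec split can be illustrated with a minimal sketch. `Stream` and `TextCodec` are hypothetical names chosen for illustration, not classes from the proposal; the stream only moves bytes (the "where"), and the codec only converts between text and bytes (the "how").

```python
import io
from typing import BinaryIO


class Stream:
    """Hypothetical byte stream: the "where". It only moves raw bytes."""

    def __init__(self, raw: BinaryIO):
        self._raw = raw

    def read(self, n: int = -1) -> bytes:
        return self._raw.read(n)

    def write(self, data: bytes) -> int:
        return self._raw.write(data)


class TextCodec:
    """Hypothetical codec: the "how". Encodes/decodes text over a Stream."""

    def __init__(self, stream: Stream, encoding: str = "utf-8"):
        self.stream = stream
        self.encoding = encoding

    def write(self, text: str) -> int:
        # Encode on the way down; the stream never sees str objects.
        return self.stream.write(text.encode(self.encoding))

    def read(self) -> str:
        # Decode everything that remains. A real codec would also have to
        # handle multi-byte sequences split across partial reads.
        return self.stream.read().decode(self.encoding)


buf = io.BytesIO()
TextCodec(Stream(buf), "utf-8").write("naïve café")
print(buf.getvalue())  # the stream holds only the encoded bytes

buf.seek(0)
decoded = TextCodec(Stream(buf), "utf-8").read()
print(decoded)
```

Swapping the encoding changes only the codec object; the stream underneath is untouched. For reference, the `io` module that eventually shipped with Python 3 uses a similar split, with `TextIOWrapper` doing the encoding over a binary stream.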

> So "stream" is the "where" and "codec" is the "how", and the concept of codecs is not limited to ASCII vs. UTF-8; it can grow into fully-fledged protocols.

[SNIP - a whole lot of detailed ideas]

> -----
>
> Buffering is always explicit and implemented at the interpreter level rather than by libc, so it is consistent across all platforms and streams. All streams are, by nature, non-buffered (they write the data as soon as possible). Buffering wraps an underlying stream, making it explicit:
>
>     class BufferedStream(Stream):
>         def __init__(self, stream, bufsize): ...
>         def flush(self): ...
>
> (BufferedStream appears in .NET)
>
>     class LineBufferedStream(BufferedStream):
>         def __init__(self, stream, flushon=b"\n"): ...
>
>     f = LineBufferedStream(FileStream("c:\\blah"))
>
> where flushon specifies the byte (or sequence of bytes?) to flush upon writing. By default it would be on newline.
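A minimal sketch of how the proposed buffering layers might behave on the write side. It assumes a hypothetical `Stream` base class with a single `write(bytes)` method and a hypothetical `RecordingStream` sink used only to observe what reaches the unbuffered layer. The `bufsize` default of 4096, and giving `LineBufferedStream` a `bufsize` parameter at all, are assumptions; the proposal only specifies `flushon`. Reading and seeking are left out.

```python
class Stream:
    """Hypothetical unbuffered base stream: writes go straight through."""

    def write(self, data: bytes) -> None:
        raise NotImplementedError


class RecordingStream(Stream):
    """Hypothetical sink that records each low-level write it receives."""

    def __init__(self):
        self.writes = []

    def write(self, data: bytes) -> None:
        self.writes.append(data)


class BufferedStream(Stream):
    """Holds writes and passes them down once bufsize bytes are pending."""

    def __init__(self, stream: Stream, bufsize: int = 4096):
        self.stream = stream
        self.bufsize = bufsize
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        self._buf += data
        if len(self._buf) >= self.bufsize:
            self.flush()

    def flush(self) -> None:
        # Hand everything pending to the underlying stream in one write.
        if self._buf:
            self.stream.write(bytes(self._buf))
            self._buf.clear()


class LineBufferedStream(BufferedStream):
    """Additionally flushes as soon as flushon appears in pending data."""

    def __init__(self, stream: Stream, flushon: bytes = b"\n", bufsize: int = 4096):
        super().__init__(stream, bufsize)
        self.flushon = flushon

    def write(self, data: bytes) -> None:
        super().write(data)
        # Search the whole pending buffer, not just `data`, so a multi-byte
        # flushon split across two writes is still detected.
        if self.flushon in self._buf:
            self.flush()


sink = RecordingStream()
f = LineBufferedStream(sink)
f.write(b"hello ")
f.write(b"world\nmore")
f.write(b" text")
f.flush()
print(sink.writes)
```

The first write stays buffered; the second contains a newline, so everything pending goes down in a single write, and the explicit `flush()` pushes out the remainder.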

See, this is what I am worried about. I really like not having to figure out what I need to do to read by lines from a file. If the FileStream object had an __iter__ that did the proper wrapping with LineBufferedStream, then great, I'm happy. But if we do not add some reasonable convenience functions or iterators, this is going to feel heavy-handed pretty quickly.
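The kind of convenience iterator described here can be sketched as follows. `SimpleFileStream` and its `chunksize` parameter are hypothetical: it stands in for the proposed `FileStream` but wraps an already-open binary file object instead of taking a path. Rather than wrapping with `LineBufferedStream` (which, as proposed, controls when *writes* are flushed), this `__iter__` does the line splitting on the read side directly.

```python
import io
from typing import BinaryIO, Iterator


class SimpleFileStream:
    """Hypothetical stand-in for the proposed FileStream.

    Wraps an already-open binary file object and reads it in chunks.
    """

    def __init__(self, raw: BinaryIO, chunksize: int = 8192):
        self.raw = raw
        self.chunksize = chunksize

    def read(self, n: int) -> bytes:
        return self.raw.read(n)

    def __iter__(self) -> Iterator[bytes]:
        # Iterating the stream yields lines, so the common case stays
        # "for line in stream" with no explicit wrapper to remember.
        pending = b""
        while True:
            chunk = self.read(self.chunksize)
            if not chunk:
                break
            pending += chunk
            # Everything before the last b"\n" is complete lines;
            # the tail waits for the next chunk.
            *lines, pending = pending.split(b"\n")
            for line in lines:
                yield line + b"\n"
        if pending:
            yield pending  # final line with no trailing newline


# A tiny chunksize forces lines to straddle chunk boundaries.
lines = list(SimpleFileStream(io.BytesIO(b"one\ntwo\nthree"), chunksize=3))
print(lines)
```

With a real file the usage would be the same, e.g. `for line in SimpleFileStream(open(path, "rb")): ...`, which keeps the layered design out of the way in the common case.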

-Brett


