[Python-3000] Google Sprint Ideas

Talin talin at acm.org
Mon Aug 21 03:34:28 CEST 2006

Guido van Rossum wrote:
> On 8/20/06, Talin <talin at acm.org> wrote:
>> Guido van Rossum wrote:
>>> On 8/20/06, Paul Moore <p.f.moore at gmail.com> wrote:
>>>
>>> Without endorsing every detail of his design, tomer filiba has written several blog (?) entries about this, the latest being http://sebulba.wikispaces.com/project+iostack+v2 . You can also look at sandbox/sio/sio.py in svn.
>>
>> One comment after reading this: If we're going to re-invent the Java/C# i/o library, could we at least use the same terminology? In particular, the term "Layer" has connotations which may be confusing in this context - I would prefer something like "Adapter" or "Filter".
>
> That's an example of what I meant when I said "without endorsing every detail". I don't know which terminology C++ uses beyond streams. I think Java uses Streams for the lower-level stuff and Reader/Writer for the higher-level stuff -- or is it the other way around?

Well, the situation with Java is kind of complex. There are two sets of stream classes, but rather than classifying them as "low-level" and "high-level", a better classification is "old" and "new". The old classes (InputStream/OutputStream) are byte-oriented, whereas the newer ones (Reader/Writer) are character-oriented. It is not the case, however, that the character-oriented interface sits on top of the byte-oriented interface - rather, both interfaces are implemented by a number of different back ends.

For purposes of Python, it probably makes more sense to look at the .Net System.IO.Stream. (As a general rule, the .Net classes are refactored versions of the Java classes, which is both good and bad. It's best to study both if one is looking for inspiration.)

Hmmm, apparently the .Net documentation does use the term 'layer' to describe one stream wrapping another - which I still find strange. To my mind, the term 'layer' can either describe a particular design stratum within an architecture - such as the 'device layer' of an operating system - or it can describe a portion of a document, such as a drawing layer in a CAD program. I don't normally think of a single instance of a class wrapping another instance as constituting a "layer" - I usually use the term "adapter" or "proxy" to describe that case.

(OK, so I'm pedantic about naming. Now you know why one of my side projects is writing an online programmer's thesaurus -- using Python/TurboGears of course!)

>> Also, I notice that this proposal removes what I consider to be a nice feature of Python, which is that you can take a plain file object and iterate over the lines of the file -- it would require a separate line buffering adapter to be created. I think I understand the reasoning behind this - in a world with multiple text encodings, the definition of "line" may not be so simple. However, I would assume that the "built-in" streams would support the most basic, least-common-denominator encodings for convenience.
>
> First time I noticed that. But perhaps it's the concept of "plain file object" that changed? My own hierarchy (which I arrived at without reading tomer's proposal) is something like this:
>
> (1) Basic level (implemented in C) -- open, close, read, write, seek, tell. Completely unbuffered, maps directly to system calls. Does binary I/O only.
>
> (2) Buffering. Implements the same API as (1) but adds buffering. This is what one normally uses for binary file I/O. It builds on (1), but can also be built on raw sockets instead. It adds an API to inquire about the amount of buffered data, a flush() method, and ways to change the buffer size.
>
> (3) Encoding and line endings. Implements a somewhat different API, for reading/writing text files; the API resembles Python 2's I/O library more. This is where readline() and next() giving the next line are implemented. It also does newline translation to/from the platform's native convention (CRLF or LF, or perhaps CR if anyone still cares about Mac OS <= 9) and Python's convention (always \n).
>
> I think I want to put these two features (encoding and line endings) in the same layer because they are both text related. Of course you can specify ASCII or Latin-1 to effectively disable the encoding part. Does this make more sense?

I understood that much -- this is pretty much the way everyone does things these days (our own custom stream library at work looks pretty much like this too.)
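To make the three levels concrete, here is a minimal sketch. It is not the proposal under discussion: it assumes the io module of present-day Python 3 as a stand-in, since its FileIO, BufferedReader and TextIOWrapper classes happen to line up with levels (1), (2) and (3). The helper names open_layers and read_lines are made up for illustration.

```python
import io
import os
import tempfile


def open_layers(path, encoding="utf-8"):
    """Stack the three levels on one file (hypothetical helper)."""
    # Level 1: unbuffered, bytes only, close to the system calls.
    raw = io.FileIO(path, "r")
    # Level 2: same byte-oriented API, plus a buffer.
    buffered = io.BufferedReader(raw)
    # Level 3: decoding and newline translation; newline=None turns
    # CRLF, CR and LF into "\n" when reading.
    text = io.TextIOWrapper(buffered, encoding=encoding, newline=None)
    return raw, buffered, text


def read_lines(path, encoding="utf-8"):
    """Iterate over lines; only the text level knows what a line is."""
    raw, buffered, text = open_layers(path, encoding)
    with text:  # closing the top level also closes the levels beneath it
        return [line for line in text]


# Small demo with mixed line endings in a temporary file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"first\r\nsecond\nthird\r\n")
demo_lines = read_lines(demo_path)
os.remove(demo_path)
print(demo_lines)  # ['first\n', 'second\n', 'third\n']
```

In this sketch, line iteration lives only at the top level, which is the behaviour my question below is about.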

The question I was wondering is, will the built-in 'file' function return an object of level 3?

-- Talin
