[Python-3000] Reversing through text files with the new IO library

Guido van Rossum guido at python.org
Mon Mar 12 21:18:09 CET 2007


On 3/12/07, Mark Russell <mark.russell at zen.co.uk> wrote:

> On 12 Mar 2007, at 17:56, Guido van Rossum wrote:
> > Thanks! This is a very interesting idea, I'd like to keep this
> > around somehow.
>
> Thanks for the positive feedback - much appreciated.
>
> > I also see that you noticed a problem with text I/O in the current
> > design; there's no easy way to implement readline() efficiently. I
> > want readline() to be as efficient as possible -- "for line in <file>"
> > should scream, like it does in 2.x.
>
> Yes, I suspect that BufferedReader needs some kind of readuntil()
> method, so that (at least for sane encodings like utf-8) each line is
> read via a single readuntil() followed by a decode() call for the
> entire line.
>
> Maybe something like this (although the only way to be sure is to
> experiment):
>
>     line, endindex = buffer.readuntil(lineendings)
>
>         Read until we see one of the byte strings in lineendings,
>         which is a sequence of one or more byte strings. If there are
>         multiple line endings with a common prefix, use the longest.
>         Return the line complete with the ending, with endindex being
>         the index within line of the line ending (or None if EOF was
>         encountered).
>
> Is anyone working on io.py btw? If not I'd be willing to put some
> time into it. I guess the todo list is something like this:

I am, when I have time (which seems rarely) and Mike Verdone and Daniel Stutzbach are (though I may have unintentionally discouraged them by not providing feedback soon enough).

> - Finish off the python prototypes in io.py (using and maybe
>   tweaking the API spec)

Yes. I am positive that attempting to implement the entire PEP (and trying to do it relatively efficiently) will require us to go back to the API design several times.

Note that some of the binary prototypes don't work right yet; the unittests don't cover everything that's been implemented yet.

I would love for you to start working on this. Let me know off-line if you need more guidance (but CC Daniel and Mike so they know what's going on).

> - Get unit tests working with builtin.open = io.open

I'm not even sure about this one; we may have to do that simultaneously with the str/unicode conversion. If we attempt to do it before then, I expect that we'll get lots of failures because the new I/O text layer always returns unicode and the new binary layer returns bytes objects. We may have to do it more piecemeal. Perhaps a good start would be to convert selected modules that use binary I/O to switch to the new io module explicitly by importing it and recoding them to deal with bytes.

> - Profile and optimize (e.g. by selective conversion to C)

I'd be okay with doing that after the 3.0 alpha 1 release (planned for June).
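
For readers following the readuntil() proposal quoted earlier in this message, here is a rough pure-Python sketch of the described semantics. It is an illustration under stated assumptions, not the io.py implementation: the UntilReader class, its chunk_size parameter, and the _fill() helper are hypothetical names, and the sketch assumes that competing line endings differ only by a common prefix (such as b"\r" and b"\r\n").

```python
import io


class UntilReader:
    """Hypothetical wrapper that adds readuntil() to a binary stream."""

    def __init__(self, raw, chunk_size=8192):
        self._raw = raw
        self._chunk_size = chunk_size
        self._pending = bytearray()  # bytes read from raw but not yet returned
        self._eof = False

    def _fill(self):
        """Pull one more chunk from the underlying stream; note EOF."""
        chunk = self._raw.read(self._chunk_size)
        if chunk:
            self._pending += chunk
        else:
            self._eof = True

    def readuntil(self, lineendings):
        """Return (line, endindex); endindex is None if EOF came first."""
        endings = sorted(lineendings, key=len, reverse=True)  # longest first
        if not endings:
            raise ValueError("lineendings must not be empty")
        maxlen = len(endings[0])
        while True:
            starts = [i for i in (self._pending.find(e) for e in endings)
                      if i != -1]
            if starts:
                start = min(starts)  # earliest position of any ending
                # Only decide once a longer ending can be ruled in or out.
                if self._eof or len(self._pending) - start >= maxlen:
                    ending = next(e for e in endings
                                  if self._pending.startswith(e, start))
                    end = start + len(ending)
                    line = bytes(self._pending[:end])
                    del self._pending[:end]
                    return line, start
            if self._eof:
                # No ending left in the stream: hand back the remainder.
                line = bytes(self._pending)
                self._pending.clear()
                return line, None
            self._fill()


# Assumed example data: UTF-8 text with mixed line endings.
reader = UntilReader(io.BytesIO("caf\u00e9\r\nbar\rbaz".encode("utf-8")))
endings = [b"\r", b"\n", b"\r\n"]
line, endindex = reader.readuntil(endings)
text = line[:endindex].decode("utf-8")  # one decode() call for the whole line
```

Because endindex is None at EOF, the slice line[:endindex] yields the whole remaining data in that case, so the same decode() call works for a final line that has no terminator.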

--
--Guido van Rossum (home page: http://www.python.org/~guido/)