[Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

Walter Dörwald walter at livinglogic.de
Tue May 24 12:16:49 CEST 2011


On 24.05.11 02:08, Victor Stinner wrote:

[...] codecs.open() and StreamReader, StreamWriter and StreamReaderWriter classes of the codecs module don't support universal newlines, still have some issues with stateful codecs (like UTF-16/32 BOMs), and each codec has to implement a StreamReader and a StreamWriter class.

StreamReader and StreamWriter are stateless codecs (no reset() or setstate() method),

They are stateful, they just don't expose their state to the public.

and so it's not possible to write a generic fix for all child classes in the codecs module. Each stateful codec has to handle special cases like seek() problems.

Yes, which in theory makes it possible to implement shortcuts for certain codecs (e.g. the UTF-32-BE/LE codecs could simply multiply the character position by 4 to get the byte position). However, AFAICR none of the readers/writers does that.
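To illustrate the kind of shortcut meant here, a minimal sketch follows. The helper name `utf32_byte_offset` is hypothetical and not part of the codecs module, and the arithmetic only holds for the BOM-less, fixed-width UTF-32-BE/LE encodings:

```python
def utf32_byte_offset(char_index: int) -> int:
    """Byte offset of a character in a BOM-less UTF-32-BE/LE byte string.

    Hypothetical helper for illustration: every code point occupies
    exactly 4 bytes, so no decoding is needed to find the position.
    """
    return char_index * 4


text = "h\u00e9llo \U0001F600!"
data = text.encode("utf-32-be")  # the -be/-le variants write no BOM

index = 7  # position of "!" counted in code points
start = utf32_byte_offset(index)
char_at_offset = data[start:start + 4].decode("utf-32-be")
```

With a variable-width encoding such as UTF-8 or UTF-16 this multiplication would not work, which is why such a shortcut would have to be codec-specific.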

For example, UTF-16 codec duplicates some IncrementalEncoder/IncrementalDecoder code into its StreamWriter/StreamReader class.

Actually it's the other way round: When I implemented the incremental codecs, I copied code from the StreamReader/StreamWriter classes.

The io module is well tested, supports non-seekable streams, correctly handles corner cases (like UTF-16/32 BOMs) and supports any kind of newlines, including a "universal newline" mode. TextIOWrapper reuses incremental encoders and decoders, so BOM issues were fixed only once, in TextIOWrapper.
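A small sketch of the BOM handling described above, using an in-memory buffer in place of a real file (the variable names are only for illustration): two separate writes through a TextIOWrapper produce a single BOM at the start of the stream, and reading the bytes back consumes it.

```python
import codecs
import io

raw = io.BytesIO()
writer = io.TextIOWrapper(raw, encoding="utf-16")
writer.write("a")
writer.write("b")
writer.flush()

encoded = raw.getvalue()
writer.detach()  # keep the BytesIO from being closed along with the wrapper

# Reading back through TextIOWrapper strips the BOM again.
decoded = io.TextIOWrapper(io.BytesIO(encoded), encoding="utf-16").read()
```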

It's trivial to replace a call to codecs.open() by a call to open(), because the two APIs are very close. The main difference is that codecs.open() doesn't support universal newlines, so you have to use open(..., newline='') to keep the same behaviour (keep newlines unchanged). This task can be done by 2to3. But I suppose that most people will be happy with the universal newline mode. I don't see which use case is not covered by TextIOWrapper. But I know some cases which are not supported by StreamReader/StreamWriter.
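The replacement described above can be sketched as follows. The temporary file and the sample text are assumptions made for the example, and the warnings filter is only there because newer Python versions may emit a DeprecationWarning for codecs.open():

```python
import codecs
import os
import tempfile
import warnings

# Sample file with mixed line endings (made up for this example).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("one\r\ntwo\rthree\n".encode("utf-8"))

try:
    # Old style: codecs.open() never translates newlines.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, "r", encoding="utf-8") as f:
            via_codecs = f.read()

    # Equivalent with open(): newline='' keeps newlines unchanged.
    with open(path, "r", encoding="utf-8", newline="") as f:
        untranslated = f.read()

    # Default open(): universal newlines, everything becomes "\n".
    with open(path, "r", encoding="utf-8") as f:
        universal = f.read()
finally:
    os.remove(path)
```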

This could be partially fixed by implementing generic StreamReader/StreamWriter classes that reuse the incremental codecs, but I don't think that's worth it.
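For the sake of illustration only, a generic reader built on the incremental decoders might look roughly like this. The class `IncrementalReader` is hypothetical, it is not an existing codecs API, and it leaves out most of the real StreamReader interface (readline(), seek(), size hints):

```python
import codecs
import io


class IncrementalReader:
    """Hypothetical generic reader that delegates to an incremental decoder."""

    def __init__(self, stream, encoding, errors="strict"):
        self.stream = stream
        # Any codec that provides an incremental decoder can be used here.
        self.decoder = codecs.getincrementaldecoder(encoding)(errors)

    def read(self, chunk_size=4096):
        """Read and decode the rest of the stream chunk by chunk."""
        parts = []
        while True:
            chunk = self.stream.read(chunk_size)
            if not chunk:
                # Flush the decoder so truncated input raises an error.
                parts.append(self.decoder.decode(b"", final=True))
                break
            parts.append(self.decoder.decode(chunk))
        return "".join(parts)

    def reset(self):
        """Drop any state buffered in the decoder."""
        self.decoder.reset()


data = "\u00e4\u00f6\u00fc \u20ac".encode("utf-16")  # starts with a BOM
reader = IncrementalReader(io.BytesIO(data), "utf-16")
result = reader.read(chunk_size=3)  # odd size splits 2-byte code units
```

The odd chunk size is deliberate: the incremental decoder buffers the incomplete code unit and the BOM state between calls, so the reader class itself needs no codec-specific logic.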

[...]

Servus, Walter
