[Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

Victor Stinner victor.stinner at haypocalc.com
Tue May 24 02:08:49 CEST 2011


Hi,

In Python 2, codecs.open() is the best way to read and/or write files using Unicode. But in Python 3, the built-in open(), backed by the fast io module, is preferred. I would like to deprecate codecs.open() because it can be replaced by open() and io.TextIOWrapper. I would like your opinion and that's why I'm writing this email.

--

codecs.open() and the StreamReader, StreamWriter and StreamReaderWriter classes of the codecs module don't support universal newlines, still have some issues with stateful codecs (like UTF-16/32 BOMs), and require each codec to implement a StreamReader and a StreamWriter class.

StreamReader and StreamWriter are stateless codecs (no reset() or setstate() method), and so it's not possible to write a generic fix for all child classes in the codecs module. Each stateful codec has to handle special cases like seek() problems. For example, the UTF-16 codec duplicates some IncrementalEncoder/IncrementalDecoder code in its StreamWriter/StreamReader classes.

The io module is well tested, supports non-seekable streams, correctly handles corner cases (like UTF-16/32 BOMs) and supports any kind of newline, including a "universal newline" mode. TextIOWrapper reuses incremental encoders and decoders, so BOM issues were fixed only once, in TextIOWrapper.
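
To illustrate the BOM point, here is a minimal sketch (not taken from the issue). It assumes an in-memory io.BytesIO as the underlying binary stream, and the helper name encode_in_chunks is made up for the example. It writes text through TextIOWrapper in several chunks and shows that the UTF-16 BOM is emitted only once, at the start of the stream:

```python
import codecs
import io


def encode_in_chunks(chunks, encoding="utf-16"):
    """Write several text chunks through TextIOWrapper and return the bytes.

    Hypothetical helper for illustration only.
    """
    raw = io.BytesIO()
    # newline="" disables newline translation on write.
    text = io.TextIOWrapper(raw, encoding=encoding, newline="")
    for chunk in chunks:
        text.write(chunk)
    text.flush()
    data = raw.getvalue()
    text.detach()  # release the wrapper without closing the BytesIO
    return data


data = encode_in_chunks(["ab", "cd"])

# The stream starts with the native-endian BOM, written a single time.
starts_with_bom = data.startswith(codecs.BOM_UTF16)

# Reading back through TextIOWrapper consumes the BOM transparently.
round_trip = io.TextIOWrapper(io.BytesIO(data), encoding="utf-16").read()
```

The same incremental encoder instance is used for every write() call, which is why the second chunk does not get its own BOM.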

It's trivial to replace a call to codecs.open() with a call to open(), because the two APIs are very close. The main difference is that codecs.open() doesn't support universal newlines, so you have to use open(..., newline='') to keep the same behaviour (keep newlines unchanged). This task can be done by 2to3. But I suppose that most people will be happy with the universal newline mode.
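
A small sketch of the replacement described above, assuming a temporary file with mixed line endings (the helper names read_text and demo are made up for the example). With newline='' the line endings come back untouched, as they would with codecs.open(); with the default newline=None, universal newline mode translates them to '\n':

```python
import os
import tempfile


def read_text(path, encoding="utf-8", keep_newlines=True):
    """Read a text file with the built-in open().

    keep_newlines=True passes newline="" so that line endings are
    returned unchanged; False uses the default universal newline mode.
    """
    newline = "" if keep_newlines else None
    with open(path, "r", encoding=encoding, newline=newline) as f:
        return f.read()


def demo():
    # Create a file containing all three common line endings.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write("a\r\nb\rc\n".encode("utf-8"))
        unchanged = read_text(path, keep_newlines=True)
        translated = read_text(path, keep_newlines=False)
    finally:
        os.remove(path)
    return unchanged, translated


unchanged, translated = demo()
```

Here unchanged is 'a\r\nb\rc\n' and translated is 'a\nb\nc\n'.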

I don't see which use case is not covered by TextIOWrapper. But I know some cases which are not supported by StreamReader/StreamWriter.

--

I opened an issue for this idea. Brett and Marc-Andre Lemburg don't want to deprecate codecs.open() & friends because they want to be able to write code working on Python 2 and on Python 3 without any change. I don't think it's realistic: nontrivial programs require at least the six module, and most likely the 2to3 program. The six module can have its own "codecs.open" function if codecs.open is removed from Python 3.4.

StreamReader, StreamWriter, StreamReaderEncoder and EncodedFile are not used in the Python 3 standard library. I tried removing them: except for the tests in test_codecs which test them directly, the full test suite passes.

Read the issue for more information: http://bugs.python.org/issue8796

Victor
