[Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader (original) (raw)

M.-A. Lemburg mal at egenix.com
Tue May 24 10:03:22 CEST 2011

Previous message: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
Next message: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Victor Stinner wrote:

Hi,

In Python 2, codecs.open() is the best way to read and/or write files using Unicode. But in Python 3, open() is preferred with its fast io module. I would like to deprecate codecs.open() because it can be replaced by open() and io.TextIOWrapper. I would like your opinion and that's why I'm writing this email.

I think you should have moved this part of your email further up, since it explains the reason why this idea was rejected for now:

I opened an issue for this idea. Brett and Marc-Andree Lemburg don't want to deprecate codecs.open() & friends because they want to be able to write code working on Python 2 and on Python 3 without any change. I don't think it's realistic: nontrivial programs require at least the six module, and most likely the 2to3 program. The six module can have its "codecs.open" function if codecs.open is removed from Python 3.4.

And now for something completely different:

codecs.open() and StreamReader, StreamWriter and StreamReaderWriter classes of the codecs module don't support universal newlines, still have some issues with stateful codecs (like UTF-16/32 BOMs), and each codec has to implement a StreamReader and a StreamWriter class.

StreamReader and StreamWriter are stateless codecs (no reset() or setstate() method), and so it's not possible to write a generic fix for all child classes in the codecs module. Each stateful codec has to handle special cases like seek() problems. For example, UTF-16 codec duplicates some IncrementalEncoder/IncrementalDecoder code into its StreamWriter/StreamReader class.

Please read PEP 100 regarding StreamReader and StreamWriter. Those codecs parts were explicitly designed to be stateful, unlike the stateless encoder/decoder methods.

Please read my reply on the ticket:

""" StreamReader and StreamWriter classes provide the base codec implementations for stateful interaction with streams. They define the interface and provide a working implementation for those codecs that choose not to implement their own variants.

Each codec can, however, implement variants which are optimized for the specific encoding or intercept certain stream methods to add functionality or improve the encoding/decoding performance.

Both are essential parts of the codec interface.

TextIOWrapper and StreamReaderWriter are merely wrappers around streams that make use of the codecs. They don't provide any codec logic themselves. That's the conceptual difference. """

The io module is well tested, supports non-seekable streams, handles correctly corner-cases (like UTF-16/32 BOMs) and supports any kind of newlines including an "universal newline" mode. TextIOWrapper reuses incremental encoders and decoders, so BOM issues were fixed only once, in TextIOWrapper.

It's trivial to replace a call to codecs.open() by a call to open(), because the two API are very close. The main different is that codecs.open() doesn't support universal newline, so you have to use open(..., newline='') to keep the same behaviour (keep newlines unchanged). This task can be done by 2to3. But I suppose that most people will be happy with the universal newline mode. I don't see which usecase is not covered by TextIOWrapper. But I know some cases which are not supported by StreamReader/StreamWriter.

This is a misunderstanding of the concepts behind the two.

StreamReader and StreamWriters are implemented by the codecs, they are part of the API that each codec has to provide in order to register in the Python codecs system. Their purpose is to provide a stateful interface and work efficiently and directly on streams rather than buffers.

Here's my reply from the ticket regarding using incremental encoders/decoders for the StreamReader/Writer parts of the codec set of APIs:

""" The point about having them use incremental codecs for encoding and decoding is a good one and would need to be investigated. If possible, we could use incremental encoders/decoders for the standard StreamReader/Writer base classes or add new IncrementalStreamReader/Writer classes which then use the IncrementalEncode/Decoder per default.

Please open a new ticket for this. """

StreamReader, StreamWriter, StreamReaderEncoder and EncodedFile are not used in the Python 3 standard library. I tried removed them: except tests of testcodecs which test them directly, the full test suite pass.

Read the issue for more information: http://bugs.python.org/issue8796

-- Marc-Andre Lemburg eGenix.com

Professional Python Services directly from the Source (#1, May 24 2011)

Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

2011-06-20: EuroPython 2011, Florence, Italy 27 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

Previous message: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
Next message: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list