[Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

M.-A. Lemburg mal at egenix.com
Tue May 24 12:14:10 CEST 2011


Victor Stinner wrote:

On Tuesday, 24 May 2011 at 10:03 +0200, M.-A. Lemburg wrote:

Please read PEP 100 regarding StreamReader and StreamWriter. Those codecs parts were explicitly designed to be stateful, unlike the stateless encoder/decoder methods.

Yes, it is possible to implement stateful StreamReader and StreamWriter classes and we have such codecs (I gave the example of UTF-16), but the state is not exposed (getstate / setstate), and so it's not possible to write generic code to handle the codec state in the base StreamReader and StreamWriter classes. io.TextIOWrapper requires encoder.setstate(0) for example.
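To illustrate the kind of state being discussed: the incremental codec classes, unlike StreamReader/StreamWriter, do offer getstate()/setstate(). The following is only a minimal sketch, assuming the stdlib UTF-16 incremental encoder, where a state of 0 means the BOM has already been written and the byte order is fixed. The variable names are illustrative.

```python
import codecs

# A fresh UTF-16 incremental encoder emits a BOM with its first output.
enc = codecs.getincrementalencoder("utf-16")()
first = enc.encode("a")    # BOM + "a" in native byte order
second = enc.encode("b")   # no BOM the second time

# Forcing state 0 on a new encoder tells it the BOM is already written,
# which is what a wrapper needs when it appends to or seeks in a stream.
resumed = codecs.getincrementalencoder("utf-16")()
resumed.setstate(0)
no_bom = resumed.encode("a")

print(first, second, no_bom)
```

A generic wrapper can rely on this pair of methods without knowing anything else about the encoding, which is the property the base stream classes lack.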

So instead of always suggesting to deprecate everything, how about you come up with a proposal to add meaningful new methods to those base classes ?

Each codec can, however, implement variants which are optimized for the specific encoding or intercept certain stream methods to add functionality or improve the encoding/decoding performance.

Can you give me some examples?

See the UTF-16 codec in the stdlib for example. This uses some of the available possibilities to interpret the BOM mark and then switches the encoder/decoder methods accordingly.
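The BOM handling mentioned here can be observed from the outside. This is a rough sketch using the stdlib UTF-16 stream classes obtained via codecs.getwriter() and codecs.getreader(); the sample strings are arbitrary.

```python
import codecs
import io

# The writer emits the BOM once, then keeps using the chosen byte order.
buf = io.BytesIO()
writer = codecs.getwriter("utf-16")(buf)
writer.write("a")
writer.write("b")
written = buf.getvalue()   # one BOM followed by two code units

# The reader inspects the BOM on its first decode call and then uses the
# matching little- or big-endian decoder for the rest of the stream.
big_endian = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
reader = codecs.getreader("utf-16")(io.BytesIO(big_endian))
text = reader.read()

print(written, text)
```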

A lot more could be done for other codecs with variable-length encodings, e.g. UTF-8, since these often have problems near the end of a read due to missing bytes.

The base class implementation provides a general purpose implementation to cover the case, but it's not efficient, since it doesn't know anything about the encoding characteristics.

Such an implementation would have to be done per codec and that's why we have per codec StreamReader/Writer APIs.
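The "missing bytes near the end of a read" problem can be reproduced in a few lines. The sketch below assumes the stdlib UTF-8 StreamReader and deliberately uses tiny reads so that a multi-byte character is split across them; it shows the general-purpose buffering at work, not a codec-specific optimization.

```python
import codecs
import io

data = "€uro".encode("utf-8")   # "€" takes three bytes in UTF-8

# Decoding a chunk that ends mid-character fails with the stateless API.
try:
    data[:2].decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False

# The UTF-8 StreamReader keeps undecoded bytes buffered between reads.
reader = codecs.getreader("utf-8")(io.BytesIO(data))
pieces = []
while True:
    piece = reader.read(2)   # small reads force a split inside "€"
    if not piece:
        break
    pieces.append(piece)

print(truncated_ok, "".join(pieces))
```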

TextIOWrapper and StreamReaderWriter are merely wrappers around streams that make use of the codecs. They don't provide any codec logic themselves. That's the conceptual difference.

... StreamReader and StreamWriters ... work efficiently and directly on streams rather than buffers.

StreamReader, StreamWriter, TextIOWrapper and StreamReaderWriter all have a file-like API: tell(), seek(), read(), readline(), write(), etc. The implementation may be different, but the API is the same, and so the use cases are the same. I don't see in which case I should use StreamReader or StreamWriter instead of TextIOWrapper. I thought that TextIOWrapper was specific to files on disk, but TextIOWrapper is already used for other purposes, such as sockets.
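For readers comparing the two wrappers, here is a small side-by-side sketch. It assumes an in-memory io.BytesIO as the underlying stream and looks up the UTF-8 stream classes through codecs.lookup(); the payload is arbitrary.

```python
import codecs
import io

payload = "naïve café\n".encode("utf-8")

# io.TextIOWrapper: the wrapper itself drives an incremental decoder.
text_io = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")

# codecs.StreamReaderWriter: delegates to the codec's own stream classes.
info = codecs.lookup("utf-8")
srw = codecs.StreamReaderWriter(
    io.BytesIO(payload), info.streamreader, info.streamwriter
)

line_from_text_io = text_io.readline()
line_from_srw = srw.readline()

print(line_from_text_io == line_from_srw)
```

Both objects are driven through the same file-like call, while the decoding work happens in different places.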

I have no idea why TextIOWrapper was added to the stdlib instead of making StreamReaderWriter more capable, since StreamReaderWriter had already been available in Python since Python 1.6 (and this is being used by codecs.open()).

Perhaps we should deprecate TextIOWrapper instead and replace it with codecs.StreamReaderWriter ? ;-)

Seriously, I don't see use of TextIOWrapper as an argument for removing StreamReader/Writer parts of the codecs API.

Here's my reply from the ticket regarding using incremental encoders/decoders for the StreamReader/Writer parts of the codec set of APIs:

""" The point about having them use incremental codecs for encoding and decoding is a good one and would need to be investigated. If possible, we could use incremental encoders/decoders for the standard StreamReader/Writer base classes or add new IncrementalStreamReader/Writer classes which then use the IncrementalEncoder/Decoder by default. """

Why do you want to write a duplicate feature? TextIOWrapper is already here, it's working and widely used.
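What an IncrementalStreamReader might look like is not spelled out in the thread. The class below is purely a hypothetical sketch, not an existing stdlib class: it assumes a reader that forwards all decoding to the codec's incremental decoder and exposes that decoder's state. The chunk_size parameter and the method set are illustrative assumptions.

```python
import codecs
import io


class IncrementalStreamReader:
    """Hypothetical reader that delegates decoding to an incremental decoder."""

    def __init__(self, stream, encoding, errors="strict", chunk_size=4096):
        self.stream = stream
        self.decoder = codecs.getincrementaldecoder(encoding)(errors)
        self.chunk_size = chunk_size

    def read(self):
        """Read and decode everything left in the underlying byte stream."""
        parts = []
        while True:
            chunk = self.stream.read(self.chunk_size)
            if not chunk:
                # final=True makes the decoder report a truncated sequence.
                parts.append(self.decoder.decode(b"", final=True))
                break
            parts.append(self.decoder.decode(chunk))
        return "".join(parts)

    def getstate(self):
        # Expose the codec state so generic code can save and restore it.
        return self.decoder.getstate()

    def setstate(self, state):
        self.decoder.setstate(state)


# Each character is three bytes, so 2-byte chunks split every character.
reader = IncrementalStreamReader(
    io.BytesIO("日本語".encode("utf-8")), "utf-8", chunk_size=2
)
text = reader.read()
print(text)
```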

See above and please also try to understand why we have per-codec implementations for streams. I'm tired of repeating myself.

I would much prefer to see the codec-specific functionality in TextIOWrapper added back to the codecs where it belongs.

I am working on codec issues (like CJK encodings, see #12100, #12057, #12016) and I would like to remove StreamReader and StreamWriter to have less code to maintain.

If you want to add more code, will you be available to maintain it? It looks like you are busy; some people (not me ;-)) are still waiting for .transform()/.untransform()!

I dropped the ball on the idea after the strong wave of comments against those methods. People will simply have to use codecs.encode() and codecs.decode().
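For reference, a short sketch of the codecs.encode()/codecs.decode() route, assuming the stdlib "rot13" (str to str) and "hex" (bytes to bytes) codecs; the sample inputs are arbitrary.

```python
import codecs

# str -> str transformation
rot = codecs.encode("hello", "rot13")
rot_back = codecs.decode(rot, "rot13")

# bytes -> bytes transformation
hexed = codecs.encode(b"hello", "hex")
hex_back = codecs.decode(hexed, "hex")

print(rot, rot_back, hexed, hex_back)
```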

-- Marc-Andre Lemburg eGenix.com



