[Python-Dev] transform() and untransform() methods, and the codec registry

R. David Murray rdmurray at bitdance.com
Fri Dec 3 16:11:29 CET 2010


On Fri, 03 Dec 2010 10:16:04 +0100, Victor Stinner <victor.stinner at haypocalc.com> wrote:

On Thursday 02 December 2010 19:06:51 georg.brandl wrote:
> Author: georg.brandl
> Date: Thu Dec 2 19:06:51 2010
> New Revision: 86934
>
> Log:
> #7475: add (un)transform method to bytes/bytearray and str, add back codecs
> that can be used with them from Python 2.

Oh no, someone did it. Was it really necessary to reintroduce rot13 and friends?

I'm not strongly opposed to .transform()/.untransform() if they can be completely separated from text encodings (ascii, latin9, utf-8, etc.). But str.encode() and bytes.decode() do accept transform codec names, and they raise strange error messages. Quote of Martin von Löwis (#7475):

"If the codecs are restored, one half of them becomes available to .encode/.decode methods, since the codec registry cannot tell which ones implement real character encodings, and which ones are other conversion methods. So adding them would be really confusing."

>>> 'abc'.transform('hex')
TypeError: 'str' does not support the buffer interface
>>> b'abc'.transform('rot13')
TypeError: expected an object with the buffer interface
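Martin's point, that one registry holds both real character encodings and other conversions, can be seen with the module-level codecs functions. This is a minimal sketch assuming a current Python 3 (3.4 or later, where the 'hex' and 'rot13' aliases are registered); it does not use the transform()/untransform() methods under discussion, only codecs.lookup(), codecs.encode() and codecs.decode().

```python
import codecs

# One registry serves both kinds of codec: codecs.lookup() returns a
# CodecInfo for a real character encoding and for a transform alike.
for name in ("utf-8", "hex", "rot13"):
    info = codecs.lookup(name)
    print(name, "->", info.name)

# codecs.encode()/codecs.decode() apply any registered codec explicitly,
# whatever types it consumes and produces.
hexed = codecs.encode(b"abc", "hex")       # bytes -> bytes
unhexed = codecs.decode(hexed, "hex")      # bytes -> bytes
rotated = codecs.encode("abc", "rot13")    # str -> str
print(hexed, unhexed, rotated)
```

Nothing in the lookup result alone tells a caller which input and output types a given codec expects, which is the ambiguity the quote describes.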

I find these 'buffer interface' error messages the most confusing ones I get out of Python 3, no matter what context they show up in. I have no idea what they are telling me. That issue is more general than transform/untransform, but perhaps it could be fixed for transform/untransform in particular.
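The "buffer interface" in these messages is the C-level protocol through which an object exposes its raw memory; bytes and bytearray implement it, str does not. A minimal way to probe for it from Python is to try wrapping the object in memoryview(); the helper name supports_buffer is hypothetical.

```python
def supports_buffer(obj):
    """Return True if obj exposes the buffer interface (buffer protocol)."""
    # memoryview() accepts exactly the objects that implement the buffer
    # protocol, so a TypeError here means "no buffer interface".
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True

for candidate in (b"abc", bytearray(b"abc"), "abc"):
    print(repr(candidate), supports_buffer(candidate))
```

Read this way, "'str' does not support the buffer interface" just means "this operation wanted bytes-like data and got a str".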

>>> b'abcd'.decode('hex')
TypeError: decoder did not return a str object (type=bytes)
>>> 'abc'.encode('rot13')
TypeError: encoder did not return a bytes object (type=str)
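These messages come from a check on the codec's result type: the codec runs, and the caller then rejects an output of the wrong type. The helper below is a hypothetical sketch of that kind of check built on codecs.encode(); it is not CPython's actual implementation, and it assumes a Python 3 where the 'rot13' alias is registered.

```python
import codecs

def checked_encode(text, encoding):
    # Hypothetical helper: run any registered codec, then insist on bytes,
    # the kind of post-condition behind the error messages quoted above.
    result = codecs.encode(text, encoding)
    if not isinstance(result, bytes):
        raise TypeError(
            "encoder did not return a bytes object (type=%s)"
            % type(result).__name__
        )
    return result

print(checked_encode("abc", "utf-8"))  # b'abc'
try:
    checked_encode("abc", "rot13")     # rot13 maps str -> str
except TypeError as exc:
    print(exc)
```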

These error messages make perfect sense to me. I think it is called "duck typing" :)

I don't like transform() and untransform() because I think that we should not add too many operations to the base types (bytes and str), and they do an implicit module import. I prefer an explicit module import (e.g. import binascii; binascii.hexlify(b'to hex')). It reminds me of PHP and its ugly namespace with more than 5000 functions. I prefer Python because it uses smaller, more numerous namespaces which are more specific and well defined. If we add email and compression functions to bytes, why not add a web browser to str?
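For comparison, here are the two spellings side by side: the explicit binascii import preferred above, and the same conversion looked up by name through the codec registry. This sketch assumes Python 3.4 or later for the 'hex' alias.

```python
import binascii
import codecs

data = b"to hex"

# Explicit import of the module that does the work.
explicit = binascii.hexlify(data)
# The same conversion reached through the codec registry by name.
via_registry = codecs.encode(data, "hex")

print(explicit)                      # b'746f20686578'
print(explicit == via_registry)      # True
print(binascii.unhexlify(explicit))  # b'to hex'
```

The registry route adds a name lookup (and, behind it, an implicit import of the codec module) in exchange for a uniform calling convention across conversions.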

As MAL says, the codec machinery is a general-purpose tool. I think it, and the transform methods, provide a useful level of abstraction over a general class of problems.
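To illustrate the "general-purpose" point, here is a minimal sketch of plugging a new conversion into the same registry with codecs.register(). The codec name reverse_demo and its helper functions are hypothetical, chosen only to show that the registry does not care what the conversion does.

```python
import codecs

def _reverse(text, errors="strict"):
    # Codec functions return (output, number of input items consumed).
    return text[::-1], len(text)

def _search(name):
    # The registry calls each search function with a normalized,
    # lower-case name; "reverse_demo" is a hypothetical codec name.
    if name == "reverse_demo":
        return codecs.CodecInfo(_reverse, _reverse, name="reverse_demo")
    return None

codecs.register(_search)

print(codecs.encode("abc", "reverse_demo"))  # cba
print(codecs.decode("cba", "reverse_demo"))  # abc
```

Once registered, the conversion is reachable by name through the same API as utf-8 or hex, which is the abstraction being defended here.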

Please also recall that transform/untransform was discussed before the release of Python 3.0 and was approved at the time, but it just did not get implemented before the 3.0 release.

-- R. David Murray www.bitdance.com


