[Python-Dev] transform() and untransform() methods, and the codec registry

Victor Stinner victor.stinner at haypocalc.com
Fri Dec 3 10:16:04 CET 2010


On Thursday 02 December 2010 19:06:51 georg.brandl wrote:

Author: georg.brandl
Date: Thu Dec 2 19:06:51 2010
New Revision: 86934

Log:
#7475: add (un)transform method to bytes/bytearray and str, add back codecs that can be used with them from Python 2.

Oh no, someone did it. Was it really necessary to reintroduce rot13 and friends?

I'm not strongly opposed to .transform()/.untransform() if they can be completely separated from text encodings (ascii, latin9, utf-8 and co.). But str.encode() and bytes.decode() do accept transform codec names and raise strange error messages. To quote Martin von Löwis (#7475):

"If the codecs are restored, one half of them becomes available to .encode/.decode methods, since the codec registry cannot tell which ones implement real character encodings, and which ones are other conversion methods. So adding them would be really confusing."

>>> 'abc'.transform('hex')
TypeError: 'str' does not support the buffer interface
>>> b'abc'.transform('rot13')
TypeError: expected an object with the buffer interface

>>> b'abcd'.decode('hex')
TypeError: decoder did not return a str object (type=bytes)
>>> 'abc'.encode('rot13')
TypeError: encoder did not return a bytes object (type=str)
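To illustrate the registry problem Martin describes, here is a minimal sketch, assuming a current Python 3 in which the 'hex' and 'rot13' codecs are registered. It only shows that codecs.lookup() returns the same kind of object for a text encoding and for a transform codec; the helper name lookup_kinds is hypothetical and not part of any proposal.

```python
import codecs


def lookup_kinds(names):
    """Map each codec name to the type name of its registry entry.

    lookup_kinds is a made-up helper for illustration only.
    """
    return {name: type(codecs.lookup(name)).__name__ for name in names}


# One real text encoding and two transform codecs, all found
# through the same registry call.
kinds = lookup_kinds(["utf-8", "hex", "rot13"])
print(kinds)
```

All three names resolve through the same lookup and yield the same entry type, which is why a caller going through the registry alone cannot easily tell a character encoding from another kind of conversion.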

I don't like transform() and untransform() because I think that we should not add too many operations to the base types (bytes and str), and because they do an implicit module import. I prefer an explicit module import (e.g. import binascii; binascii.hexlify(b'to hex')). It reminds me of PHP and its ugly namespace with 5000+ functions. I prefer Python because it uses more, smaller namespaces that are more specific and well defined. If we add email and compression functions to bytes, why not add a web browser to str?
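As a rough sketch of the explicit-import style preferred above, assuming a current Python 3 standard library: each conversion comes from a module that is imported by name, so the reader can see where it lives. The variable names are illustrative only, and the rot13 line assumes the 'rot13' codec is available through codecs.encode().

```python
import base64
import binascii
import codecs
import zlib

data = b"to hex"

# Hex conversion via the binascii module, as in the example above.
hexed = binascii.hexlify(data)
unhexed = binascii.unhexlify(hexed)

# Email-style (base64) and compression conversions from their own modules.
b64 = base64.b64encode(data)
packed = zlib.compress(data)

# rot13 is a str-to-str conversion, reached through the codecs module
# (assumption: the 'rot13' codec is registered in this Python version).
rot = codecs.encode("abc", "rot13")

print(hexed, unhexed, b64, rot)
```

Each call names its module at the call site, and nothing is added to bytes or str.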

Victor