[Python-Dev] transform() and untransform() methods, and the codec registry

Alexander Belopolsky alexander.belopolsky at gmail.com
Tue Dec 7 06:57:43 CET 2010


On Tue, Dec 7, 2010 at 12:06 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

On Tue, Dec 7, 2010 at 2:46 PM, Alexander Belopolsky <alexander.belopolsky at gmail.com> wrote:

Having all encodings accessible in a str method only promotes a programming style where bytes objects can contain differently encoded strings in different parts of the program. Instead, well-written programs should decode bytes on input, do all processing with the str type, and encode on output. When strings need to be passed to char* C APIs, they should be encoded in UTF-8. Many C APIs originally designed for ASCII actually produce meaningful results when given UTF-8 bytes. (Supporting such usage was one of the design goals of UTF-8.)

This world sounds nice, but it isn't the one that exists right now. Practicality beats purity and all that :)

... and the default encoding being fixed as UTF-8 already goes 99% of the way to that world. As long as I can use encode/decode without an argument, it does not bother me much that they can take one. These methods are also much easier to ignore than the transform/untransform pair, simply because there is only one such method per class. transform/untransform have a much larger mental footprint, not only because there are two of them in both str and bytes, but also because both str and bytes already have a similarly named translate method. With 43 non-special methods, the str interface is already huge. The transform() method with a suitable set of codecs could possibly replace things like expandtabs() or swapcase(), but that would be like writing x.transform('exp') and x.untransform('exp') instead of math.exp(x) and math.log(x).
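
To make the comparison concrete, here is a minimal sketch, assuming Python 3, where encode() and decode() default to UTF-8 when called without an argument. The helper name shout() is made up for illustration, and codecs.encode()/codecs.decode() with the 'rot_13' codec merely stand in for the proposed transform()/untransform() methods, which are not part of the str interface.

```python
import codecs
import math


def shout(raw: bytes) -> bytes:
    """Hypothetical helper: decode on input, work on str, encode on output."""
    text = raw.decode()    # no argument: UTF-8 is the default in Python 3
    text = text.upper()    # all processing happens on str
    return text.encode()   # no argument: UTF-8 again


# A pair of named functions: the inverse has its own obvious name.
x = 2.0
roundtrip_math = math.log(math.exp(x))

# Codec-style spelling through the existing codecs module. 'rot_13' is a
# str-to-str codec, so these calls stand in for a hypothetical
# s.transform('rot_13') and s.untransform('rot_13').
secret = codecs.encode("Hello", "rot_13")
plain = codecs.decode(secret, "rot_13")
```

In the codec spelling the operation is selected by a string argument, so a reader has to know which names the codec registry accepts; with math.exp() and math.log() the operation and its inverse are visible in the function names themselves.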


