[Python-Dev] Do I misunderstand how codecs.EncodedFile is supposed to work?

Martin v. Loewis martin@v.loewis.de
07 Aug 2002 08:46:59 +0200


Skip Montanaro <skip@pobox.com> writes:

> I thought the whole purpose of the EncodedFile class was to provide transparent encoding.

""" Return a wrapped version of file which provides transparent
    encoding translation.

    Strings written to the wrapped file are interpreted according
    to the given data_encoding and then written to the original
    file as string using file_encoding. The intermediate encoding
    will usually be Unicode but depends on the specified codecs.

    Strings are read from the file using file_encoding and then
    passed back to the caller as string using data_encoding.

    If file_encoding is not given, it defaults to data_encoding.
"""

So, no. It provides transparent recoding: with a file encoding, and a data encoding.
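A minimal sketch of that recoding in today's Python 3, assuming an in-memory io.BytesIO in place of a real file and Latin-1/UTF-8 as example encodings. Bytes written in the data encoding reach the underlying stream in the file encoding, and reads convert back. Unicode appears only as the internal intermediate step.

```python
import codecs
import io

# In-memory binary stream standing in for a real file.
raw = io.BytesIO()

# Callers hand over Latin-1 bytes; the stream receives UTF-8 bytes.
recoder = codecs.EncodedFile(raw, data_encoding='latin-1', file_encoding='utf-8')
recoder.write(b'caf\xe9')        # 'café' encoded as Latin-1
stored = raw.getvalue()          # UTF-8 bytes on the "file" side

# Reading goes the other way: UTF-8 from the stream, Latin-1 to the caller.
raw.seek(0)
read_back = recoder.read()       # Latin-1 bytes again
print(stored, read_back)
```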

I never found this class useful.

What you want is a StreamWriter:

f = codecs.getwriter('utf-8')(open('unicode-test', 'wb'))

Of course, this specific case can be written much more easily as

f = codecs.open('unicode-test', 'w', encoding='utf-8')

The getwriter approach is useful if you already have a file-like object from somewhere else.
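In Python 3 the same idea looks roughly like this. The sketch assumes the wrapped object is a binary stream, because the StreamWriter emits bytes and a file opened in text mode would reject them. io.BytesIO stands in for whatever object you were handed.

```python
import codecs
import io

# Any existing binary file-like object; BytesIO stands in for a file,
# pipe, or socket wrapper that some other code already opened.
existing = io.BytesIO()

# getwriter() returns the codec's StreamWriter class; instantiate it
# around the existing stream.
writer = codecs.getwriter('utf-8')(existing)
writer.write('caf\u00e9')        # a str (Unicode) object, no .encode() call needed
print(existing.getvalue())
```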

> Shouldn't it support transparent encoding of Unicode objects? That is, I told the system I want writes to be in utf-8 when I instantiated the class.

You also told it that the input data are in utf-8: the one encoding you passed is the data_encoding, and the omitted file_encoding defaults to it.
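The effect of passing a single encoding can be checked with a small Python 3 sketch (the exact exception shown is Python 3 behavior; io.BytesIO again stands in for a real file). The result is a UTF-8 to UTF-8 recoder: UTF-8 bytes pass through unchanged, and a str is rejected because the data-side decoder expects bytes.

```python
import codecs
import io

raw = io.BytesIO()

# Only one encoding given: it is data_encoding, and file_encoding
# defaults to the same value, so this recodes UTF-8 to UTF-8.
f = codecs.EncodedFile(raw, 'utf-8')

f.write('caf\u00e9'.encode('utf-8'))   # UTF-8 bytes pass through unchanged
passed_through = raw.getvalue()

try:
    f.write('caf\u00e9')                # a str is not valid input data here
    rejected = False
except TypeError:
    rejected = True                     # the UTF-8 decoder requires bytes
print(passed_through, rejected)
```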

> I don't think I should have to call .encode() directly. I realize I can wrap the function in a class that adds the transparency I desire, but it seems the whole point should be to make it easy to write Unicode objects to files.

Not this class, no.

Now, you may ask what the purpose of this class is, then. I really don't know; it is against everything I'm advocating, as it assumes that you have byte strings in a certain encoding in memory that you want to save in a different encoding. That should never happen: all your text data should be Unicode strings.
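A minimal Python 3 sketch of that principle, assuming Latin-1 input and UTF-8 output as examples: decode at the input boundary, work on str, encode at the output boundary. io.TextIOWrapper is used here as the modern counterpart of a StreamReader/StreamWriter pair, and io.BytesIO stands in for real files.

```python
import io

# Binary streams standing in for a Latin-1 input file and a UTF-8 output file.
src = io.BytesIO(b'caf\xe9')
dst = io.BytesIO()

# Decoding happens at the input edge and encoding at the output edge;
# everything in between is str (Unicode) data.
reader = io.TextIOWrapper(src, encoding='latin-1')
writer = io.TextIOWrapper(dst, encoding='utf-8')

text = reader.read()              # 'café' as a str
writer.write(text.upper())        # any text processing happens on str
writer.flush()                    # push buffered output through to dst
print(dst.getvalue())
```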

Regards, Martin