[Python-Dev] Improve open() to support reading file starting with an unicode BOM

M.-A. Lemburg mal at egenix.com
Fri Jan 8 17:25:22 CET 2010


Guido van Rossum wrote:
> On Fri, Jan 8, 2010 at 6:34 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Victor Stinner <victor.stinner at haypocalc.com> writes:
>>> I wrote a new version of my patch (version 3):
>>>  * don't change the default behaviour: use open(filename, encoding="BOM")
>>>    to check the BOM if there is any
>>
>> Well, I think if we implement this the default behaviour should be changed.
>> It looks a bit senseless to have two different "auto-choose" options, one
>> with encoding=None and one with encoding="BOM".
>
> Well there are two different auto options: use the environment variables
> (LANG etc.) or inspect the contents of the file. I think it would be useful
> to have ways to specify both.

Shouldn't this encoding guessing be a separate function that you call on either a file or a seekable stream ?

After all, detecting encodings is just as useful to have for non-file streams. You'd then avoid having to stuff everything into a single function call and also open up the door for more complex application specific guess work or defaults.

The whole process would then have two steps:

  1. guess encoding

     import codecs
     encoding = codecs.guess_file_encoding(filename)

  2. open the file with the found encoding

     f = open(filename, encoding=encoding)
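As a rough illustration of step 1, here is a minimal sketch of what such a guessing function could look like if it only inspected a leading BOM. Note that `guess_file_encoding` is the name proposed in this message, not an existing function in the `codecs` module; the `default` parameter and the choice of returned codec names are assumptions made for the example. Only the `codecs.BOM_*` constants and the `utf-8-sig`, `utf-16` and `utf-32` codecs are real.

```python
import codecs

# BOM signatures, longest first: the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM, so it has to be tested earlier.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_file_encoding(filename, default=None):
    """Hypothetical helper: return a codec name based on a leading BOM.

    Returns `default` when no BOM is found. With default=None the result
    can be passed straight to open(), which then falls back to its usual
    locale-based choice.
    """
    with open(filename, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default
```

With this sketch, `open(filename, encoding=guess_file_encoding(filename))` covers both steps. The returned codecs consume the BOM themselves, so it does not show up in the decoded text. One known limitation of BOM-only detection: a UTF-16 LE file whose first character is U+0000 is indistinguishable from UTF-32 LE.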

For seekable streams f, you'd have:

  1. guess encoding

     import codecs
     encoding = codecs.guess_stream_encoding(f)

  2. wrap the stream with a reader for the found encoding

     reader_class = codecs.getreader(encoding)
     g = reader_class(f)
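The stream variant could be sketched along the same lines. Again, `guess_stream_encoding` is only the name proposed here and does not exist in `codecs`; the `default="utf-8"` fallback is an assumption for the example (`codecs.getreader()` needs a real codec name, so None would not work). The sketch peeks at the first bytes and then seeks back, which is why the stream has to be seekable.

```python
import codecs
import io

# BOM signatures, longest first (UTF-32 LE starts with the UTF-16 LE BOM).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_stream_encoding(stream, default="utf-8"):
    """Hypothetical helper: peek at a seekable binary stream for a BOM."""
    pos = stream.tell()
    head = stream.read(4)
    stream.seek(pos)  # restore the position so the reader sees the BOM too
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default  # assumed fallback, not part of the proposal


# Example: an in-memory stream holding big-endian UTF-16 data with a BOM.
f = io.BytesIO(codecs.BOM_UTF16_BE + "caf\u00e9".encode("utf-16-be"))

encoding = guess_stream_encoding(f)        # step 1: guess encoding
reader_class = codecs.getreader(encoding)  # step 2: wrap the stream
g = reader_class(f)
text = g.read()
```

Because the position is restored before wrapping, the `utf-16` stream reader still sees the BOM and uses it to pick the byte order.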

--
Marc-Andre Lemburg
eGenix.com


