[Python-Dev] Improve open() to support reading file starting with an unicode BOM
M.-A. Lemburg mal at egenix.com
Fri Jan 8 17:25:22 CET 2010
Guido van Rossum wrote:
> On Fri, Jan 8, 2010 at 6:34 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Victor Stinner <victor.stinner at haypocalc.com> writes:
>>> I wrote a new version of my patch (version 3):
>>>
>>>  * don't change the default behaviour: use open(filename, encoding="BOM")
>>>    to check the BOM if there is any
>>
>> Well, I think if we implement this the default behaviour should be changed.
>> It looks a bit senseless to have two different "auto-choose" options, one
>> with encoding=None and one with encoding="BOM".
>
> Well there are two different auto options: use the environment variables
> (LANG etc.) or inspect the contents of the file. I think it would be useful
> to have ways to specify both.
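The two auto options behave differently on a UTF-8 file that starts with a BOM. As a hedged illustration, the sketch below uses the existing 'utf-8-sig' codec to show the difference. The temporary file and the explicit encoding="utf-8" (standing in for a UTF-8 locale default) are assumptions of this sketch. encoding="BOM" was only a value proposed in the patch, and the codecs registry has no codec by that name.

```python
import os
import tempfile

# Create a temporary UTF-8 file that starts with a BOM
# (the utf-8-sig encoder emits the BOM on the first write).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("hello")

# Plain UTF-8 decoding (standing in for a UTF-8 locale default)
# keeps the BOM as a leading U+FEFF character.
with open(path, encoding="utf-8") as f:
    plain = f.read()

# utf-8-sig skips a leading BOM if present (and also reads BOM-less files).
with open(path, encoding="utf-8-sig") as f:
    stripped = f.read()

os.remove(path)
print(repr(plain), repr(stripped))  # '\ufeffhello' 'hello'
```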
Shouldn't this encoding guessing be a separate function that you call on either a file or a seekable stream?

After all, detecting encodings is just as useful for non-file streams. You'd then avoid having to stuff everything into a single function call, and you'd also open the door to more complex, application-specific guesswork or defaults.
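Here is a minimal sketch of such a separate function. It assumes detection is limited to BOM sniffing. guess_stream_encoding and guess_file_encoding are the names proposed in this message. Neither exists in the codecs module, so they are defined here as hypothetical helpers. The default parameter for input without a BOM is also an assumption.

```python
import codecs
import io

# Known BOMs mapped to codecs that consume the BOM when decoding.
# UTF-32 BOMs are checked first: the UTF-32-LE BOM begins with the UTF-16-LE one.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_stream_encoding(stream, default=None):
    """Hypothetical helper: peek at the BOM of a seekable binary stream.

    The stream position is restored afterwards, so the caller can still
    wrap the stream. Returns `default` when no BOM is found.
    """
    pos = stream.tell()
    head = stream.read(4)
    stream.seek(pos)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def guess_file_encoding(filename, default=None):
    """Hypothetical helper: the file variant, built on the stream variant."""
    with open(filename, "rb") as f:
        return guess_stream_encoding(f, default)


data = io.BytesIO(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le"))
print(guess_stream_encoding(data), data.tell())  # utf-16 0
```

The file variant reuses the stream variant, which reflects the point of separating the guessing step from open(). BOM sniffing alone has a known ambiguity. A UTF-16-LE file whose first character is U+0000 is indistinguishable from UTF-32-LE.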
The whole process would then have two steps:

 - guess the encoding

     import codecs
     encoding = codecs.guess_file_encoding(filename)  # proposed API

 - open the file with the encoding found

     f = open(filename, encoding=encoding)

For a seekable stream f, you'd have:

 - guess the encoding

     import codecs
     encoding = codecs.guess_stream_encoding(f)  # proposed API

 - wrap the stream with a reader for the encoding found

     reader_class = codecs.getreader(encoding)
     g = reader_class(f)
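The second step in either path reads cleanly when two conditions hold. The guessed name must be a codec that consumes the BOM ('utf-8-sig', 'utf-16', 'utf-32'), and the stream must still be positioned at the start of the BOM. In the sketch below, the encoding is hard-coded to stand in for the guessing step, since codecs.guess_file_encoding and codecs.guess_stream_encoding are only proposed. The temporary file is a detail of the illustration.

```python
import codecs
import io
import os
import tempfile

encoding = "utf-16"  # stands in for the result of the proposed guessing step
payload = "caf\u00e9".encode("utf-16")  # the utf-16 codec writes a BOM first

# File path: open() with the guessed encoding; the utf-16 codec drops the BOM.
fd, filename = tempfile.mkstemp()
with os.fdopen(fd, "wb") as raw:
    raw.write(payload)
with open(filename, encoding=encoding) as f:
    from_file = f.read()
os.remove(filename)

# Seekable stream: wrap the binary stream with the codec's StreamReader.
stream = io.BytesIO(payload)
reader_class = codecs.getreader(encoding)
g = reader_class(stream)
from_stream = g.read()

print(from_file == from_stream == "caf\u00e9")  # True
```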
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 08 2010)
Python/Zope Consulting and Support ...        http://www.egenix.com/
mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/