[Python-Dev] Improve open() to support reading file starting with an unicode BOM
Olemis Lang olemis at gmail.com
Mon Jan 11 19:58:01 CET 2010
- Previous message: [Python-Dev] Improve open() to support reading file starting with an unicode BOM
- Next message: [Python-Dev] Improve open() to support reading file starting with an unicode BOM
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Thu, Jan 7, 2010 at 4:10 PM, Victor Stinner <victor.stinner at haypocalc.com> wrote:
Hi,
Builtin open() function is unable to open an UTF-16/32 file starting with a BOM if the encoding is not specified (raise an unicode error). For an UTF-8 file starting with a BOM, read()/readline() returns also the BOM whereas the BOM should be "ignored". [...]
I had similar issues too (please read below ;o) ...
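For reference, the UTF-8 half of the behaviour described above can be reproduced with a minimal sketch like the following. It assumes Python 3 and uses an in-memory byte string as a stand-in for a real file; the variable names are illustrative only. The standard `utf-8-sig` codec is shown as one existing way to skip the BOM when the caller asks for it explicitly.

```python
import codecs
import io

# In-memory stand-in for a UTF-8 file that starts with a BOM (assumption:
# a real file opened in binary mode would behave the same way).
raw = codecs.BOM_UTF8 + "hello\n".encode("utf-8")

# Decoding as plain 'utf-8' keeps the BOM as U+FEFF at the start of the text.
plain = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8").read()

# The 'utf-8-sig' codec skips a leading BOM if one is present.
stripped = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig").read()

print(repr(plain))
print(repr(stripped))
```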
On Thu, Jan 7, 2010 at 7:52 PM, Guido van Rossum <guido at python.org> wrote:
I'm a little hesitant about this. First of all, UTF-8 + BOM is crazy talk. And for the other two, perhaps it would make more sense to have a separate encoding-guessing function that takes a binary stream and returns a text stream wrapping it with the proper encoding?
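A separate helper of the kind suggested here might look roughly like the sketch below. This is only an illustration under stated assumptions: the name `wrap_guessing_bom`, its `default` parameter, and the requirement that the stream be seekable are all hypothetical, and it assumes Python 3 with `io.TextIOWrapper`. It only looks at BOMs and does no other encoding detection.

```python
import codecs
import io

# Longer BOMs are checked first: the UTF-32-LE BOM begins with the same
# two bytes as the UTF-16-LE BOM. (A UTF-16-LE file whose first character
# is U+0000 would be misdetected; this sketch ignores that corner case.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def wrap_guessing_bom(binary_stream, default="utf-8"):
    """Hypothetical helper: wrap a seekable binary stream in a text stream.

    The encoding is chosen from a leading BOM if there is one, otherwise
    `default` is used. The generic 'utf-16', 'utf-32' and 'utf-8-sig'
    codecs consume the BOM themselves, so the stream is rewound first.
    """
    start = binary_stream.tell()
    head = binary_stream.read(4)
    binary_stream.seek(start)
    encoding = default
    for bom, name in _BOMS:
        if head.startswith(bom):
            encoding = name
            break
    return io.TextIOWrapper(binary_stream, encoding=encoding)


# Usage with an in-memory stand-in for a UTF-16 file that has a BOM.
data = codecs.BOM_UTF16_LE + "a,b\n".encode("utf-16-le")
print(repr(wrap_guessing_bom(io.BytesIO(data)).read()))
```

Keeping this outside open() leaves the builtin unchanged, which is the trade-off the suggestion is pointing at.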
About guessing the encoding, I experienced this issue while I was developing a Trac plugin. What I was doing was as follows:
- I guessed the MIME type + charset encoding using the Trac MIME API (it was a CSV file encoded using UTF-16)
- I read the file using open
- Then wrapped the file using codecs.EncodedFile
- Then used csv.reader
... and I still got the BOM in the first value of the first row in the CSV file.
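{{{
#!python
from codecs import EncodedFile

mimetype = 'utf-16-le'  # charset guessed via the Trac MIME API
ef = EncodedFile(f, 'utf-8', mimetype)  # f is the file object returned by open
}}}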
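One plausible explanation, not confirmed in this thread, is that the endianness-specific codec `utf-16-le` treats a leading BOM as an ordinary character, whereas the generic `utf-16` codec consumes it. The sketch below illustrates that difference. It is not the plugin's code: it assumes Python 3, uses `io.TextIOWrapper` rather than `EncodedFile`, and the sample data and the `first_cell` helper are made up for the example.

```python
import codecs
import csv
import io

# In-memory stand-in for a UTF-16 CSV file written with a BOM.
raw = codecs.BOM_UTF16_LE + "name,value\r\nspam,1\r\n".encode("utf-16-le")


def first_cell(encoding):
    """Hypothetical helper: decode `raw` and return the first CSV cell."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return next(csv.reader(text))[0]


# Endianness given explicitly: the BOM is decoded as U+FEFF and kept.
print(repr(first_cell("utf-16-le")))

# Generic codec: the BOM selects the byte order and is consumed.
print(repr(first_cell("utf-16")))
```

If this is indeed the cause, passing the generic codec name, or stripping a leading U+FEFF from the first cell, would be two possible workarounds.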
IMO I am +1 for leaving open just like it is and using the codecs module to deal with encodings, but I am strongly -1 on returning the BOM while using EncodedFile (mainly because the encoding is explicitly supplied ;o)
--Guido
CMIIW anyway ...
-- Regards,
Olemis.
Blog ES: http://simelo-es.blogspot.com/ Blog EN: http://simelo-en.blogspot.com/
Featured article: