[Python-Dev] Improve open() to support reading file starting with an unicode BOM
Lennart Regebro regebro at gmail.com
Mon Jan 11 18:27:01 CET 2010
On Mon, Jan 11, 2010 at 18:16, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> But an autodetect feature is not a codec. Sure it should be reusable, but making it a codec seems to be a weird hack to me.
>
> Well, the existing UTF-16 codec also is an autodetect feature (to detect the endianness), and I don't consider it a weird hack.
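For reference, the endianness autodetection mentioned here can be observed directly. This is only a small sketch using the standard `codecs` constants; the variable names `le` and `be` are made up for the illustration.

```python
import codecs

# The same text encoded in both byte orders, each prefixed with its BOM.
le = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
be = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")

# The byte strings differ, but the plain "utf-16" codec reads the BOM,
# picks the matching byte order, and strips the BOM from the result.
assert le != be
assert le.decode("utf-16") == "hi"
assert be.decode("utf-16") == "hi"
```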
So the BOM codec should raise a UnicodeDecodeError if there is no BOM? That's what it would have to do in that case: it can't fall back on anything, and it has to handle and implement all encodings that have a BOM. And is it then actually very useful? You would have to do a try/except, first with encoding='BOM' and then with encoding=None, to get the fallback to the standard.
I must say that I find this whole thing pretty obvious. 'BOM' is not an encoding. Either there should be a method to get the encoding from the BOM, returning None if there isn't one, or open() should look at the BOM when you pass in encoding=None. Or both.
That covers all use cases, and is easy and obvious. Either open(file=foo, encoding=None) or open(file, encoding=encoding_from_bom(file)).
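A minimal sketch of what such a helper could look like follows. The name encoding_from_bom comes from the line above, but no such function exists in the standard library; the body, the `_BOMS` table, and the choice of returned codec names are assumptions made for illustration. It returns codecs that consume the BOM themselves ("utf-8-sig", "utf-16", "utf-32"), so the BOM does not show up in the decoded text.

```python
import codecs

# Hypothetical lookup table. UTF-32 must be checked before UTF-16,
# because the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def encoding_from_bom(path):
    """Return a codec name based on the file's BOM, or None if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

Because the helper returns None when there is no BOM, open(path, encoding=encoding_from_bom(path)) falls through to open()'s normal default in that case, with no try/except needed.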
I can't see that open(file, encoding='BOM') has any benefit over this, covers any extra use case, or is clearer in any way. Instead it adds something confusing: an encoding that isn't an encoding.
-- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64