[Python-Dev] Improve open() to support reading file starting with an unicode BOM
"Martin v. Löwis" martin at v.loewis.de
Fri Jan 8 10:10:23 CET 2010
> The builtin open() function is unable to open a UTF-16/32 file starting with a BOM if the encoding is not specified (it raises a Unicode error). For a UTF-8 file starting with a BOM, read()/readline() also returns the BOM, whereas the BOM should be "ignored".
It depends. If you use the utf-8-sig encoding, it will ignore the UTF-8 signature.
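A minimal sketch of the difference, assuming Python 3 and a throwaway temporary file: with encoding="utf-8" the signature survives as U+FEFF at the start of the text, while encoding="utf-8-sig" strips it.

```python
import os
import tempfile

# Write a small file that starts with the UTF-8 signature (EF BB BF).
path = os.path.join(tempfile.mkdtemp(), "bom.txt")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfhello\n")

# Plain utf-8 keeps the signature as U+FEFF at the start of the text.
with open(path, encoding="utf-8") as f:
    plain = f.read()

# utf-8-sig drops a leading signature if one is present.
with open(path, encoding="utf-8-sig") as f:
    sig = f.read()

print(repr(plain))  # '\ufeffhello\n'
print(repr(sig))    # 'hello\n'
```

utf-8-sig also decodes files without a signature normally, so it is a safe choice when a UTF-8 file may or may not start with one.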
> Since my proposal changes the result of TextIOWrapper.read()/readline() for files starting with a BOM, we might introduce an option to open() to enable the new behaviour. But is it really necessary to keep backward compatibility?
Absolutely. And there is no need to add a new option; instead, use the existing one: define an encoding that auto-detects the encoding from the family of BOMs. Maybe call it encoding="sniff".
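encoding="sniff" is only a suggested name here, not an existing codec. As a rough approximation of the idea, the sketch below uses hypothetical helpers (sniff_encoding, open_sniffed) that read the first four bytes, match them against the BOM constants from the codecs module, and pick an existing codec that strips the BOM. A real implementation would register a codec instead; this is an assumption-laden illustration, not the proposed design.

```python
import codecs
import os
import tempfile

# Candidate signatures. The UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so the UTF-32 entries must be checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(path, default="utf-8"):
    """Hypothetical helper: choose a codec from the file's leading BOM."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            # "utf-32", "utf-16" and "utf-8-sig" all consume the BOM
            # when decoding; the first two also detect byte order from it.
            return encoding
    return default


def open_sniffed(path, default="utf-8"):
    """Hypothetical helper: open a text file with the sniffed encoding."""
    return open(path, encoding=sniff_encoding(path, default))


# Usage: the "utf-16" codec writes a BOM, which sniffing then detects.
demo = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo, "w", encoding="utf-16") as f:
    f.write("hello")
with open_sniffed(demo) as f:
    print(repr(f.read()))  # 'hello'
```

One inherent ambiguity: a UTF-16-LE file whose first character is U+0000 starts with the same four bytes as a UTF-32-LE BOM, so any BOM-based sniffing has to pick one interpretation.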
Regards, Martin