[Python-Dev] Improve open() to support reading file starting with an unicode BOM

"Martin v. Löwis" martin at v.loewis.de
Fri Jan 8 10:10:23 CET 2010


> The builtin open() function is unable to open a UTF-16/32 file starting with a BOM if the encoding is not specified (it raises a Unicode error). For a UTF-8 file starting with a BOM, read()/readline() also returns the BOM, whereas the BOM should be "ignored".

It depends. If you use the utf-8-sig encoding, it will ignore the UTF-8 signature.
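
To illustrate the difference, here is a minimal sketch using only the standard codecs module. It decodes an in-memory byte string rather than a real file, and the variable names are purely illustrative:

```python
import codecs

# Bytes as they would appear at the start of a UTF-8 file written with a BOM.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# The plain "utf-8" codec keeps the BOM as the character U+FEFF.
with_bom = data.decode("utf-8")

# The "utf-8-sig" codec drops a leading UTF-8 signature if one is present.
without_bom = data.decode("utf-8-sig")

# "utf-8-sig" also accepts input that has no signature at all.
plain = b"hello".decode("utf-8-sig")
```

The same codec name can be passed to open(), as in open(path, encoding="utf-8-sig").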

> Since my proposal changes the result of TextIOWrapper.read()/readline() for files starting with a BOM, we might introduce an option to open() to enable the new behaviour. But is it really necessary to keep backward compatibility?

Absolutely. And there is no need to introduce a new option; instead, use the existing ones: define an encoding that auto-detects the actual encoding from the family of BOMs. Maybe you call it encoding="sniff".
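
What such auto-detection could look like can be sketched in user code. This is not an existing codec: no "sniff" encoding is assumed to be registered, the helper names sniff_encoding and open_sniffed are hypothetical, and the fallback of "utf-8" when no BOM is found is an assumption made for the example.

```python
import codecs

# BOMs are checked in order. UTF-32 must come before UTF-16 because the
# UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(head, default="utf-8"):
    """Return a codec name chosen from the BOM at the start of `head`.

    Hypothetical helper: `default` is returned when no BOM is found.
    The returned codecs ("utf-8-sig", "utf-16", "utf-32") consume the BOM
    themselves, so the decoded text does not start with U+FEFF.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def open_sniffed(path, default="utf-8"):
    """Open `path` for reading text with a BOM-derived encoding (hypothetical)."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    return open(path, "r", encoding=sniff_encoding(head, default))
```

One known ambiguity of this approach: a UTF-16-LE file whose first character after the BOM is U+0000 starts with the same four bytes as a UTF-32-LE BOM, so it would be misdetected as UTF-32.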

Regards, Martin


