[Python-Dev] Quick sum up about open() + BOM

"Martin v. Löwis" martin at v.loewis.de
Sat Jan 9 02:23:07 CET 2010


>> Antoine would like to check BOM by default, because both options (system locale vs checking for BOM) are the same thing.

> To be clear, I am not saying it is the same thing. What I think is that it would be a mistake to use a mildly unreliable heuristic by default (the locale + device encoding heuristic) but refuse to trust a more reliable heuristic (the BOM-based detection algorithm).

I concur. On Windows, both UTF-8 and its signature (the BOM) are very common, yet the platform default is the truly awful CP1252.

While I would support adding BOM detection for the case where a file is opened for reading and no encoding is specified, I see two problems: a) if a seek operation is performed before the BOM has been looked at, no determination will have been made; b) what encoding should it use on writing?
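To make the read-only case concrete, here is a minimal sketch of what such detection could look like in user code. The names guess_encoding and open_for_reading are hypothetical, not part of open() or any proposal in this thread, and the fallback to locale.getpreferredencoding() is an assumption about what "no encoding specified" would mean.

```python
import codecs
import locale

# Checked in this order because the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(head, default=None):
    """Return a codec name for the first bytes of a file (hypothetical helper).

    Falls back to `default`, or to the locale's preferred encoding,
    when no BOM is present.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default or locale.getpreferredencoding(False)


def open_for_reading(path, default=None):
    """Open `path` in text mode, sniffing a BOM first (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    # The chosen codecs consume the BOM themselves, so it does not
    # show up in the decoded text.
    return open(path, "r", encoding=guess_encoding(head, default))
```

This sidesteps problem (a) only because the sniffing happens at open time, before the caller can seek; a lazy check inside the stream would still have to handle an early seek. It says nothing about problem (b): on writing there is no BOM to look at, so an encoding would still have to come from somewhere else.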

Regards, Martin


