[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Walter Dörwald walter at livinglogic.de
Mon Jan 11 11:37:56 CET 2010


On 09.01.10 14:38, Victor Stinner wrote:

On Saturday 09 January 2010 12:18:33, Walter Dörwald wrote:

Good idea, I chose open(filename, encoding="BOM").

On the surface this looks like there's an encoding named "BOM", but looking at your patch I found that the check is still done in TextIOWrapper. IMHO the best approach would be to implement a real codec named "BOM" (or "sniff"). This doesn't require any changes to the IO library. It could even be developed as a standalone project and published in the Cheeseshop.

Why not, this is another solution to point (2) (Check for a BOM while reading or detect it before?). Which encoding would be used if there is no BOM? UTF-8 sounds like a good choice.

UTF-8 might be a good choice, or the fallback could be specified in the encoding name, i.e.

open("file.txt", encoding="BOM-UTF-8")

falls back to UTF-8 if there's no BOM at the start.

This could be implemented via a custom codec search function (see http://docs.python.org/library/codecs.html#codecs.register for more info).
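A rough sketch of what such a search function might look like, under several assumptions that are not part of the proposal: the helper names (_sniff, _search) are made up for illustration, encoding is simply delegated to the fallback codec (no BOM is written), and tell()/seek() on the resulting text file are not handled. The search function accepts both "bom-utf-8" and "bom_utf_8", since the way codecs.lookup() normalizes names differs between Python versions.

```python
import codecs

# Longer marks first: the UTF-32-LE BOM starts with the UTF-16-LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def _sniff(data, fallback):
    """Return (encoding, bom_length) for the start of data (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return fallback, 0


def _search(name):
    """Codec search function for names of the form BOM-<fallback>."""
    normalized = name.replace("-", "_")
    if not normalized.startswith("bom_"):
        return None
    fallback = normalized[len("bom_"):]
    try:
        fallback_info = codecs.lookup(fallback)
    except LookupError:
        return None

    def decode(data, errors="strict"):
        data = bytes(data)
        encoding, skip = _sniff(data, fallback)
        text, _ = codecs.lookup(encoding).decode(data[skip:], errors)
        return text, len(data)

    class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
        def __init__(self, errors="strict"):
            super().__init__(errors)
            self._inner = None  # real decoder, chosen once the BOM is known

        def _buffer_decode(self, data, errors, final):
            if self._inner is None:
                if len(data) < 4 and not final:
                    return "", 0  # keep buffering until a full BOM could fit
                encoding, skip = _sniff(data, fallback)
                self._inner = codecs.getincrementaldecoder(encoding)(errors)
                data = data[skip:]
                return self._inner.decode(data, final), len(data) + skip
            return self._inner.decode(data, final), len(data)

        def reset(self):
            super().reset()
            self._inner = None

    return codecs.CodecInfo(
        name="bom-" + fallback_info.name,
        encode=fallback_info.encode,  # assumption: write with the fallback, no BOM
        decode=decode,
        incrementalencoder=fallback_info.incrementalencoder,
        incrementaldecoder=IncrementalDecoder,
    )


codecs.register(_search)

# Usage: BOM present, so the fallback (UTF-8) is ignored.
raw = codecs.BOM_UTF16_LE + "caf\u00e9".encode("utf-16-le")
text = codecs.decode(raw, "BOM-UTF-8")
```

With the search function registered, open("file.txt", encoding="BOM-UTF-8") should pick up the same codec for reading, since TextIOWrapper looks up its decoder through the codec registry. Note that a UTF-16-LE file whose first character is U+0000 would be mistaken for UTF-32-LE by this simple prefix check.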

Servus, Walter


