[Python-Dev] Improve open() to support reading file starting with an unicode BOM
Walter Dörwald walter at livinglogic.de
Mon Jan 11 11:37:56 CET 2010
On 09.01.10 14:38, Victor Stinner wrote:
> On Saturday, 09 January 2010 at 12:18:33, Walter Dörwald wrote:
>>> Good idea, I chose open(filename, encoding="BOM").
>> On the surface this looks like there's an encoding named "BOM", but looking at your patch I found that the check is still done in TextIOWrapper. IMHO the best approach would be to implement a real codec named "BOM" (or "sniff"). This doesn't require any changes to the IO library. It could even be developed as a standalone project and published in the Cheeseshop.
> Why not, this is another solution to point (2) (check for a BOM while reading, or detect it beforehand?). Which encoding would be used if there is no BOM? UTF-8 sounds like a good choice.
UTF-8 might be a good choice, or the fallback could be specified in the encoding name, i.e.
open("file.txt", encoding="BOM-UTF-8")
falls back to UTF-8 if there's no BOM at the start.
This could be implemented via a custom codec search function (see http://docs.python.org/library/codecs.html#codecs.register for more info).
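A rough sketch of what such a search function might look like follows. It is only an illustration under several assumptions: the names search_bom_codec, _make_codec and _sniff are hypothetical, encoding simply delegates to the fallback codec (the proposal above only covers reading), and tell()/seek() bookkeeping is not handled. Depending on the Python version, the search function receives the name with hyphens either kept or converted to underscores, so both forms are accepted.

```python
import codecs

# BOMs mapped to the codec that decodes the bytes *after* the BOM.
# UTF-32 is tested first because the UTF-32-LE BOM starts with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def _sniff(data, fallback):
    """Return (encoding, bom_length) for the start of a byte string."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return fallback, 0


def _make_codec(name, fallback):
    """Build a CodecInfo that sniffs the BOM and otherwise uses `fallback`."""

    def decode(data, errors="strict"):
        data = bytes(data)
        encoding, skip = _sniff(data, fallback)
        return data[skip:].decode(encoding, errors), len(data)

    class Decoder(codecs.IncrementalDecoder):
        def __init__(self, errors="strict"):
            super().__init__(errors)
            self.reset()

        def reset(self):
            self._pending = b""
            self._inner = None  # real decoder, chosen once the start is seen

        def decode(self, data, final=False):
            if self._inner is None:
                # Buffer until the longest BOM (4 bytes) could have arrived.
                self._pending += bytes(data)
                if len(self._pending) < 4 and not final:
                    return ""
                encoding, skip = _sniff(self._pending, fallback)
                self._inner = codecs.getincrementaldecoder(encoding)(self.errors)
                data, self._pending = self._pending[skip:], b""
            return self._inner.decode(data, final)

    return codecs.CodecInfo(
        # Assumption: writing just uses the fallback codec.
        encode=codecs.getencoder(fallback),
        decode=decode,
        incrementalencoder=codecs.getincrementalencoder(fallback),
        incrementaldecoder=Decoder,
        name=name,
    )


def search_bom_codec(name):
    """Codec search function for names of the form "BOM-<fallback>"."""
    key = name.lower().replace("-", "_")
    if not key.startswith("bom_"):
        return None
    fallback = key[len("bom_"):]
    try:
        codecs.lookup(fallback)
    except LookupError:
        return None  # unknown fallback: let other search functions try
    return _make_codec(name, fallback)


codecs.register(search_bom_codec)
```

With the search function registered, codecs.decode(data, "BOM-UTF-8") and, presumably, open("file.txt", encoding="BOM-UTF-8") resolve to the sniffing codec without any change to the IO library.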
Servus, Walter