[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Walter Dörwald walter at livinglogic.de
Mon Jan 11 13:29:04 CET 2010


On 10.01.10 00:40, "Martin v. Löwis" wrote:

How does the requirement that it be implemented as a codec miss the point?

If we want it to be the default, it must be able to fall back on the current locale-based algorithm if no BOM is found. I don't think it would be easy for a codec to do that.

Yes - however, Victor currently apparently doesn't want it to be the default, but wants the user to specify encoding="BOM". If so, it isn't the default, and it is easy to implement as a codec.

FWIW, I agree with Walter that if it is provided through the encoding= argument, it should be a codec. If it is built into the open function (for whatever reason), it must be provided by some other parameter.

Why not simply encoding=None?

I don't mind. Please re-read Walter's message - it only said that if this is activated through encoding="BOM", then it must be a codec, and could be on PyPI. I don't think Walter was talking about the case "it is not activated through encoding='BOM'" at all.

However, if this autodetection feature is useful in other cases (no matter how it's activated), it should be a codec, because as part of the open() function it isn't reusable.
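
To make the reusability argument concrete, here is a rough sketch of what such a codec could look like. It is only an illustration under stated assumptions: the names sniff_encoding and _bom_decode and the codec name "bom" are hypothetical and nothing like this exists in the standard library. It also only covers the stateless decode case; a version usable with open() would additionally need an incremental decoder.

```python
import codecs
import locale

# Hypothetical BOM table. The UTF-32 entries come first because the
# UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data, default=None):
    """Return a codec name based on a leading BOM, else a fallback.

    The fallback is the caller's default or, failing that, the
    locale's preferred encoding.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default or locale.getpreferredencoding(False)


def _bom_decode(data, errors="strict"):
    # Stateless decode: pick the real codec, then delegate to it.
    # "utf-8-sig", "utf-16" and "utf-32" all consume the BOM themselves.
    data = bytes(data)
    text = data.decode(sniff_encoding(data), errors)
    return text, len(data)


def _search(name):
    # Assumption: encoding simply writes UTF-8; the discussion above
    # only concerns reading.
    if name == "bom":
        return codecs.CodecInfo(
            name="bom",
            encode=codecs.utf_8_encode,
            decode=_bom_decode,
        )
    return None


codecs.register(_search)
```

Once registered, the detection is available wherever a codec name is accepted, for example (codecs.BOM_UTF8 + b"abc").decode("bom"), and not just inside open().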

The default value should provide the most useful behaviour possible. Forcing users to choose between two different autodetection strategies (encoding=None and another one) is a little insane IMO.

And encoding="mbcs" is a third option on Windows.

That wouldn't disturb me much. There are a lot of things in that area that are a little insane, starting with Microsoft Windows :-)

;)

Servus, Walter


