[Python-Dev] Unicode byte order mark decoding

"Martin v. Löwis" martin at v.loewis.de
Wed Apr 6 22:22:19 CEST 2005


Stephen J. Turnbull wrote:

> Because the signature/BOM is not a chunk, it's a header. Handling the signature/BOM is part of stream initialization, not translation, to my mind.

I'm sorry, but I'm losing track as to what precisely you are trying to say. You seem to be using a mental model that is entirely different from mine.

> The point is that explicitly using a stream shows that initialization (and finalization) matter. The default can be BOM or not, as a pragmatic matter. But then the stream data itself can be treated homogeneously, as implied by the notion of stream.

But what follows from that point? So it shows some kind of matter... what does that mean for actual changes to the Python API?

> I think it probably also would solve Walter's conundrum about buffering the signature/BOM if responsibility for that were moved out of the codecs and into the objects where signatures make sense.

> I don't know whether that's really feasible in the short run---I suspect there may be a lot of stream-like modules that would need to be updated---but it would be saner in the long run.

What is "that" which might be really feasible? To "solve Walter's conundrum"? That "signatures make sense"?

So I can't really respond to your message in a meaningful way; I'll just let it rest...

Regards, Martin
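
For readers following the thread, the "signature as stream header" idea from the quoted message can be illustrated roughly as below. This is only a sketch, not code proposed by either poster: the helper names skip_utf8_signature and read_text are hypothetical, the stream is assumed to be seekable and binary, and only the UTF-8 signature is handled. The stream object consumes the signature during initialization, so the plain utf-8 codec never sees it.

```python
import codecs
import io


def skip_utf8_signature(stream):
    """Consume a leading UTF-8 signature from a seekable binary stream.

    Hypothetical helper: returns True if a signature was found and skipped,
    otherwise rewinds the stream and returns False.
    """
    start = stream.tell()
    head = stream.read(len(codecs.BOM_UTF8))
    if head == codecs.BOM_UTF8:
        return True
    stream.seek(start)  # no signature: leave the data untouched
    return False


def read_text(stream):
    """Handle the signature at stream initialization, then decode the rest."""
    had_signature = skip_utf8_signature(stream)
    # The codec only translates; it treats the remaining data homogeneously.
    return had_signature, stream.read().decode("utf-8")


data = codecs.BOM_UTF8 + "héllo".encode("utf-8")
print(read_text(io.BytesIO(data)))
```

For comparison, decoding the same bytes directly with the plain utf-8 codec leaves U+FEFF as the first character of the result, which is the behaviour the thread is arguing about.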


