[Python-Dev] RE: Ill-defined encoding for CP875?

Tim Peters tim.one@home.com
Sun, 13 May 2001 14:31:42 -0400


[M.-A. Lemburg]

> ... The "right" thing to do here is to simply remove cp875 from the test for round-tripping.

I'm relieved you think so, since that's what I already did.

> It is not the only encoding which fails this test, but it's not our fault: the codecs were all generated from the original codec maps at the Unicode.org site.
>
> If their mappings are broken, we can't do much about it... other than to ignore the error or remove the codec altogether.

On general principle I don't like either of those -- "in the face of ambiguity, refuse the temptation to guess". It's at least surprising to see

    >>> unicode("?", "cp875").encode("cp875")
    '\xfd'

now, yes? Would it be better if an ambiguous encoding raised an exception in "strict" mode? That is, a third choice is to alert users when they're relying on a broken part of a mapping.
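
To make the "third choice" concrete, here is a minimal sketch in Python 3 syntax (the session above is Python 2). It is not how the stock codecs work: `ambiguous_characters`, `strict_encode`, `AmbiguousEncodingError` and the `TOY_TABLE` mapping are all hypothetical names invented for illustration. The toy table only imitates the situation reported in this thread, where more than one byte decodes to the same character. The first helper lists such collisions for any single-byte codec; the second refuses to guess when asked to encode a character with more than one candidate byte.

```python
from collections import defaultdict


def ambiguous_characters(codec_name):
    """Return {char: [bytes...]} for characters that several bytes decode to."""
    sources = defaultdict(list)
    for byte in range(256):
        try:
            char = bytes([byte]).decode(codec_name)
        except UnicodeDecodeError:
            continue  # this byte is undefined in the code page
        sources[char].append(byte)
    return {char: bs for char, bs in sources.items() if len(bs) > 1}


class AmbiguousEncodingError(ValueError):
    """Hypothetical error type; not part of Python's codec machinery."""


def strict_encode(text, decoding_table):
    """Encode via a byte -> char table, raising instead of picking a byte."""
    inverse = defaultdict(list)
    for byte, char in decoding_table.items():
        inverse[char].append(byte)
    out = bytearray()
    for char in text:
        candidates = inverse.get(char, [])
        if not candidates:
            raise ValueError(f"no byte maps to {char!r}")
        if len(candidates) > 1:
            raise AmbiguousEncodingError(
                f"{char!r} could be any of {[hex(b) for b in candidates]}"
            )
        out.append(candidates[0])
    return bytes(out)


# Toy table (an assumption, not the real cp875 map): two bytes share one
# character, so encoding that character has no single right answer.
TOY_TABLE = {0x3F: "\x1a", 0xFD: "\x1a", 0xC1: "A"}
```

With this table, `strict_encode("A", TOY_TABLE)` succeeds, while `strict_encode("\x1a", TOY_TABLE)` raises `AmbiguousEncodingError` rather than silently returning one of the two bytes. Calling `ambiguous_characters("cp875")` would list whatever collisions the installed codec actually has.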