[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Paul Moore p.f.moore at gmail.com
Sat Apr 25 11:00:24 CEST 2009


2009/4/25 James Y Knight <foom at fuhm.net>:

On Apr 24, 2009, at 6:05 PM, Paul Moore wrote:

- Windows systems where broken Unicode (lone surrogates or whatever) isn't involved
- Unix systems where the user's stated filesystem encoding is correct

Can you honestly say that this isn't the vast majority of real-world environments? (IIRC, you are based in Japan, so it may well be true that the likelihood of problems is a lot higher where you are than where I am - the UK - but I suspect that averaging out, things are generally as above).

In my experience, it is normal on most unix systems that some programs (mostly daemons) are running in default "POSIX" locale, others (most user programs) are running in the "en_US.utf-8" locale, and some luddite users have set themselves to "en_US.8859-1". All running on the same system.

OK, thanks for the data point.

Following on from that, would this (under Martin's proposal) result in programs receiving encoded strings, or just semantically-incorrect ones?

Specifically, the 8859-1 case cannot result in encoded strings, as 8859-1 can represent all byte strings (possibly garbled, but at least validly). The utf8 case can hit unrepresentable bytes, but only if there are bytes greater than 0x7F in filenames. Is the "POSIX" case ASCII? If so, then the same logic applies (bytes >= 0x80 are unrepresentable).
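
To make the three cases concrete, here is a minimal sketch. It assumes Python 3.1 or later, where the PEP 383 mechanism is available as the "surrogateescape" error handler. The filename and the helper names (decode_filename, has_escaped_bytes) are made up for illustration and are not part of any proposal in this thread.

```python
# Hypothetical filename: "café.txt" as written by a program using ISO 8859-1.
raw = b"caf\xe9.txt"


def decode_filename(data, encoding):
    """Decode bytes the way PEP 383 proposes: undecodable bytes become
    lone surrogates in the range U+DC80..U+DCFF instead of raising."""
    return data.decode(encoding, errors="surrogateescape")


def has_escaped_bytes(text):
    """True if the string carries any bytes smuggled in as lone surrogates."""
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in text)


results = {
    enc: decode_filename(raw, enc)
    for enc in ("iso8859-1", "utf-8", "ascii")
}

for enc, text in results.items():
    # ascii() is used because lone surrogates cannot be printed directly
    # on most terminals.
    print(enc, ascii(text), has_escaped_bytes(text))
    # Encoding with the same codec and handler restores the original bytes.
    assert text.encode(enc, errors="surrogateescape") == raw
```

Under these assumptions the 8859-1 decode never produces escaped bytes, while the utf-8 and ASCII decodes do as soon as a byte of 0x80 or above does not form valid input for that codec.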

So, the next question is - do people on such systems frequently use high-bit characters in filenames?

Paul.

PS Unfortunately, I suspect that the biggest group of people likely to be hit badly by this is people using non-Latin scripts. And arguing probabilities without real data is optimistic at best. But those people are also the least likely people to contribute on an English-speaking list, I guess :-( (Sincere apologies if everyone but me on this list happens to actually be fluent English-speaking Russians :-))


