[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Stephen J. Turnbull stephen at xemacs.org
Tue Apr 28 06:26:36 CEST 2009


Michael Foord writes:

> The problem you don't address, which is still the reality for most programmers (especially Mac OS X where filesystem encoding is UTF-8), is that programmers are going to treat filenames as strings.
>
> The proposed PEP allows that to work for them - whatever platform their program runs on.

Sure, for values of "work" == "No exception will be raised in my module, and some content will actually be returned." It doesn't say anything about what happens once those strings escape the immediate context. So it encourages those programmers to pass any problems downstream, but only after discarding the resources needed to deal with problems effectively.

It's not that hard to overcome that problem, but it does require a slightly more complex API, and one that doesn't return a string but rather a stringlike object annotated with the information about how it was decoded. Conversion to a string should be trivial; I just think it should be invoked explicitly to make it clear where information is being discarded. Without an implicit conversion, the nature of the data (i.e., context-dependent structure) is made explicit. There's a natural place to document the problem that context must be used to interpret the data accurately, and even to add more robust processing (in a new PEP, of course!), etc.
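To make the shape of such an API concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a proposal from this thread or from the PEP: the names DecodedName, is_clean, to_str and wrap_names are all hypothetical. The only real piece is the "surrogateescape" error handler, which is what PEP 383 specifies. The wrapper keeps the raw bytes together with the encoding that was assumed, and the caller has to ask for a plain string explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodedName:
    """Hypothetical string-like wrapper: raw OS bytes plus the encoding assumed for them."""

    raw: bytes
    encoding: str

    @property
    def is_clean(self) -> bool:
        # True only if the bytes decode strictly under the recorded encoding.
        try:
            self.raw.decode(self.encoding, "strict")
        except UnicodeDecodeError:
            return False
        return True

    def to_str(self, errors: str = "surrogateescape") -> str:
        # Explicit conversion: the returned str no longer records which
        # encoding was assumed, so this is where context is discarded.
        return self.raw.decode(self.encoding, errors)


def wrap_names(raw_names, encoding="utf-8"):
    # Hypothetical helper: annotate each raw name with the assumed encoding.
    return [DecodedName(raw, encoding) for raw in raw_names]


# b"caf\xe9.txt" is Latin-1 data and is not valid UTF-8.
names = wrap_names([b"report.txt", b"caf\xe9.txt"])
for name in names:
    print(name.is_clean, ascii(name.to_str()))
```

Code that only wants a string calls to_str() and gets the PEP 383 behaviour; code further downstream that receives the wrapper can still check is_clean, or go back to raw and try a different encoding.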

Then in the future this interface could be used as the basis of a more robust API. With good design (and luck) it might be subclassable or extensible to a path object API, for example. PEP 383 on the other hand is a dead end as it stands. AFAICS it gives the best possible treatment of conversion of OS data to plain string, but we've already got developers lining up to say "I can't use it". :-(


