[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Glenn Linderman v+python at g.nevcal.com
Thu Apr 30 09:29:36 CEST 2009

On approximately 4/29/2009 8:46 PM, came the following characters from the keyboard of Terry Reedy:

Glenn Linderman wrote:

On approximately 4/29/2009 1:28 PM, came the following characters from

So where is the ambiguity here? None.

But not everyone can read all the Python source code to try to understand it; they expect the documentation to help them avoid that. Because the documentation is lacking in this area, it makes your concisely stated PEP rather hard to understand.

If you think a section of the doc is grossly inadequate, and there is no existing issue on the tracker, feel free to add one.

Thanks for clarifying the Windows behavior, here. A little more clarification in the PEP could have avoided lots of discussion. It would seem that a PEP, proposed to modify a poorly documented (and therefore likely poorly understood) area, should be educational about the status quo, as well as presenting the suggested change.

Where the PEP proposes to change, it should start with the status quo. But Martin's somewhat reasonable position is that since he is not proposing to change behavior on Windows, it is not his responsibility to document what he is not proposing to change more adequately. This means, of course, that any observed change on Windows would then be a bug, or at least a break of the promise. On the other hand, I can see that this is enough related to what he is proposing to change that better doc would help.

Yes; the very fact that the PEP discusses Windows, speaks about cross-platform code, and doesn't explicitly state that no Windows functionality will change is confusing.

An example of how to initialize things within a sample cross-platform application might help, especially if that initialization only happens if the platform is POSIX, or is commented to the effect that it has no effect on Windows, but makes POSIX happy. Or maybe it is all buried within the initialization of Python itself, and is not exposed to the application at all. I still haven't figured that out, but was not (and am still not) as concerned about that as ensuring that the overall algorithms are functional and useful and user-friendly. Showing it might have been helpful in making it clear that no Windows functionality would change, however.

A statement that additional features are being added to allow cross-platform programs to deal with non-decodable bytes obtained from POSIX APIs, using the same code that already works on Windows, would have made things much clearer. The present Abstract does, in fact, talk only about POSIX, but later statements about Windows muddy the water.
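For reference, the round trip under discussion can be sketched in a few lines. This assumes the error handler that later shipped in Python 3.1 under the name `surrogateescape` (the draft discussed in this thread calls it python-escape). It uses an explicit UTF-8 codec and a made-up file name rather than a real POSIX API call, so it behaves the same on any platform:

```python
# Sketch only: the byte string and the choice of UTF-8 are illustrative assumptions.
raw = b"caf\xe9.txt"  # Latin-1 encoded name; 0xE9 is not valid UTF-8 here

# Decoding maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogate back into the original byte.
restored = name.encode("utf-8", errors="surrogateescape")
```

The decoded `name` is an ordinary `str` that can be passed around, and `restored` is byte-for-byte identical to `raw`. Nothing in this sketch touches the Windows Unicode APIs, which is consistent with the claim that Windows behavior is left alone.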

Rationale paragraph 3 explicitly talks about cross-platform programs needing to work one way on Windows and another way on POSIX to deal with all the cases. It calls that a proposal, which I guess it is for command line and environment, but it is already implemented in both bytes and str forms for file names... so that further muddies the water.

It is, of course, easier to point out deficiencies in a document than to write a better document; however, it is incumbent upon the PEP author to write a PEP that is good enough to get approved, and that means making it understandable enough that people are in favor... or to respond to the plethora of comments until people are in favor. I'm not sure which one is more time-consuming.

I've reached the point, based on PEP and comment responses, where I now believe that the PEP is a solution to the problem it is trying to solve, and doesn't create ambiguities in the naming. I don't believe it is the best solution.

The basic problem is the overuse of fake characters... normalizing them for display results in large data loss -- many characters would be translated to the same replacement characters.
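A small sketch of that data loss, assuming the escaping handler is the one that later shipped as `surrogateescape` and that "normalizing for display" means replacing each fake character with U+FFFD. The helper `for_display` and the two file names are hypothetical:

```python
def for_display(s):
    """Replace every escape surrogate (U+DC80..U+DCFF) with U+FFFD for display."""
    return "".join("\ufffd" if "\udc80" <= ch <= "\udcff" else ch for ch in s)

# Two different undecodable names, distinguishable as escaped strings...
a = b"\xe9.txt".decode("utf-8", errors="surrogateescape")
b = b"\xe8.txt".decode("utf-8", errors="surrogateescape")

# ...but identical once normalized for display.
shown_a = for_display(a)
shown_b = for_display(b)
```

Here `a` and `b` differ, yet `shown_a` and `shown_b` are the same string, so a user looking at the display form cannot tell the two files apart.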

Solutions exist that would allow the use of fewer different fake characters in the strings, while still having a fake character as the escape character, to preserve the invariant that all the strings manipulated by python-escape from the PEP were, and become, strings containing fake characters (from a strict Unicode perspective), which is a nice invariant*. There even exist solutions that would use only one fake character (repeatedly if necessary), and all other characters generated would be displayable characters. This would ease the burden on the program in displaying the strings, and also on the user that might view the resulting mojibake in trying to differentiate one such string from another. Those are outlined in various emails in this thread, although some include my misconception that strings obtained via Unicode-enabled OS APIs would also need to be encoded and altered. If there is any interest in using a more readable encoding, I'd be glad to rework them to remove those misconceptions.

* It would be nice to point out that invariant in the PEP, also.
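One way the single-escape-character idea could look is sketched below. This is not the scheme from the earlier emails in the thread, only an illustration under stated assumptions: the escape character `ESC` is arbitrarily chosen as U+DCFF, each undecodable byte is written as `ESC` followed by two displayable hex digits, and the handler name `hex-escape-sketch` and both helper functions are made up for this example.

```python
import codecs

ESC = "\udcff"  # assumption: the one "fake" character used as the escape marker

def _escape_handler(exc):
    """Decode-error handler: emit ESC plus two hex digits per bad byte."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(f"{ESC}{byte:02X}" for byte in bad), exc.end

codecs.register_error("hex-escape-sketch", _escape_handler)

def decode_name(raw):
    """bytes -> str; valid UTF-8 passes through, bad bytes become ESC + hex."""
    return raw.decode("utf-8", errors="hex-escape-sketch")

def encode_name(name):
    """str -> bytes; reverses decode_name for strings it produced."""
    out = bytearray()
    i = 0
    while i < len(name):
        if name[i] == ESC:
            out.append(int(name[i + 1:i + 3], 16))  # the two hex digits
            i += 3
        else:
            out += name[i].encode("utf-8")
            i += 1
    return bytes(out)
```

A lone surrogate can never come out of a valid UTF-8 decode, so `ESC` in a decoded string always marks an escaped byte, and every string produced by `decode_name` for undecodable input still contains a fake character. The difference from the per-byte surrogate scheme is that the byte value itself is shown in displayable hex digits, so two such names remain distinguishable on screen.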

-- Glenn -- http://nevcal.com/

A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
