[Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API

Stephen J. Turnbull stephen at xemacs.org
Wed Oct 26 05:31:36 CEST 2011


In general I agree with what you write, Terry. One clarification and one comment, though.

Terry Reedy writes:

> The doc says "All functions accepting path or file names accept both bytes and string objects, and result in an object of the same type, if a path or file name is returned." It does that now, though it says nothing about the encoding assumed for input bytes or used for output bytes.

That's determined by the OS, and figuring that out is the end user's problem.
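A minimal sketch of the type-preserving behavior Terry quotes, using the real `os.listdir`, `os.fsencode`, and `sys.getfilesystemencoding` functions. The helper `list_both_ways` and the file name `example.txt` are hypothetical. The encoding reported depends on platform and Python version: since Python 3.6 (PEP 529) Windows reports UTF-8, whereas Python releases of this thread's era reported `mbcs`, the ANSI code page.

```python
import os
import sys
import tempfile

def list_both_ways(directory: str) -> tuple[list[str], list[bytes]]:
    """Hypothetical helper: list one directory via a str path and via a bytes path."""
    as_str = os.listdir(directory)                 # str argument -> list of str names
    as_bytes = os.listdir(os.fsencode(directory))  # bytes argument -> list of bytes names
    return as_str, as_bytes

with tempfile.TemporaryDirectory() as d:
    # Create one empty file so both listings are non-empty.
    with open(os.path.join(d, "example.txt"), "w"):
        pass
    names, raw_names = list_both_ways(d)
    print(names, raw_names)

# The codec that os.fsencode()/os.fsdecode() use to convert between the two forms.
print(sys.getfilesystemencoding())
```

Neither form tells the caller which encoding produced the bytes; that comes from the OS and the interpreter's configuration.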

> It does not mention raising exceptions, so doing so is a feature change that would likely break code. Currently, exceptional situations are signalled with "'?' in returned_path" rather than with an exception object. The '?' is a bad choice of signal, though, given the other uses of '?' in paths.

True, but this isn't really Python's problem. And IIUC Martin's post, it is hardly "exceptional": Python isn't doing this; it's just standard Windows behavior, which results in pathnames that are perfectly acceptable to Windows APIs but unreliable in use because they have different semantics in different Windows APIs. If that is true, there are almost surely user programs that depend on this behavior, even though it sucks.[1]
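To make the trade-off in the subject line concrete, here is a sketch of strict versus substituting conversion. It is only an approximation: it uses Python's `cp1252` codec as a stand-in for a Windows ANSI code page (the `mbcs` codec exists only on Windows) and Python's `'replace'` error handler in place of the Windows API's own conversion, which it does not reproduce. `to_ansi` is a hypothetical helper.

```python
def to_ansi(name: str, codepage: str = "cp1252", strict: bool = True) -> bytes:
    """Hypothetical helper: encode a filename for a legacy 8-bit code page.

    strict=True raises UnicodeEncodeError on unrepresentable characters;
    strict=False substitutes '?' for them, as Python's 'replace' handler does.
    """
    return name.encode(codepage, errors="strict" if strict else "replace")

name = "caf\u00e9-\u4e2d.txt"  # 'é' exists in cp1252; the CJK character does not

print(to_ansi(name, strict=False))  # b'caf\xe9-?.txt'

try:
    to_ansi(name)
except UnicodeEncodeError as exc:
    # The strict form fails at the point of conversion instead.
    print("strict codec refused:", exc.reason)
```

The substituting form silently yields a different, ambiguous name; the strict form reports the problem where it happens, which is what the thread's subject proposes.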

My original "hearty +1" was dependent on my understanding from Victor's post that this substitution could cause later exceptions because the filename is invalid (eg, it contains illegal characters, causing Windows to signal an error). If that's not true, I think the proper remedy is to add a strong warning to pylint that use of those APIs is supported (eg, for interaction with existing programs that use them) but that they require careful error-checking for robust use.
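One way to do the careful error-checking mentioned above is a round-trip test: encode the name, decode it back, and reject it if it changed. This sketch assumes `cp1252` as the ANSI code page and uses Python's `'replace'` handler to mimic the substitution; `encode_checked` is a hypothetical name, not an existing API.

```python
def encode_checked(name: str, codepage: str = "cp1252") -> bytes:
    """Hypothetical helper: encode lossily, then verify nothing was lost."""
    encoded = name.encode(codepage, errors="replace")
    # Bytes produced by encode() always decode cleanly in the same code page,
    # so a mismatch here means some character was replaced by '?'.
    if encoded.decode(codepage) != name:
        raise ValueError(f"{name!r} cannot be represented in {codepage}: got {encoded!r}")
    return encoded

print(encode_checked("caf\u00e9.txt"))
try:
    encode_checked("\u4e2d.txt")
except ValueError as exc:
    print("rejected:", exc)
```

A round-trip comparison is used here rather than searching for `?`, so the check does not depend on which substitution character a converter happens to use.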

As a card-carrying Unicode nazi, I wouldn't mind tagging the bytes APIs with a DeprecationWarning, but I know that proposal is going nowhere, so I withdraw it in advance.

Footnotes: [1] Note that the original rationale for this was surely "since users will have a very hard time using file names with this character in them, using it as a substitution character internally will make the problem evident and Sufficiently Smart Programs can deal with it."


