[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Paul Moore p.f.moore at gmail.com
Fri Apr 24 17:59:59 CEST 2009


2009/4/24 Antoine Pitrou <solipsis at pitrou.net>:

> Aahz <aahz at pythoncraft.com> writes:
>>
>> The part that I haven't seen clearly addressed so far is what happens
>> when disks get mounted across OSes (e.g. NFS).
>
> Unless there's some kind of native NFS API for file access, it is hopelessly
> out of scope for Python. We use whatever the C library exports to us, and
> don't have any control over filesystem details.

For "raw" level stuff (bytes on Unix, Unicode-nearly (:-)) on Windows) that's right. Resist the temptation to guess and all that.

For the level Martin is (as far as I can tell) aiming at [1], we need some defined rules on how to behave (relatively) sanely. Windows is fairly easy - "nearly-Unicode" to Unicode isn't too bad. But on Unix, you're dealing with bytes-to-Unicode in the absence of a clearly stated encoding - which is a known can of worms...
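To make the Unix case concrete, here is a minimal sketch of the bytes-to-Unicode problem. It assumes the `surrogateescape` error handler, which is the name under which the PEP 383 scheme later shipped in Python 3.1 and is not mentioned in this message; the file name bytes are made up for illustration.

```python
# Hypothetical file name as the C library might hand it back on Unix:
# Latin-1 encoded bytes on a system whose locale encoding is UTF-8.
raw = b"caf\xe9.txt"

# A strict decode cannot represent the name at all.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The PEP 383 approach: each undecodable byte 0xNN becomes the lone
# surrogate U+DCNN, so the str still carries the original byte.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = name.encode("utf-8", errors="surrogateescape")
```

The resulting string is not valid Unicode text in the strict sense (it contains a lone surrogate), which is roughly where the corner cases under discussion come from.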

In my view:

The pros for Martin's proposal are a uniform cross-platform interface, and a user-friendly API for the common case. The cons are subtle and complex corner cases, and lack of agreement on the validity of the proposed encoding in those cases.

The fact that the bytes APIs won't go away probably mitigates the cons to a large extent (again, in my view...)
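As a rough illustration of the two levels coexisting, the sketch below lists the same directory through the str interface and through the bytes interface. It assumes Python 3.2 or later for `os.fsencode`, which postdates this message; the helper name `list_both_ways` and the file name are hypothetical.

```python
import os
import tempfile


def list_both_ways(directory):
    """Return the directory's entries as str names and as bytes names."""
    # A str path yields str names (the "friendly" level).
    as_text = sorted(os.listdir(directory))
    # A bytes path yields bytes names (the "raw" level, no decoding guesses).
    as_bytes = sorted(os.listdir(os.fsencode(directory)))
    return as_text, as_bytes


with tempfile.TemporaryDirectory() as d:
    # Create one plainly named file so the example behaves the same everywhere.
    open(os.path.join(d, "example.txt"), "w").close()
    text_names, byte_names = list_both_ways(d)
```

Code that cannot tolerate any guessing about encodings can stay on the bytes side, while `os.fsencode` maps a str name back to the bytes the filesystem interface expects.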

Paul.

[1] Actually, all the PEP says is "With this PEP, a uniform treatment of these data as characters becomes possible." An argument as to why this is a good thing would be a useful addition to the PEP. At the moment it's more or less treated as self-evident - which I agree with, but which clearly the Unix people here are not as certain of.


