[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Cameron Simpson cs at zip.com.au
Tue Apr 28 02:42:32 CEST 2009
On 27Apr2009 21:48, Martin v. Löwis <martin at v.loewis.de> wrote:
| >>> There are still issues regarding how Windows and POSIX programs that
| >>> are sharing cross-mounted file systems might communicate file names
| >>> between each other, which is not at all clear from the PEP. If this
| >>> is an insoluble or un-addressed issue, it should be stated. (It is
| >>> probably insoluble, due to there being multiple ways that the
| >>> cross-mounted file systems might translate names; but if there are,
| >>> can we learn something from the rules the mounting systems use, to
| >>> be compatible with (one of) them, or not.
| >>
| >> I'd say that's out of scope. A windows filesystem mounted on a UNIX host
| >> should probably be mounted with a mapping to translate the Windows
| >> Unicode names into whatever the sysadmin deems the locally most apt
| >> byte encoding. But sys.getfilesystemencoding() is based on the current
| >> user's locale settings, which need not be the same.
| >>
| >
| > And if it were, what would it do with files that can't be encoded with
| > the locally most apt byte encoding?
|
| As Cameron says: it's out of the scope of the PEP. It really depends how
| the operating system deals with them. Most likely, the files are not
| accessible - not only not from Python, but also not accessible from
| any other Unix program.
Well... If the files exist and the encoding of the mount software permits, there will be a sequence of bytes for the filename, and it will be accessible to a pure UNIX byte-speaking program. It will also be accessible from Python, because the os.* calls convert both ways: bytes->string and string->bytes as required. Martin's PEP just makes that lossless, which currently it is not.
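To make "lossless" concrete, here is a minimal sketch of the round trip using the "surrogateescape" error handler that PEP 383 specifies, next to a lossy "replace" decode for contrast. The filename bytes and the helper names are made up for illustration; UTF-8 is assumed as the filesystem encoding.

```python
# Hypothetical filename: Latin-1 bytes that are not valid UTF-8.
RAW = b"caf\xe9.txt"

def roundtrip_lossy(raw: bytes) -> bytes:
    # "replace" substitutes U+FFFD for the bad byte, so the original
    # byte value cannot be recovered when encoding back.
    return raw.decode("utf-8", errors="replace").encode("utf-8")

def roundtrip_lossless(raw: bytes) -> bytes:
    # "surrogateescape" maps each undecodable byte 0xNN to the lone
    # surrogate U+DCNN, and maps it back to 0xNN on encoding.
    name = raw.decode("utf-8", errors="surrogateescape")
    return name.encode("utf-8", errors="surrogateescape")

print(roundtrip_lossy(RAW) == RAW)      # False: the byte 0xE9 is gone
print(roundtrip_lossless(RAW) == RAW)   # True: same bytes come back
```

The decoded string is not valid for strict encoding (it contains a lone surrogate), which is the price paid for being able to hand the same bytes back to the OS.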
Conversely, if the mount software refuses to map the filename to a POSIX byte string, the file won't exist, or will refuse to be created. For a concrete example we have but to observe my macify program I was trying to counter the PEP with (I'm now a convert, btw). It is to run on a real UNIX system and recode filenames into UTF-8 NFD, prior to rsyncing to a Mac. Why? Because the MacOSX HFS filesystem refuses to accept byte strings not parsable by that encoding, and my music rsyncs were exploding, refusing to create files on the target Mac.
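As a rough illustration of that kind of recoding (not the actual macify code, which is not shown here), a filename can be decoded from an assumed source encoding, normalised to NFD, and re-encoded as UTF-8. The function name and the Latin-1 default are assumptions for the sketch.

```python
import unicodedata

def to_utf8_nfd(raw_name: bytes, source_encoding: str = "latin-1") -> bytes:
    """Recode a filename to UTF-8 in NFD form.

    source_encoding is a guess about how the name was originally
    encoded; a real tool would need some way to choose it.
    """
    text = raw_name.decode(source_encoding)
    # NFD splits precomposed characters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.encode("utf-8")

# Latin-1 "café": the single byte 0xE9 becomes "e" plus U+0301
# (combining acute accent), which is 0xCC 0x81 in UTF-8.
print(to_utf8_nfd(b"caf\xe9"))
```

Renaming each file to the returned byte string before the rsync is the step that would avoid the refusals described above, assuming the source encoding guess is right.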
And there's probably some grey area where dodgy mount software will present names that can't be used.
There's a supposed counter example in another followup post which I'll address there, since it seemed a little bogus to me.
I think that, almost independent of this PEP, there should be an os.fsencode() function that takes a byte string (as a POSIX OS call will take) and performs the same bytes->string conversion that listdir() and friends are doing under the hood. And a partner os.fsdecode() for string->bytes. That will save a lot of wheel respoking and probably make it easier for people to think about this.
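For reference, functions with these names were added to the os module in Python 3.2, but with the directions the other way round from the naming proposed here: os.fsdecode() goes bytes->string and os.fsencode() goes string->bytes. A small sketch, assuming a POSIX system (where the filesystem error handler is "surrogateescape") and a made-up filename:

```python
import os

# Hypothetical filename bytes as a POSIX call would return them.
raw = b"caf\xe9.txt"

# bytes -> str, using the filesystem encoding and error handler
# (assumption: POSIX; on Windows undecodable bytes may raise instead).
name = os.fsdecode(raw)

# str -> bytes, the inverse conversion.
back = os.fsencode(name)

print(type(name).__name__, back == raw)
```

This is the same conversion that os.listdir() applies when given a str path, exposed so that callers do not have to reimplement it.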
Aside: thinking on that, perhaps those functions should be in posix., or alternatively would a Windows system offer them in os. to produce native UTF-16 byte strings? That would be useless for the Windows API, which cleanly takes unicode (I gather), but perhaps handy for people hacking filesystems directly or something like that. (Except I gather from a former existence that there is a multitude of on-disk filename encodings under Windows, depending on how old your filesystems are and whether they're FAT or NTFS, etc.)
Cheers,
Cameron Simpson <cs at zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/
Your eyes are weary from staring at the CRT. You feel sleepy. Notice how restful it is to watch the cursor blink. Close your eyes. The opinions stated above are yours. You cannot imagine why you ever felt otherwise. - gabrielh at tplrd.tpl.oz.au