[Python-Dev] Bytes path support
Chris Barker chris.barker at noaa.gov
Fri Aug 22 00:30:20 CEST 2014
On Wed, Aug 20, 2014 at 9:52 PM, Cameron Simpson <cs at zip.com.au> wrote:
On 20Aug2014 16:04, Chris Barker - NOAA Federal <chris.barker at noaa.gov> wrote:
So really, people treat them as "bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-(and maybe a couple others)-is-ascii-compatible"

As someone who fought long and hard in the surrogate-escape listdir() wars, and was won over once the scheme was thoroughly explained to me, I take issue with these assertions: they are bogus or misleading. Firstly, POSIX filenames are just byte strings. The only forbidden character is the NUL byte, which terminates a C string, and the only special character is the slash, which separates pathname components.
so they are "just byte strings", oh, except that you can't have a null, and the "slash" had better be code 47 (and vice versa). How is that different than "bytes-in-some-arbitrary-encoding-where-at-least the-slash-character-is-ascii-compatible"?
(sorry about the "maybe a couple others", I was too lazy to do my research and be sure).
But my point is that python users want to be able to work with paths, and paths on posix are not strictly strings with a clearly defined encoding, but they are also not quite "just arbitrary bytes". So it would be nice if we could have a pathlib that would work with these odd beasts. I've lost track a bit as to whether the surrogate-escape solution allows this to all work now. If it does, then great, sorry for the noise.
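For anyone who has also lost track: the surrogate-escape scheme (PEP 383) decodes bytes that are invalid in the filesystem encoding to lone surrogates (U+DC80..U+DCFF), so any byte filename round-trips through `str`. On POSIX, `os.fsdecode()` and `os.fsencode()` apply this error handler with the filesystem encoding. The sketch below assumes a UTF-8 filesystem encoding and passes it explicitly so the result does not depend on the machine's locale:

```python
from pathlib import PurePosixPath

raw = b"/data/caf\xe9.txt"  # 0xE9 on its own is not valid UTF-8

# Undecodable bytes become lone surrogates instead of raising an error.
text = raw.decode("utf-8", "surrogateescape")
assert text == "/data/caf\udce9.txt"

# pathlib can then operate on the str form like any other path.
p = PurePosixPath(text)
print(repr(p.name), repr(p.suffix))

# Encoding with the same error handler restores the exact original bytes.
assert p.name.encode("utf-8", "surrogateescape") == b"caf\xe9.txt"
assert text.encode("utf-8", "surrogateescape") == raw
```

The `repr()` calls are deliberate: printing a lone surrogate directly can raise `UnicodeEncodeError` on a strict UTF-8 stdout.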
Second, a bare low level program cannot do much more than pass them around. It certainly can do things like compute their basename, or other path related operations.
only if you assume that pesky slash == 47 thing -- it's not much, but it's not raw bytes either.
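The standard library's `posixpath` functions already accept bytes, and the only structure they rely on is the slash byte, which supports the point that even "raw bytes" path handling assumes byte 47. A short illustration (the filename is an arbitrary example):

```python
import posixpath

SLASH = ord("/")  # 47: the one byte value POSIX path handling depends on

path = b"/srv/photos/\xff\xfe-not-utf8.jpg"

# posixpath works directly on bytes; no encoding is ever consulted.
print(posixpath.basename(path))  # b'\xff\xfe-not-utf8.jpg'
print(posixpath.dirname(path))   # b'/srv/photos'

# For a simple path like this one (no trailing or doubled slashes),
# that is equivalent to splitting at the last byte with value 47.
head, _, tail = path.rpartition(bytes([SLASH]))
assert (head, tail) == (posixpath.dirname(path), posixpath.basename(path))
```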
The "bytes in some arbitrary encoding where at least the slash character (and maybe a couple others) is ascii compatible" notion is completely bogus. There's only one special byte, the slash (code 47). There's no OS-level need that it or anything else be ASCII compatible. I think characterizations such as the one quoted are actively misleading.
code 47 == "slash" is ascii compatible -- where else did the 47 value come from?
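Where 47 comes from is easy to check, and the same check shows why ASCII compatibility matters in practice: an ASCII-compatible encoding such as UTF-8 never produces a stray 0x2F or NUL byte, while UTF-16 does both. The sample characters below are arbitrary choices for illustration:

```python
# The POSIX separator is the ASCII slash: that is where 47 comes from.
assert ord("/") == 47 == 0x2F

name = "na\u00efve/\u6771\u4eac"  # "naive" with diaeresis, "/", "Tokyo" in kanji
utf8 = name.encode("utf-8")

# UTF-8 is ASCII-compatible: bytes below 0x80 only ever encode ASCII,
# so the only 0x2F byte is the real slash and no NUL byte appears.
print(utf8.count(b"/"), b"\x00" in utf8)  # 1 False

# UTF-16 is not: ASCII letters gain NUL bytes, and an unrelated
# character (U+2F00) encodes to a byte pair that contains 0x2F.
print(b"\x00" in name.encode("utf-16-le"))  # True
print("\u2f00".encode("utf-16-le"))         # b'\x00/'
```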
I think we'd all agree it is nice to have a system where filenames are all Unicode, but since POSIX/UNIX predates it by decades it is a bit late to ignore the reality for such systems.
well, the community could have gone to "if you want anything other than ascii, make it utf-8" -- but, as always, we're all a bunch of independent thinkers.
But none of this is relevant -- systems in the wild do what they do -- clearly we all want Python to work with them as best it can.
There's no external "filesystem encoding" in the sense of something recorded in the filesystem that anyone can inspect. But there are the expressed locale settings, available at runtime to any program that cares to pay attention. It is a workable situation.
I haven't run into it, but it seems the folks that have don't think relying on the locale setting is the least bit workable. If it were, we wouldn't be having this discussion -- use the locale setting to decide how to decode filenames -- done.
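A small illustration of why the locale is a shaky foundation: the same on-disk bytes decode to different names depending on which encoding the locale implies. Python reports the encoding it chose for filenames via `sys.getfilesystemencoding()`; the sketch fixes the two candidate encodings explicitly (UTF-8 and Latin-1 are just example choices) so it behaves the same everywhere:

```python
import sys

# Bytes a UTF-8 user's program wrote for the name "cafe" with an acute e.
raw = "caf\u00e9".encode("utf-8")
assert raw == b"caf\xc3\xa9"

# The encoding this Python process uses for filenames (locale-derived on POSIX).
print(sys.getfilesystemencoding())

# Same bytes, two plausible locale encodings, two different names.
as_utf8 = raw.decode("utf-8")      # the intended name
as_latin1 = raw.decode("latin-1")  # mojibake, yet no error is raised

# Latin-1 maps every byte, so it never fails; it is just silently wrong here.
assert as_utf8 != as_latin1
assert as_latin1.encode("latin-1") == raw
```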
Oh, and I reject Nick's characterisation of POSIX as "broken". It's perfectly internally consistent. It just doesn't match what he wants. (Indeed, what I want, and I'm a long time UNIX fanboy.)
bug or feature? you decide. Internal consistency is a good start, but it punts the whole encoding issue to the client software, without giving it the tools to do it right. I call that "really hard to work with" if not broken.
-Chris
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov