[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for fspath and os.fspath() (original) (raw)
Stephen J. Turnbull stephen at xemacs.org
Thu Apr 14 20:20:42 EDT 2016
- Previous message (by thread): [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
- Next message (by thread): [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Ethan Furman writes:
Substitute open() with sending those bytes somewhere else:
Eg, pathlib.Path, which will raise? Surely it should be safe to pass a DirEntry to a pathlib constructor? Note that having Path call fsdecode implicitly is a bad idea, because we don't know the provenance of generic bytes. But by design of fspath, its value (if str) is suitable for passing to Path, for further processing.
why should I have to reencode this str back to bytes, when bytes are what I asked for in the first place?
Erm, you didn't ask for bytes. You asked for whatever fspath is going to give you. And in many cases, like pathlib, it will be str. I imagine that doesn't bother you; you plan to use antipathy anyway. But if there's uptake on the protocol, I'll bet that str-only implementations are the majority.
And your question also cuts the other way. Why should I have to decode bytes to str, or suffer unexpected TypeErrors, or deal with the possibility of TypeErrors, just because fspath is polymorphic?
We're here to improve pathlib. There's been a huge amount of mission creep, with no use cases to provide intuition. You pit your abstract inconvenience against my 20 years of whack-a-mole with UnicodeErrors and TypeErrors in Mailman. I know that if you let bytes that represent text loose inside an application, eventually they'll end up in a str context and "blooey!"
How did this application get a bytes path object to begin with? Either it explicitly used bytes when calling scandir and friends (in which case it shouldn't be surprised to be working with bytes); or it got that bytes object from a database, over-the-wire, an-other-language-lib, etc.
No, it got it from an fspath-toting object (such as a DirEntry) it received from some library, which constructed it polymorphically from bytes it got from some other place -- and so lost the original encoding. That's the scenario I think is impossible to rule out, and reducing that kind of scenario to the bare minimum is why bytes got demoted from being the default representation of text in Python 3 in the first place.
If I'm working with bytes, why would I want to work with str?
First, are you actually working on those bytes, or are you just passing them to os functions? If the latter, you shouldn't care.
Second, because paths are conceptually text (you may not agree, but Nick inter alia has indicated he does). Working with bytes paths (except literals) is a good way to get in trouble, because there are all kinds of ways they can end up inappropriately encoded. For example, the odds are very high that a bytes path read from a file (including from a zipfile directory) in Japan will be encoded in Shift JIS. On Mac OS X, that will either produce mojibake in the directory (if the access creates the file) or fail to access the intended file, because the filesystem encoding is UTF-8.
Third, because you want to be portable to Windows, where you have no choice about whether paths are str or bytes.
These reasons probably don't apply to you with much strength, but the question is how typical you are, vs. the nearly universal experience of mojibake and the dominant market share of Windows.
Python is a glue language, and Python practitioners don't always have the luxury of working only with text.
For paths? Of course you can work with them as text. ISTM what you really want is the luxury of working only with bytes, because you're in the habit of pretending they are text. I don't object to you having your luxury as long as it doesn't increase risk for my use cases. I think you're asking for trouble, and the practice is definitely nonportable, but consenting adults applies.
However, the proposed polymorphism does create ambiguity and risk for my uses. I rarely have the luxury of not ensuring paths are text, regardless of the bytes-ness of the underlying application, because I can be pretty darn sure that somebody's going to feed me non- filesystem encodings, and soon. Even when I am working with bytes representing paths in the filesystem encoding, I need to convert to text to read the darn things when debugging! So I don't consent; you'll have to impose it on me.
- Previous message (by thread): [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
- Next message (by thread): [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]