[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

Nick Coghlan ncoghlan at gmail.com
Wed Apr 13 09:51:02 EDT 2016


On 13 April 2016 at 02:15, Ethan Furman <ethan at stoneleaf.us> wrote:

On 04/11/2016 04:43 PM, Victor Stinner wrote:

On 11 Apr 2016 at 11:11 PM, "Ethan Furman" wrote:

So my concern in such a case is what happens if we pass this SE string somewhere else: a UTF-8 file, or over a socket, or into a database? Does this have issues that we wouldn't face if we just used bytes?

"SE strings" are returned by os.listdir(str), os.walk(str), os.getenv(str), sys.argv[int], ... since Python 3.3. Nothing new under the sun.

So when we pass a bytes object in, Python (on POSIX) converts that to a string using surrogateescape, gets back strings from the OS, and encodes them back to bytes, again using surrogateescape?

On POSIX, if you pass bytes to the os module, it will pass bytes to the underlying system API, and then pass bytes back to your application.
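As a rough illustration of that bytes-in, bytes-out behaviour, the sketch below lists the same directory twice, once with a str path and once with a bytes path. The helper name list_both_ways and the file name example.txt are made up for this example.

```python
import os
import tempfile


def list_both_ways(directory):
    """Return (names_from_str, names_from_bytes) for one directory.

    Hypothetical helper, for illustration only.
    """
    # A str argument yields str names.
    str_names = os.listdir(directory)
    # A bytes argument yields bytes names.
    bytes_names = os.listdir(os.fsencode(directory))
    return str_names, bytes_names


with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so the directory is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    str_names, bytes_names = list_both_ways(tmp)
```

The type of the result follows the type of the argument, so code that stays in bytes throughout never sees a surrogate-escaped string.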

Strings that may contain surrogate escapes only come back when you pass str and the operating system data isn't properly encoded according to the nominal filesystem encoding. They round trip nicely to other operating system APIs, but can indeed be a problem if they escape to other parts of your program (hence ideas like http://bugs.python.org/issue18814#msg251694 and the preceding discussion in that issue).
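A minimal sketch of both halves of that point, assuming a nominal filesystem encoding of UTF-8 (the codec is spelled out explicitly here rather than taken from os.fsdecode() so the result does not depend on the platform). The file name is invented for the example.

```python
# A file name in Latin-1; the byte 0xE9 is not valid UTF-8 on its own.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape maps the undecodable byte to a lone surrogate.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same error handler restores the original bytes exactly,
# which is why such strings round trip to operating system APIs.
round_tripped = name.encode("utf-8", "surrogateescape")

# A strict encode, as used when writing a UTF-8 file or sending text over a
# socket, rejects the lone surrogate.
try:
    name.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Here name is 'caf\udce9.txt', round_tripped equals raw, and the strict encode raises UnicodeEncodeError, which is the kind of failure that shows up when such a string escapes to other parts of a program.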

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


