[Python-Dev] Windows: Remove support of bytes filenames in the os module?
Victor Stinner victor.stinner at gmail.com
Wed Feb 10 05:37:58 EST 2016
2016-02-10 11:18 GMT+01:00 Steven D'Aprano <steve at pearwood.info>:
> [steve at ando ~]$ python3.3 -c 'print(open(b"/tmp/abc\xD8\x01", "r").read())'
> Hello World
>
> [steve at ando ~]$ python3.3 -c 'print(open("/tmp/abc\xD8\x01", "r").read())'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> FileNotFoundError: [Errno 2] No such file or directory: '/tmp/abcØ\x01'
>
> What Unicode string does one need to give in order to open file
> b"/tmp/abc\xD8\x01"?
Use os.fsdecode(b"/tmp/abc\xD8\x01") to get the filename as a Unicode string; passing that string to open() will work.
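A minimal sketch of that round trip, assuming a POSIX system where os.fsdecode() uses the "surrogateescape" error handler (Windows behaves differently). The variable names are illustrative only, and the exact decoded string depends on the locale encoding, so the sketch prints it rather than stating it:

```python
import os

# Byte filename from the quoted example.
raw = b"/tmp/abc\xD8\x01"

# Decode with the filesystem encoding; on POSIX, bytes that cannot be
# decoded are mapped to lone surrogates instead of raising an error.
name = os.fsdecode(raw)

# Encoding the str back restores the original bytes.
roundtrip = os.fsencode(name)

print(ascii(name))        # decoded form depends on the locale encoding
print(roundtrip == raw)
```

Because the mapping is reversible, open(name) ends up asking the operating system for the same bytes as open(raw).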
Removing the 'b' in front of a byte string is not enough to convert an arbitrary byte string to Unicode :-D Encodings are more complex than that... See http://unicodebook.readthedocs.org/
The problem on Python 2 is that the UTF-8 encoders encode surrogate characters, which is wrong. You cannot use an error handler to choose how to handle these surrogate characters.
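For contrast, here is a small sketch of how the Python 3 UTF-8 encoder treats a lone surrogate. The variable names are made up for illustration; the error handler names are the builtin ones:

```python
# A str containing a lone surrogate, as os.fsdecode() can produce on POSIX.
lone = "abc\udcff"

# The strict UTF-8 encoder in Python 3 refuses to encode surrogates.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# An explicit error handler decides what happens instead.
escaped = lone.encode("utf-8", "surrogateescape")  # restore the original byte
passed = lone.encode("utf-8", "surrogatepass")     # encode the surrogate itself

print(strict_ok)
print(escaped)
print(passed)
```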
On Python 3, you have a wide choice of builtin error handlers, and you can even write your own. Example with Python 3.6, using the "namereplace" error handler (added in Python 3.5):
>>> import os
>>> def formatfilename(filename, encoding='ascii', errors='backslashreplace'):
...     return filename.encode(encoding, errors).decode(encoding)
...
>>> print(formatfilename(os.fsdecode(b'abc\xff')))
abc\udcff
>>> print(formatfilename(os.fsdecode(b'abc\xff'), errors='replace'))
abc?
>>> print(formatfilename(os.fsdecode(b'abc\xff'), errors='ignore'))
abc
>>> print(formatfilename(os.fsdecode(b'abc\xff') + "é", errors='namereplace'))
abc\udcff\N{LATIN SMALL LETTER E WITH ACUTE}
My locale encoding is UTF-8.
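Writing your own error handler could look roughly like the sketch below. The handler name "hexescape" and the function hex_escape are hypothetical, chosen only for this illustration; codecs.register_error() is the standard registration hook. The input is written as a literal string so the result does not depend on the locale:

```python
import codecs

def hex_escape(exc):
    """Hypothetical handler: render each unencodable character as <U+XXXX>."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # exc.start:exc.end covers the run of characters that failed to encode.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("<U+{:04X}>".format(ord(ch)) for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, exc.end

# Register the handler under a made-up name.
codecs.register_error("hexescape", hex_escape)

def formatfilename(filename, encoding="ascii", errors="hexescape"):
    return filename.encode(encoding, errors).decode(encoding)

# "abc\udcff" is what os.fsdecode(b'abc\xff') yields with a UTF-8 locale.
result = formatfilename("abc\udcff" + "\xe9")
print(result)
```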
Victor