[Python-Dev] Windows: Remove support of bytes filenames in the os module?

Brett Cannon brett at python.org
Mon Feb 8 12:02:41 EST 2016


On Mon, 8 Feb 2016 at 06:33 Victor Stinner <victor.stinner at gmail.com> wrote:

Hi,

Since 3.3, functions of the os module emit a DeprecationWarning on Windows when called with bytes filenames. The rationale is quite simple: the native Windows type for filenames is Unicode, and Windows behaves weirdly when you use bytes. For example, os.listdir(b'.') gives you paths which cannot be used with open() when a filename is not encodable in the ANSI code page: unencodable characters are replaced with "?".

The following issue was opened to document this weird behaviour (but the doc was never completed): "Document that bytes OS API can returns unusable results on Windows"
http://bugs.python.org/issue16700
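The "?" replacement described above can be imitated on any platform. This is only a rough sketch, not the actual Windows conversion: the cp1252 codec is assumed as a stand-in for the ANSI code page, errors="replace" is assumed to approximate the substitution, and the helper name to_ansi_bytes is hypothetical.

```python
# Simulation only: "cp1252" stands in for the ANSI code page (an assumption;
# the real code page depends on the system locale), and errors="replace"
# approximates the "?" substitution described in the message.
ANSI_CODE_PAGE = "cp1252"


def to_ansi_bytes(name: str) -> bytes:
    """Lossily encode a Unicode filename (hypothetical helper)."""
    return name.encode(ANSI_CODE_PAGE, errors="replace")


# A filename with two characters that cp1252 cannot represent.
original = "r\u00e9sum\u00e9_\u4e2d\u6587.txt"

as_bytes = to_ansi_bytes(original)
round_tripped = as_bytes.decode(ANSI_CODE_PAGE)

print(as_bytes)                   # the two CJK characters have become b"??"
print(round_tripped == original)  # False: the original name is not recoverable
```

Because the bytes name no longer identifies the file on disk, passing it back to open() would look for a file literally named with "?" characters.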

When the new os.scandir() API was designed, I asked that it not support bytes filenames, since they are "broken by design".
https://www.python.org/dev/peps/pep-0471/

Recently, a user complained that os.walk() no longer works with bytes on Windows: "Regression: os.walk now using os.scandir() breaks bytes filenames on windows"
http://bugs.python.org/issue25911

Serhiy Storchaka just pushed a change to reintroduce bytes support on Windows in os.walk(), but I would prefer to do the opposite: drop support for bytes filenames on Windows.

Are we brave enough to force users to use the "right" type for filenames?

--

On Python 2, it wasn't possible to use Unicode for filenames: many functions fail badly with Unicode, especially when you mix bytes and Unicode. On Python 3, Unicode is the "natural" type, most Python functions prefer Unicode, and PEP 383 (surrogateescape) makes it possible to safely use Unicode on UNIX even with undecodable filenames (invalid bytes are stored as Unicode surrogate characters).
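As a small illustration of the PEP 383 point, the sketch below round-trips a byte string that is not valid UTF-8 through str using the surrogateescape error handler. It assumes a UTF-8 filesystem encoding and an example filename chosen for illustration; real code would normally go through os.fsdecode() and os.fsencode(), which pick the encoding for you.

```python
# A filename as raw bytes, encoded in Latin-1: 0xE9 is not valid UTF-8 here.
raw_name = b"caf\xe9.txt"

# Decoding with surrogateescape never fails: the undecodable byte 0xE9
# is stored as the lone surrogate U+DCE9 instead of raising an error.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogate back into the
# original byte, so the exact on-disk name is recovered.
restored = text_name.encode("utf-8", errors="surrogateescape")

# ascii() is used because a lone surrogate cannot be printed directly.
print(ascii(text_name))
print(restored == raw_name)
```

The round trip is lossless, which is why str filenames are safe on UNIX even when the bytes on disk do not match the locale encoding.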

If Unicode strings don't work in Python 2, then what is Python 2/3 code to do for a cross-platform solution if we completely remove bytes support in Python 3? Wouldn't that mean there is no common type between Python 2 & 3 that will work with the os module, except native strings (which are difficult to get right)?


