[Python-Dev] Windows: Remove support of bytes filenames in the os module?

Paul Moore p.f.moore at gmail.com
Mon Feb 8 13:26:32 EST 2016


On 8 February 2016 at 14:32, Victor Stinner <victor.stinner at gmail.com> wrote:

Since 3.3, functions of the os module have emitted DeprecationWarning when called with bytes filenames.

Everywhere? Or just on Windows? I can't tell from your email and I don't have a Unix system to hand to check.
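One way to answer this on whatever system is to hand is to record the warnings raised by a bytes call. This is only a minimal sketch: the helper name bytes_listdir_warnings is made up for illustration, and the result depends on the platform and Python version, so no expected output is shown.

```python
import os
import warnings


def bytes_listdir_warnings(path=b"."):
    """Return the DeprecationWarning messages raised by os.listdir(bytes).

    Hypothetical helper, not part of the os module.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Make sure no warning is hidden by the default filters.
        warnings.simplefilter("always")
        os.listdir(path)
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


# An empty list means this platform/version does not warn for bytes paths.
print(os.name, bytes_listdir_warnings())
```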

The rationale is quite simple: the native Windows type for filenames is Unicode, and Windows behaves oddly when you use bytes. For example, os.listdir(b'.') gives you paths which cannot be used with open() when the filenames are not encodable in the ANSI code page. Unencodable characters are replaced with "?". The following issue was opened to document this weird behaviour (but the doc was never completed):

"Document that bytes OS API can returns unusable results on Windows" http://bugs.python.org/issue16700

OK, that seems fine, but obviously of limited interest to Unix users who aren't worried about cross-platform portability :-)
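The "?" replacement quoted above can be approximated on any platform with Python's own codecs. This is a simulation, not a call into the Windows API: the helper ansi_round_trip is hypothetical, and cp1252 is assumed as a stand-in for the ANSI code page.

```python
def ansi_round_trip(name, codepage="cp1252"):
    """Encode a filename to an ANSI-style code page and decode it back.

    Hypothetical helper; cp1252 is an assumed ANSI code page. Characters
    the code page cannot represent are replaced with "?", mimicking the
    lossy names described above.
    """
    encoded = name.encode(codepage, errors="replace")
    return encoded.decode(codepage)


original = "\u65e5\u672c\u8a9e.txt"  # three CJK characters + ".txt"
lossy = ansi_round_trip(original)

print(lossy)              # ???.txt
print(lossy == original)  # False: the lossy name no longer identifies the file
```

Names made only of characters in the code page survive the round trip unchanged, which is why users who never leave their own code page do not notice the problem.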

When the new os.scandir() API was designed, I asked to not support bytes filenames since they are "broken by design". https://www.python.org/dev/peps/pep-0471/

Recently, a user complained that os.walk() doesn't work with bytes on Windows anymore: "Regression: os.walk now using os.scandir() breaks bytes filenames on windows" http://bugs.python.org/issue25911 Serhiy Storchaka just pushed a change to reintroduce bytes support on Windows in os.walk(), but I would prefer to do the opposite: drop support for bytes filenames on Windows.

But leave those APIs as Unix only? That seems like a regression, too (sure, the bytes APIs are problematic on Windows, but only for certain characters AIUI). Windows users currently using programs written using the bytes API (presumably originally intended for Unix where the bytes API was a deliberate choice), who don't hit any encoding issues currently, will see those programs broken for no reason other than "users using different character sets than you may have been hitting issues before". That seems like a weird justification to me...

Are we brave enough to force users to use the "right" type for filenames?

If it were all users I'd say it's worth considering. But practicality beats purity here IMO, and I feel that allowing people's code to be "portable by default" is a more important goal than enforcing encoding purity on a single platform.

Paul