[Python-Dev] File system path encoding on Windows

tritium-list at sdamon.com
Sun Aug 28 16:39:33 EDT 2016


-----Original Message-----
From: Python-Dev [mailto:python-dev-bounces+tritium-list=sdamon.com at python.org] On Behalf Of Steve Dower
Sent: Wednesday, August 24, 2016 11:44 AM
To: Stephen J. Turnbull <turnbull.stephen.fw at u.tsukuba.ac.jp>
Cc: Nick Coghlan <ncoghlan at gmail.com>; Python Dev <python-dev at python.org>
Subject: Re: [Python-Dev] File system path encoding on Windows

On 23Aug2016 2150, Stephen J. Turnbull wrote:
> Steve Dower writes:
>
>  > * Stephen sees "no reason not to change locale.getpreferredencoding()"
>  > (default encoding for open()) at the same time with the same switches,
>  > while I'm not quite as confident. Do users generally specify an encoding
>  > these days? I know I always put utf-8 there.
>
> I was insufficiently specific. "No reason not to" depends on separate
> switches for file system encoding and preferred encoding. That makes
> things somewhat more complicated for implementation, and significantly
> so for users.

Yes, it does, but it's about the only possible migration path. I know Nick and Victor like the idea of a -X flag (or a direct -utf8 flag), but I prefer more specific environment variables:

- PYTHONWINDOWSLEGACYSTDIO (for the console changes)
- PYTHONWINDOWSLEGACYPATHENCODING (assuming getfilesystemencoding() is utf8)
- PYTHONWINDOWSLEGACYLOCALEENCODING (assuming getpreferredencoding() is utf8)
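For reference, the two encodings that the path and locale switches would affect can be inspected on any current interpreter. This is only a minimal sketch; the helper name report_encodings is made up for illustration and is not part of any proposal in this thread.

```python
import locale
import sys


def report_encodings():
    """Return the two encodings discussed above (hypothetical helper)."""
    return {
        # Encoding used to convert between str and bytes file names.
        "filesystem": sys.getfilesystemencoding(),
        # Default encoding open() uses when no encoding= is given.
        # Passing False avoids changing the process locale as a side effect.
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, encoding in report_encodings().items():
        print(f"{name}: {encoding}")
```

The values printed depend on the platform, the Python version and the active locale, so no particular output should be assumed.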

Once you get to variable names that long, arcane single-character flags start looking preferable. How about "PYTHONWINLEGACY" to just turn it all on or off? If the code breaks on one thing, it obviously isn't written to use the other two, so we might as well shut them all off.
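To make the two options concrete, here is a rough sketch of how a single master switch could sit alongside the three specific variables. Everything here is an assumption for illustration: the function legacy_features is hypothetical, the variable names are simply the spellings proposed in this thread, and the "any non-empty value enables it" rule is a guess at semantics that nobody has specified.

```python
import os

# Names as spelled in this thread; the semantics below are assumed.
SPECIFIC_SWITCHES = {
    "stdio": "PYTHONWINDOWSLEGACYSTDIO",
    "path_encoding": "PYTHONWINDOWSLEGACYPATHENCODING",
    "locale_encoding": "PYTHONWINDOWSLEGACYLOCALEENCODING",
}
MASTER_SWITCH = "PYTHONWINLEGACY"


def legacy_features(environ=None):
    """Map each feature to True if legacy behaviour is requested (hypothetical)."""
    env = os.environ if environ is None else environ
    # Assumption: any non-empty value turns a switch on.
    if env.get(MASTER_SWITCH):
        # The master switch enables every legacy behaviour at once.
        return {feature: True for feature in SPECIFIC_SWITCHES}
    return {
        feature: bool(env.get(variable))
        for feature, variable in SPECIFIC_SWITCHES.items()
    }
```

With this shape, legacy_features({"PYTHONWINLEGACY": "1"}) selects all three legacy behaviours, while setting only one of the long names selects just that one.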

I'm open to dropping "WINDOWS" from these if others don't think it's necessary. Swap "LEGACY" for "UNICODE" if we just offer the option to upgrade now rather than changing the default.

(I could also see the possibility of PYTHONWINDOWSSTRICT* options to use the default encoding but raise if decoding bytes fails - mbcs:strict rather than mbcs:replace. For utf-8 mode I'd want to use surrogatepass throughout, so it will always raise on invalid encoding, but there will be far fewer occurrences than for mbcs.)

I'll transform my earlier post into a PEP (or probably three PEPs - one for each change), since it seems like the paper trail is going to be more valuable than discussing it now. Without an actual build to test it's pretty hard to evaluate the impact.

Cheers,
Steve
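The difference between the strict, replace and surrogatepass error handlers mentioned above can be seen with the standard codecs. This is a small sketch, not a model of the proposed behaviour: the mbcs codec only exists on Windows, so cp1252 is used as a portable stand-in (an assumption), and try_decode is a helper invented for the example.

```python
def try_decode(data, encoding, errors):
    """Return the decoded text, or None if decoding raises (hypothetical helper)."""
    try:
        return data.decode(encoding, errors)
    except UnicodeDecodeError:
        return None


# Byte 0x81 is unassigned in Python's cp1252 codec (stand-in for mbcs here).
BAD_CP1252 = b"caf\x81"
strict_result = try_decode(BAD_CP1252, "cp1252", "strict")    # raises, so None
replace_result = try_decode(BAD_CP1252, "cp1252", "replace")  # bad byte becomes U+FFFD

# With utf-8 and surrogatepass, a lone surrogate round-trips unchanged...
LONE_SURROGATE = "\udc80"
encoded = LONE_SURROGATE.encode("utf-8", "surrogatepass")
round_trip = encoded.decode("utf-8", "surrogatepass")

# ...but bytes that are not valid UTF-8 at all still raise.
invalid_result = try_decode(b"\xff", "utf-8", "surrogatepass")
```

In this sketch the replace handler silently substitutes a replacement character, which is the data loss a strict option would turn into an error, while surrogatepass only relaxes the rule for surrogate code points.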


Python-Dev mailing list
Python-Dev at python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-list%40sdamon.com


