[Python-Dev] PEP 529: Change Windows filesystem encoding to UTF-8
Nick Coghlan ncoghlan at gmail.com
Sat Sep 3 12:27:44 EDT 2016
On 4 September 2016 at 00:49, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 2 September 2016 at 08:31, Steve Dower <steve.dower at python.org> wrote:
>> This proposal would remove all use of the *A APIs and only ever call the *W APIs. When Windows returns paths to Python as str, they will be decoded from utf-16-le and returned as text (in whatever the minimal representation is). When Windows returns paths to Python as bytes, they will be decoded from utf-16-le to utf-8 using surrogatepass (Windows does not validate surrogate pairs, so it is possible to have invalid surrogates in filenames). Equally, when paths are provided as bytes, they are decoded from utf-8 into utf-16-le and passed to the *W APIs.
>
> The overall proposal looks good to me, there's just a terminology glitch here: utf-8 <-> utf-16-le should either be described as transcoding, or else as decoding and then re-encoding. As they're both text codecs, there's no "decoding" operation that switches between them.
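To make the terminology point concrete, here is a minimal sketch of transcoding as "decode, then re-encode" with the surrogatepass error handler. The helper names are hypothetical and are not taken from the PEP or from CPython; the sketch only shows that the conversion always passes through str.

```python
def utf16le_to_utf8(raw: bytes) -> bytes:
    """Transcode UTF-16-LE bytes to UTF-8 bytes (hypothetical helper)."""
    # Step 1: decode to str. surrogatepass lets unpaired surrogates through.
    text = raw.decode("utf-16-le", errors="surrogatepass")
    # Step 2: re-encode the str in the target codec.
    return text.encode("utf-8", errors="surrogatepass")


def utf8_to_utf16le(raw: bytes) -> bytes:
    """Transcode UTF-8 bytes to UTF-16-LE bytes (hypothetical helper)."""
    text = raw.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-le", errors="surrogatepass")


# A name containing an unpaired surrogate, which Windows permits in filenames.
wide_name = "a\ud800b".encode("utf-16-le", errors="surrogatepass")
narrow_name = utf16le_to_utf8(wide_name)
round_tripped = utf8_to_utf16le(narrow_name)
```

Without surrogatepass, either step would raise a UnicodeError on the unpaired surrogate, so such a filename could not be round-tripped through bytes.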
After also reading the Windows console encoding PEP, I realised there are a couple of missing discussions here regarding the impacts on sys.argv, os.environ, and os.environb.
The reason that's relevant is that "sys.getfilesystemencoding" is a bit of a misnomer, as it's also used to determine the assumed encoding of command line arguments and environment variables.
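As a rough illustration of where that encoding surfaces, the sketch below inspects the current setting and round-trips a name through os.fsencode and os.fsdecode, which use the same encoding and error handler. The function name and the sample value are made up for the example, and the reported values depend on the platform and Python version.

```python
import os
import sys


def describe_fs_encoding() -> dict:
    """Collect the settings relevant to bytes/str conversion (hypothetical helper)."""
    return {
        # Encoding used for filenames, and also for argv and the environment.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # False on Windows, where os.environb is not provided.
        "supports_bytes_environ": os.supports_bytes_environ,
    }


info = describe_fs_encoding()

# os.fsencode/os.fsdecode apply the filesystem encoding and its error handler.
sample = "example.txt"  # illustrative value only
encoded = os.fsencode(sample)
decoded = os.fsdecode(encoded)
```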
With the PEP currently stating that all use of the "*A" Windows APIs will be removed, I'm guessing these will just start working as expected, but it should be covered explicitly.
In addition, if the subprocess module is going to be excluded from these changes, that should be called out explicitly (keeping in mind that on *nix, the only subprocess pipe configurations that are straightforward to set up in Python 3 are raw binary mode and universal newlines mode, with the latter implicitly treating the pipes as UTF-8 text).
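For reference, the two pipe configurations mentioned above can be sketched as follows. This is only an illustration using a trivial child process. Note that in universal newlines mode the subprocess module decodes with the locale's preferred encoding, which is UTF-8 on most modern *nix systems but is not guaranteed to be.

```python
import subprocess
import sys

# Trivial child process: the current interpreter printing one ASCII line.
cmd = [sys.executable, "-c", "print('hi')"]

# Raw binary mode: the pipe yields bytes and no decoding is attempted.
binary_result = subprocess.run(cmd, stdout=subprocess.PIPE)
raw_output = binary_result.stdout

# Universal newlines mode: the pipe yields str, decoded with the locale's
# preferred encoding, and line endings are normalised to "\n".
text_result = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
text_output = text_result.stdout
```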
Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia