Issue 8775: Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.)

Created on 2010-05-20 12:09 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
cmdline_encoding.patch vstinner, 2010-06-18 23:42
Messages (9)
msg106139 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-20 12:09
The file system encoding is hardcoded to UTF-8 on Mac OS X, whereas the locale encoding... depends on the locale. See issue #4388 for the details.
I think that we should use the locale encoding to encode and decode command line arguments. We would have to create a new encoding variable for the command line arguments:
* Py_CommandLineEncoding
* sys.getcmdlineencoding()
* (no sys.setcmdlineencoding() please!)
* ...
This encoding should only be used on POSIX: the Windows native type is Unicode (wchar_t*). It should be used to decode sys.argv and to encode child process arguments (subprocess, os.exec*(), etc.).
On Linux, it should not change anything because the file system encoding is the locale encoding. Said differently, Python 3 already uses the locale encoding for command line arguments on Linux.
If you pass a filename on the command line and then open it, the filename is decoded with the locale encoding and then encoded with the file system encoding. I fear that it will fail if the two encodings differ...
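The decode-then-encode concern in the last paragraph can be illustrated with a minimal sketch. The helper `reencode_argument` and the sample file name are hypothetical and not part of the attached patch; Latin-1 stands in for a locale encoding that differs from a UTF-8 file system encoding.

```python
def reencode_argument(raw: bytes, decode_with: str, encode_with: str) -> bytes:
    """Decode a raw argv entry with one codec, then encode it with another.

    This mimics decoding sys.argv with the locale encoding and later
    encoding the same string with the file system encoding.
    """
    text = raw.decode(decode_with)
    return text.encode(encode_with)


# A file name containing U+00E9, as its bytes would be stored on a UTF-8 file system.
raw_name = "\u00e9.txt".encode("utf-8")

# Both encodings agree: the bytes survive the round trip unchanged.
same = reencode_argument(raw_name, "utf-8", "utf-8")

# Assumed mismatch: locale encoding is Latin-1, file system encoding is UTF-8.
different = reencode_argument(raw_name, "latin-1", "utf-8")

print(same == raw_name)
print(different == raw_name)
```

In the mismatched case the re-encoded bytes no longer name the original file, so a later open() on that name would be expected to fail.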
msg106150 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-20 13:02
Fix the title: sys.argv is already decoded using the locale encoding on Unix. The problem is that a (possibly) different encoding, the file system encoding, is used to encode command line arguments.
msg106171 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-05-20 17:23
> I think that we should use the locale encoding to encode and decode command line arguments.
I disagree. IIUC, this is only about OSX. We shouldn't take any action until either some OSX expert explains to us how command line arguments are passed on OSX, or we find some Apple documentation that can be taken as a specification.
I think the C locale is very poorly supported on OSX, and we shouldn't really use it for anything. What may be useful is the terminal encoding (which may differ from both UTF-8 and the locale encoding); however, it's not possible to find out what the terminal encoding is. In addition, programs may be started "directly" (i.e. not from the terminal), in which case the terminal encoding would be irrelevant.
For file name arguments at least, it's very clear that the command line arguments also use the file system encoding.
msg106543 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-26 17:01
@loewis: You restored the original (wrong) title "Use locale encoding to decode sys.argv, not the file system encoding" instead of the new (good) title "Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.)". Was that intentional?
msg108151 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-18 23:42
The attached patch is a draft that adds a new encoding, the command line encoding. It is used to encode (subprocess) and decode (python) command line arguments. It adds sys.getcmdlineencoding().
msg108153 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-06-18 23:54
I'm still -1, failing to see the problem that is solved.
msg108154 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-18 23:55
> I'm still -1, failing to see the problem that is solved.
I know (and I agree), but I don't want to lose the patch :-)
msg111432 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2010-07-24 09:14
This issue only seems to be relevant for OSX, and then only for OSX releases before 10.5, because in that release Apple made sure that the LANG variable and the similar LC_* ones specify a UTF-8 encoding, so we're back at the common case where the filesystem encoding matches the locale encoding.
A system where the filesystem encoding doesn't match the locale encoding is hard to get right. While it would be possible to add sys.cmdlineencoding, that doesn't actually solve the semantic problem, because external tools might not cooperate. That is, most system tools seem to work with bytes internally and do not treat arguments as text encoded in the locale encoding that should be re-encoded in the filesystem encoding before being passed to the C APIs. For example, when calling "ls somefile", the "ls" command will pass the bytes in argv[1] to the POSIX routines for getting file information without trying to re-encode them.
In short, having a filesystem encoding that differs from the command-line encoding only works when all system tools cooperate and are Unicode-aware.
To be honest, I'd say the behavior of OSX 10.4 is a bug, and we might add a workaround on that platform that uses CFStringGetSystemEncoding() to fetch the actual system encoding when LANG=C. (And I'm -1 on adding the patch.) See also:
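As a rough illustration of the "common case" and of the bytes pass-through described above, here is a small sketch for a current Python 3 on POSIX. The helper `encodings_match` is hypothetical. `os.fsdecode()` and `os.fsencode()` were added to Python after this discussion, and the round trip shown relies on the `surrogateescape` error handler that POSIX builds use for the filesystem encoding.

```python
import codecs
import locale
import os
import sys


def encodings_match() -> bool:
    """Report whether the filesystem and locale encodings name the same codec."""
    fs = codecs.lookup(sys.getfilesystemencoding()).name
    loc = codecs.lookup(locale.getpreferredencoding(False)).name
    return fs == loc


# Bytes that are not valid UTF-8, such as a Latin-1 encoded file name.
raw = b"caf\xe9.txt"

# On POSIX, undecodable bytes become lone surrogates when decoding and are
# restored when encoding, so the original bytes reach the C APIs unchanged.
text = os.fsdecode(raw)
roundtrip = os.fsencode(text)

print(encodings_match())
print(roundtrip == raw)
```

This mirrors how a tool such as "ls" treats argv[1]: the bytes it receives are the bytes it hands to the operating system.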
msg111456 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-07-24 11:26
It seems that everybody now agrees to close this issue as "won't fix".
History
Date User Action Args
2022-04-11 14:57:01 admin set github: 53021
2010-07-24 11:26:52 loewis set status: open -> closed; resolution: wont fix; messages: +
2010-07-24 09:14:40 ronaldoussoren set nosy: + ronaldoussoren; messages: +
2010-07-07 02:02:28 piro set nosy: + piro
2010-06-18 23:55:55 vstinner set messages: +
2010-06-18 23:54:13 loewis set messages: +
2010-06-18 23:53:35 loewis set title: Use locale encoding to decode sys.argv, not the file system encoding -> Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.)
2010-06-18 23:42:45 vstinner set files: + cmdline_encoding.patch; keywords: + patch; messages: +
2010-05-26 17:01:45 vstinner set messages: +
2010-05-20 17:23:04 loewis set nosy: + loewis; title: Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.) -> Use locale encoding to decode sys.argv, not the file system encoding; messages: +
2010-05-20 16:29:04 Arfrever set nosy: + Arfrever
2010-05-20 13:02:03 vstinner set messages: + title: Use locale encoding to decode sys.argv, not the file system encoding -> Use locale encoding to encode command line arguments (subprocess, os.exec*(), etc.)
2010-05-20 12:09:24 vstinner create