Issue 9580: os.confstr() doesn't decode result according to PEP 383

Created on 2010-08-12 19:16 by baikie, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
confstr-pep383.diff baikie, 2010-08-12 19:16 Decode confstr() result according to PEP 383
confstr-bytes-3.2.diff baikie, 2010-08-19 18:48 Make os.confstr() return a bytes object, attributing the change to Python 3.2
Messages (7)
msg113700 - (view) Author: David Watson (baikie) Date: 2010-08-12 19:16
The attached patch applies on top of the patch from issue #9579 to make it use PyUnicode_DecodeFSDefaultAndSize(). (You could use it in the existing code, but until that issue is fixed, there is sometimes nothing to decode!)
msg113723 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-12 23:52
Can you give me examples of configuration keys with undecodable values? The encoding used by PyUnicode_DecodeFSDefault(AndSize) depends on the locale, whereas PyUnicode_FromString uses UTF-8. I don't know the encoding of confstr() values. You can decode a UTF-8 value using surrogateescape (the PEP 383 error handler) with PyUnicode_DecodeUTF8(value, strlen(value), "surrogateescape").
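A minimal Python-level sketch of the surrogateescape behaviour mentioned above. It is an illustration only, not code from either patch, and the byte string is a hypothetical value chosen to contain a byte that is invalid as UTF-8:

```python
# Illustration only: Python-level counterpart of decoding with the
# "surrogateescape" error handler (PEP 383).
raw = b"/opt/caf\xe9/bin"  # hypothetical value; 0xE9 is not valid UTF-8 here

# The undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9
# instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same codec and error handler restores the original bytes.
restored = text.encode("utf-8", "surrogateescape")
```

The point of the handler is that decoding never fails and the original bytes can always be recovered, as long as the same codec is used in both directions.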
msg113807 - (view) Author: David Watson (baikie) Date: 2010-08-13 18:36
The CS_PATH variable is a colon-separated list of directories ("the value for the PATH environment variable that finds all standard utilities"), so the file system encoding is certainly correct there. I don't see any reference to an encoding in the POSIX spec for confstr().
msg113846 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-13 22:31
On Friday 13 August 2010 20:36:22, you wrote:
> The CS_PATH variable is a colon-separated list of directories ("the value
> for the PATH environment variable that finds all standard utilities"), so
> the file system encoding is certainly correct there.

CS_PATH is hardcoded to "/bin:/usr/bin" in the GNU libc for UNIX. Do you know another key for which the value can be controlled by the user (or the system administrator)?

> I don't see any reference to an encoding in the POSIX spec for confstr().

CS_PATH is just an example; there are other keys. I'm not sure that all values are encoded in the file system encoding; it might be another encoding.

Well, if we really don't know the encoding, a solution is to use a bytes API (which may avoid the question of the usage of PEP 383).
msg113923 - (view) Author: David Watson (baikie) Date: 2010-08-14 19:17
> CS_PATH is hardcoded to "/bin:/usr/bin" in the GNU libc for UNIX. Do you know
> another key for which the value can be controlled by the user (or the system
> administrator)?

No, not a specific example, but CS_PATH could conceivably refer to some POSIX compatibility suite that's been installed in a non-ASCII location, and implementations can add their own variables for whatever they want.

> CS_PATH is just an example, there are other keys. I'm not sure that all values
> are encoded to the filesystem encodings, it might be another encoding?
>
> Well, if we really don't know the encoding, a solution is to use a bytes API
> (which may avoid the question of the usage of the PEP 383).

The other variables defined by POSIX refer to environment variables and command-line options for the C compiler and the getconf utility, all of which would use the FS encoding in Python, but I agree there's no way to know the appropriate encoding in general, or even whether anything cares about encodings. Personally, I have no objections to making it return bytes.
msg114402 - (view) Author: David Watson (baikie) Date: 2010-08-19 18:48
I wrote this patch to make confstr() return bytes (with code similar to 2.x), and document the change in "Porting to Python 3.2" and elsewhere, but it then occurred to me that you might have been talking about making a separate bytes API like os.environb. Which did you have in mind? There is another option for a str API, which is to decode the value as ASCII with the surrogateescape error handler. The returned string will then round-trip correctly through PyUnicode_FSConverter(), etc., as long as the file system encoding is compatible with ASCII, which PEP 383 requires it to be. This is how undecodable command line arguments are currently handled when mbrtowc() is unavailable.
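The ASCII-plus-surrogateescape option described above can be illustrated at the Python level. This is only a sketch under stated assumptions: the sample value is hypothetical, and the three codecs stand in for possible ASCII-compatible file system encodings.

```python
# Illustration only: decode as ASCII with surrogateescape, then show that the
# result round-trips through any ASCII-compatible encoding.
raw = b"/opt/caf\xc3\xa9/bin"  # hypothetical value containing non-ASCII bytes

# Every byte >= 0x80 becomes a lone surrogate in the range U+DC80..U+DCFF;
# ASCII bytes decode to themselves.
text = raw.decode("ascii", "surrogateescape")

# Stand-ins for the file system encoding (assumed ASCII-compatible).
round_trips = {
    fs_encoding: text.encode(fs_encoding, "surrogateescape")
    for fs_encoding in ("utf-8", "latin-1", "ascii")
}
```

Because each non-ASCII byte is escaped individually, the string does not depend on which encoding later turns it back into bytes. The cost is that non-ASCII characters are never readable as text, even when the value was valid in the file system encoding.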
msg116063 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-10 23:51
Fixed in r84696+r84697: confstr-minimal.diff from #9579 + PyUnicode_DecodeFSDefaultAndSize(). Thanks for the patch, sorry for the delay.
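With the result decoded via PyUnicode_DecodeFSDefaultAndSize(), a caller that wants the raw bytes can presumably recover them with os.fsencode(). The sketch below assumes a Python version that includes this fix; confstr_bytes is a hypothetical helper name, not part of the os module.

```python
import os


def confstr_bytes(name):
    """Return the raw bytes behind os.confstr(name), or None if unavailable.

    Hypothetical helper for illustration; it is not part of the os module.
    """
    if not hasattr(os, "confstr"):  # os.confstr() exists only on Unix
        return None
    try:
        value = os.confstr(name)
    except (ValueError, OSError):  # unknown name, or the system call failed
        return None
    if value is None:  # the name is known but has no configured value
        return None
    # os.fsencode() reverses the file system decoding, including any
    # surrogate-escaped bytes.
    return os.fsencode(value)


path = confstr_bytes("CS_PATH")
# CS_PATH is a colon-separated list of directories.
directories = path.split(b":") if path is not None else []
```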
History
Date User Action Args
2022-04-11 14:57:05 admin set github: 53789
2010-09-10 23:51:51 vstinner set resolution: duplicate -> fixed
2010-09-10 23:51:42 vstinner set status: open -> closed, resolution: duplicate, messages: +
2010-08-19 18:48:18 baikie set files: + confstr-bytes-3.2.diff, messages: +
2010-08-14 19:17:19 baikie set messages: +
2010-08-13 22:32:00 vstinner set messages: +
2010-08-13 18:36:20 baikie set messages: +
2010-08-12 23:52:38 vstinner set messages: +
2010-08-12 20:38:38 pitrou set nosy: + vstinner
2010-08-12 19:16:53 baikie create