msg113700
Author: David Watson (baikie)
Date: 2010-08-12 19:16
The attached patch applies on top of the patch from issue #9579 to make confstr() use PyUnicode_DecodeFSDefaultAndSize(). (You could use it in the existing code, but until that issue is fixed, there is sometimes nothing to decode!)
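In Python terms, decoding with PyUnicode_DecodeFSDefaultAndSize() behaves roughly like os.fsdecode(): the bytes are decoded with the file system encoding and, on POSIX, the surrogateescape error handler, so os.fsencode() recovers the original bytes. A minimal sketch, assuming a POSIX system; the raw value is a made-up example, not real confstr() output.

```python
import os

# Hypothetical raw confstr() value: 0xE9 is "é" in Latin-1 but is not
# valid UTF-8 on its own.
raw = b"/opt/caf\xe9/bin:/bin:/usr/bin"

# Decode the way PyUnicode_DecodeFSDefaultAndSize() does: file system
# encoding plus the file system error handler (surrogateescape on POSIX).
text = os.fsdecode(raw)

# Re-encoding with the same settings gives back the original bytes, so
# the value can still be passed to functions that expect a path.
assert os.fsencode(text) == raw
```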
|
msg113723
Author: STINNER Victor (vstinner) *
Date: 2010-08-12 23:52
Can you give me examples of configuration keys with undecodable values? PyUnicode_DecodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize() use an encoding that depends on the locale, whereas PyUnicode_FromString() uses UTF-8. I don't know the encoding of confstr() values. You can decode a UTF-8 value with the surrogateescape error handler (PEP 383) using PyUnicode_DecodeUTF8(value, strlen(value), "surrogateescape").
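At the Python level, the PyUnicode_DecodeUTF8(value, strlen(value), "surrogateescape") call corresponds to bytes.decode() with the same codec and error handler. A minimal sketch; the byte string is an arbitrary example.

```python
# Arbitrary example: valid UTF-8 for "é" (0xC3 0xA9) followed by the
# invalid byte 0xFF.
value = b"caf\xc3\xa9 \xff"

# Same codec and error handler as
# PyUnicode_DecodeUTF8(value, strlen(value), "surrogateescape").
text = value.decode("utf-8", "surrogateescape")

# The valid sequence decodes normally; the undecodable byte 0xFF becomes
# the lone surrogate U+DCFF instead of raising UnicodeDecodeError.
assert text == "caf\u00e9 \udcff"

# Encoding with the same handler restores the original bytes (PEP 383).
assert text.encode("utf-8", "surrogateescape") == value
```

Unlike os.fsdecode(), this always assumes UTF-8 whatever the locale, which is exactly the difference in question.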
|
msg113807
Author: David Watson (baikie)
Date: 2010-08-13 18:36
The CS_PATH variable is a colon-separated list of directories ("the value for the PATH environment variable that finds all standard utilities"), so the file system encoding is certainly correct there. I don't see any reference to an encoding in the POSIX spec for confstr().
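Because CS_PATH has the same format as PATH, a program would typically split it on ':' and treat each entry as a file system path, which is the argument for the file system encoding. A minimal sketch; `split_cs_path` is a hypothetical helper, not part of the os module.

```python
import os


def split_cs_path(value):
    """Split a CS_PATH-style string into its directory entries.

    CS_PATH uses the same colon-separated format as the PATH environment
    variable, so ':' is used directly. Empty entries are kept as-is.
    """
    return value.split(":")


# os.confstr() exists only on Unix, and returns None when the name is
# valid but has no value defined, so guard both cases.
if hasattr(os, "confstr") and "CS_PATH" in os.confstr_names:
    cs_path = os.confstr("CS_PATH")
    if cs_path is not None:
        for directory in split_cs_path(cs_path):
            print(directory)
```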
|
msg113846
Author: STINNER Victor (vstinner) *
Date: 2010-08-13 22:31
On Friday 13 August 2010 20:36:22, you wrote:
> The CS_PATH variable is a colon-separated list of directories ("the value
> for the PATH environment variable that finds all standard utilities"), so
> the file system encoding is certainly correct there.

CS_PATH is hardcoded to "/bin:/usr/bin" in the GNU libc for UNIX. Do you know another key for which the value can be controlled by the user (or the system administrator)?

> I don't see any reference to an encoding in the POSIX spec for confstr().

CS_PATH is just an example; there are other keys. I'm not sure that all values are encoded with the file system encoding; they might use another encoding.

Well, if we really don't know the encoding, one solution is to use a bytes API (which may avoid the question of using PEP 383).
|
msg113923
Author: David Watson (baikie)
Date: 2010-08-14 19:17
> CS_PATH is hardcoded to "/bin:/usr/bin" in the GNU libc for UNIX. Do you know
> another key for which the value can be controlled by the user (or the system
> administrator)?

No, not a specific example, but CS_PATH could conceivably refer to some POSIX compatibility suite that's been installed in a non-ASCII location, and implementations can add their own variables for whatever they want.

> CS_PATH is just an example; there are other keys. I'm not sure that all values
> are encoded with the file system encoding; they might use another encoding.
>
> Well, if we really don't know the encoding, one solution is to use a bytes API
> (which may avoid the question of using PEP 383).

The other variables defined by POSIX refer to environment variables and command-line options for the C compiler and the getconf utility, all of which would use the FS encoding in Python, but I agree there's no way to know the appropriate encoding in general, or even whether anything cares about encodings. Personally, I have no objections to making it return bytes.
|
msg114402
Author: David Watson (baikie)
Date: 2010-08-19 18:48
I wrote this patch to make confstr() return bytes (with code similar to 2.x), and document the change in "Porting to Python 3.2" and elsewhere, but it then occurred to me that you might have been talking about making a separate bytes API like os.environb. Which did you have in mind?

There is another option for a str API, which is to decode the value as ASCII with the surrogateescape error handler. The returned string will then round-trip correctly through PyUnicode_FSConverter(), etc., as long as the file system encoding is compatible with ASCII, which PEP 383 requires it to be. This is how undecodable command-line arguments are currently handled when mbrtowc() is unavailable.
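The ASCII-with-surrogateescape option can be checked directly: ASCII bytes decode to themselves, and every byte from 0x80 to 0xFF becomes a lone surrogate in U+DC80..U+DCFF, which any ASCII-compatible codec turns back into the same byte when encoding with surrogateescape. A minimal sketch; `decode_ascii_surrogateescape` is a hypothetical helper and the raw value is made up.

```python
import os


def decode_ascii_surrogateescape(raw):
    # ASCII bytes map to the same code points; each byte >= 0x80 maps to
    # the lone surrogate U+DC00 + byte (PEP 383).
    return raw.decode("ascii", "surrogateescape")


raw = b"/opt/caf\xe9/bin"  # hypothetical value with one non-ASCII byte
text = decode_ascii_surrogateescape(raw)
assert text == "/opt/caf\udce9/bin"

# The string round-trips through any ASCII-compatible encoding used with
# surrogateescape, such as UTF-8 or Latin-1.
for encoding in ("utf-8", "latin-1"):
    assert text.encode(encoding, "surrogateescape") == raw

# On POSIX, os.fsencode() uses the file system encoding with
# surrogateescape, so it recovers the same bytes.
if os.name == "posix":
    assert os.fsencode(text) == raw
```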
|
msg116063
Author: STINNER Victor (vstinner) *
Date: 2010-09-10 23:51
Fixed in r84696+r84697: confstr-minimal.diff from #9579 + PyUnicode_DecodeFSDefaultAndSize(). Thanks for the patch, sorry for the delay.
|