Issue 9579: In 3.x, os.confstr() returns garbage if value is longer than 255 bytes (original) (raw)

Created on 2010-08-12 19:12 by baikie, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
confstr-reduce-bufsize.diff baikie,2010-08-12 19:12 Shrink stack buffer to one byte for testing
confstr-long-result.diff baikie,2010-08-12 19:13 Allocate separate buffer to receive result
confstr-minimal.diff baikie,2010-08-19 18:47 Fix 255-byte issue without attempting to handle changing length
Messages (8)
msg113699 - (view) Author: David Watson (baikie) Date: 2010-08-12 19:12
It may be hard to find a configuration string this long, but you can see the problem if you apply the attached confstr-reduce-bufsize.diff to reduce the size of the local array buffer that posix_confstr() uses. With it applied: >>> import os >>> print(ascii(os.confstr("CS_PATH"))) '\x00\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb' The problem arises because the code first tries to receive the configuration string into the local buffer (char buffer[256], reduced to char buffer[1] above), but then tries to receive it directly into a string object if it doesn't fit. You can see what's gone wrong by comparing the working code in 2.x: if ((unsigned int)len >= sizeof(buffer)) { result = PyString_FromStringAndSize(NULL, len-1); if (result != NULL) confstr(name, PyString_AS_STRING(result), len); } else result = PyString_FromStringAndSize(buffer, len-1); with the code in 3.x: if ((unsigned int)len >= sizeof(buffer)) { result = PyUnicode_FromStringAndSize(NULL, len-1); if (result != NULL) confstr(name, _PyUnicode_AsString(result), len); } else result = PyUnicode_FromStringAndSize(buffer, len-1); Namely, that in 3.x it tries to receive the string into the bytes object returned by _PyUnicode_AsString(), not the str object it has just allocated (which has the wrong format anyway - Py_UNICODE as opposed to char). The attached confstr-long-result.diff fixes this by allocating a separate buffer when necessary to receive the result, before creating the string object from it. By putting the confstr() call and allocation in a loop, it also handles the possibility that the value's length might change between calls.
msg113701 - (view) Author: David Watson (baikie) Date: 2010-08-12 19:18
The returned string should also be decoded with the file system encoding and surrogateescape error handler, as per PEP 383 - there's a patch at issue #9580 to do this.
msg113724 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-13 00:02
About confstr-long-result.diff: why do you use a loop to reallocate the buffer? confstr() result may change?
msg113808 - (view) Author: David Watson (baikie) Date: 2010-08-13 18:41
I don't see why confstr() values shouldn't change. sysconf() values can change between calls, IIRC. Implementations can also define their own confstr variables - they don't have to stick to the POSIX stuff. And using a loop means the confstr() call only appears once in the source, which is more elegant, right? :)
msg113840 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-13 22:07
I just fear that the loop is "endless". Imagine the worst case: confstr() returns a counter (n, n+1, n+2, ...). In 64 bits, it can be long. I would prefer to see a condition to stop after 2 steps. It should maybe stop when an error at the 3rd step.
msg113924 - (view) Author: David Watson (baikie) Date: 2010-08-14 19:18
> I just fear that the loop is "endless". Imagine the worst case: confstr() returns a counter (n, n+1, n+2, ...). In 64 bits, it can be long. The returned length is supposed to be determined by the length of the variable, not the length of the buffer passed by the caller, so I don't see why the OS would have a bug like that, and it would probably be exposed by the test suite anyway (there's currently a simple test using CS_PATH). > I would prefer to see a condition to stop after 2 steps. It should maybe stop when an error at the 3rd step. That is, raise an exception? Yeah, possibly, but I think it's better to just believe what the OS tells you rather than have an exception that's only raised once in a blue moon for something that may just be a low-probability event, and not an error.
msg114400 - (view) Author: David Watson (baikie) Date: 2010-08-19 18:47
I've opened a separate issue for the changing-length problem (issue #9647; it affects 2.x as well). Here is a patch that fixes the 255-byte issue only, and has similar results to the 2.x code if the value changes length between calls (except that it could raise a UnicodeError if the string is truncated inside a multibyte character encoding).
msg116062 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-10 23:50
Fixed in r84696+r84697: confstr-minimal.diff + PyUnicode_DecodeFSDefaultAndSize().
History
Date User Action Args
2022-04-11 14:57:05 admin set github: 53788
2010-09-10 23:50:57 vstinner set status: open -> closedresolution: fixed
2010-09-10 23:50:48 vstinner set messages: +
2010-08-19 18:47:08 baikie set files: + confstr-minimal.diffmessages: +
2010-08-14 19🔞25 baikie set messages: +
2010-08-13 22:07:12 vstinner set messages: +
2010-08-13 18:41:23 baikie set messages: +
2010-08-13 00:02:08 vstinner set messages: +
2010-08-12 20:38:34 pitrou set nosy: + vstinner
2010-08-12 19🔞10 baikie set messages: +
2010-08-12 19:13:29 baikie set files: + confstr-long-result.diff
2010-08-12 19:12:12 baikie create