msg113699 - (view) |
Author: David Watson (baikie) |
Date: 2010-08-12 19:12 |
It may be hard to find a configuration string this long, but you can see the problem if you apply the attached confstr-reduce-bufsize.diff to reduce the size of the local array buffer that posix_confstr() uses. With it applied: >>> import os >>> print(ascii(os.confstr("CS_PATH"))) '\x00\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb\ucbcb' The problem arises because the code first tries to receive the configuration string into the local buffer (char buffer[256], reduced to char buffer[1] above), but then tries to receive it directly into a string object if it doesn't fit. You can see what's gone wrong by comparing the working code in 2.x: if ((unsigned int)len >= sizeof(buffer)) { result = PyString_FromStringAndSize(NULL, len-1); if (result != NULL) confstr(name, PyString_AS_STRING(result), len); } else result = PyString_FromStringAndSize(buffer, len-1); with the code in 3.x: if ((unsigned int)len >= sizeof(buffer)) { result = PyUnicode_FromStringAndSize(NULL, len-1); if (result != NULL) confstr(name, _PyUnicode_AsString(result), len); } else result = PyUnicode_FromStringAndSize(buffer, len-1); Namely, that in 3.x it tries to receive the string into the bytes object returned by _PyUnicode_AsString(), not the str object it has just allocated (which has the wrong format anyway - Py_UNICODE as opposed to char). The attached confstr-long-result.diff fixes this by allocating a separate buffer when necessary to receive the result, before creating the string object from it. By putting the confstr() call and allocation in a loop, it also handles the possibility that the value's length might change between calls. |
|
|
msg113701 - (view) |
Author: David Watson (baikie) |
Date: 2010-08-12 19:18 |
The returned string should also be decoded with the file system encoding and surrogateescape error handler, as per PEP 383 - there's a patch at issue #9580 to do this. |
|
|
msg113724 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-08-13 00:02 |
About confstr-long-result.diff: why do you use a loop to reallocate the buffer? confstr() result may change? |
|
|
msg113808 - (view) |
Author: David Watson (baikie) |
Date: 2010-08-13 18:41 |
I don't see why confstr() values shouldn't change. sysconf() values can change between calls, IIRC. Implementations can also define their own confstr variables - they don't have to stick to the POSIX stuff. And using a loop means the confstr() call only appears once in the source, which is more elegant, right? :) |
|
|
msg113840 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-08-13 22:07 |
I just fear that the loop is "endless". Imagine the worst case: confstr() returns a counter (n, n+1, n+2, ...). In 64 bits, it can be long. I would prefer to see a condition to stop after 2 steps. It should maybe stop when an error at the 3rd step. |
|
|
msg113924 - (view) |
Author: David Watson (baikie) |
Date: 2010-08-14 19:18 |
> I just fear that the loop is "endless". Imagine the worst case: confstr() returns a counter (n, n+1, n+2, ...). In 64 bits, it can be long. The returned length is supposed to be determined by the length of the variable, not the length of the buffer passed by the caller, so I don't see why the OS would have a bug like that, and it would probably be exposed by the test suite anyway (there's currently a simple test using CS_PATH). > I would prefer to see a condition to stop after 2 steps. It should maybe stop when an error at the 3rd step. That is, raise an exception? Yeah, possibly, but I think it's better to just believe what the OS tells you rather than have an exception that's only raised once in a blue moon for something that may just be a low-probability event, and not an error. |
|
|
msg114400 - (view) |
Author: David Watson (baikie) |
Date: 2010-08-19 18:47 |
I've opened a separate issue for the changing-length problem (issue #9647; it affects 2.x as well). Here is a patch that fixes the 255-byte issue only, and has similar results to the 2.x code if the value changes length between calls (except that it could raise a UnicodeError if the string is truncated inside a multibyte character encoding). |
|
|
msg116062 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-09-10 23:50 |
Fixed in r84696+r84697: confstr-minimal.diff + PyUnicode_DecodeFSDefaultAndSize(). |
|
|