Issue 29619: st_ino (unsigned long long) is casted to long long in posixmodule.c:_pystat_fromstructstat (original) (raw)

Created on 2017-02-22 12:24 by mba, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 557 merged vstinner,2017-03-08 08:04
PR 584 merged vstinner,2017-03-09 16:41
PR 1666 merged xdegaye,2017-05-19 15:16
PR 3102 merged vstinner,2017-08-16 09:12
Messages (22)
msg288355 - (view) Author: Mba (mba) Date: 2017-02-22 12:24
ino_t is of type 'unsigned long' or 'unsigned long long'. Casting it to signed type means that negative values are incorrectly returned for large inodes.
msg289213 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-03-08 07:36
Any reason our _Py_stat_struct on Windows uses a signed type to represent st_ino?
msg289216 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-08 08:07
On Windows, _Py_attribute_data_to_stat() converts BY_HANDLE_FILE_INFORMATION to _Py_stat_struct, and then _pystat_fromstructstat() creates Python objects. The file index in BY_HANDLE_FILE_INFORMATION is made of two DWORD, so yes, it's unsigned. On Linux, stat.st_ino type is ino_t which is unsigned too. So I created a pull request to fix the bug, even if I don't think that a filesystem produce inodes larger than 2^63-1. Not sure if it's worth it to backport the fix to Python 2.7, 3.5 and 3.6?
msg289221 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-08 09:24
I think it is worth to backport this at least to 3.6.
msg289288 - (view) Author: Mba (mba) Date: 2017-03-09 12:52
> I don't think that a filesystem produce inodes larger than 2^63-1 The problem is real: the large inode number was assigned to a file created by MacOS on a share exported via CIFS, and then stated by Linux using NFS export.
msg290247 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-24 22:37
New changeset 68d29809405dc766966b2b973b8597212fbc3dbd by Victor Stinner in branch '3.6': bpo-29619: Convert st_ino using unsigned integer (#557) (#584) https://github.com/python/cpython/commit/68d29809405dc766966b2b973b8597212fbc3dbd
msg290250 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-24 22:38
New changeset 0f6d73343d342c106cda2219ebb8a6f0c4bd9b3c by Victor Stinner in branch 'master': bpo-29619: Convert st_ino using unsigned integer (#557) https://github.com/python/cpython/commit/0f6d73343d342c106cda2219ebb8a6f0c4bd9b3c
msg292141 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-04-22 22:26
The compilation of Modules/posixmodule.c fails now on Android architecture 'x86' at api level 21, after the above change 0f6d73343d342c106cda2219ebb8a6f0c4bd9b3c: /opt/android-ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/clang --sysroot=/opt/android-ndk/platforms/android-21/arch-x86 -target i686-none-linux-androideabi -gcc-toolchain /opt/android-ndk/toolchains/x86-4.9/prebuilt/linux-x86_64 -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wno-unused-value -Wno-empty-body -Qunused-arguments -Wno-parentheses-equality -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -Werror=implicit-function-declaration -IObjects -IInclude -IPython -I. -I./Include -I/opt/android-ndk/platforms/android-21/arch-x86/usr/include -DPy_BUILD_CORE -c ./Modules/posixmodule.c -o Modules/posixmodule.o ./Modules/posixmodule.c:1935:5: error: array size is negative Py_BUILD_ASSERT(sizeof(unsigned long) >= sizeof(st->st_ino)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./Include/pymacro.h:43:15: note: expanded from macro 'Py_BUILD_ASSERT' (void)Py_BUILD_ASSERT_EXPR(cond); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~ ./Include/pymacro.h:40:19: note: expanded from macro 'Py_BUILD_ASSERT_EXPR' (sizeof(char [1 - 2*!(cond)]) - 1) ^~~~~~~~~~~~~ 1 error generated. make: *** [Makefile:1720: Modules/posixmodule.o] Error 1 The following code: #include <stdio.h> #include <sys/stat.h> int main() { struct stat *st; printf("sizeof(unsigned long): %d - sizeof(st->st_ino): %d\n", sizeof(unsigned long), sizeof(st->st_ino)); return 0; } prints the following result on this same system: sizeof(unsigned long): 4 - sizeof(st->st_ino): 8
msg292142 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-04-22 22:48
Would it work with unsigned long long? What is the value of HAVE_LARGEFILE_SUPPORT? Maybe the code should be simplified to always use unsigned long long.
msg292174 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-04-23 17:18
sizeof(unsigned long long) is 8 on Android x86 and HAVE_LARGEFILE_SUPPORT is undefined. According to HAVE_LARGEFILE_SUPPORT is also undefined on Android x86_64 (and also on Linux x86_64). > Maybe the code should be simplified to always use unsigned long long. Yes. If I understand correcty, one could only use 'unsigned long long' to assert at build time and use the smallest of the two unsigned long that fits with the size of st->st_ino at runtime.
msg293859 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-05-17 16:11
In configure.ac, HAVE_LARGEFILE_SUPPORT is defined if (off_t > long && longlong >= off_t) and thus, when it is not defined, it means that off_t <= long (since longlong < off_t is false) so an off_t should fit into a long. But on Android the size of off_t is different from the size of ino_t so one cannot use the value of HAVE_LARGEFILE_SUPPORT to know if an ino_t would fit into an unsigned long. On Android architecture 'x86' at api level 21: sizeof(st->st_ino) = 8 and from the output of configure: checking size of off_t... 4 checking size of long... 4 checking size of long long... 8
msg293873 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-17 18:48
Before 0f6d73343d342c106cda2219ebb8a6f0c4bd9b3c, the code used PyLong_FromLong((long)st->st_ino), so the change doesn't introduce a regression but detected a real bug, right? > On Android architecture 'x86' at api level 21: > sizeof(st->st_ino) = 8 > and from the output of configure: > checking size of off_t... 4 > checking size of long... 4 > checking size of long long... 8 What is the type of st_ino? It's not off_t? A workaround would be to force the usage unsigned long long on Android, right? Can you please try the following patch on Android, and propose a PR if it works? diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c index e8c15a9..78b3cb9 100644 --- a/Modules/posixmodule.c +++ b/Modules/posixmodule.c @@ -1927,7 +1927,7 @@ _pystat_fromstructstat(STRUCT_STAT *st) return NULL; PyStructSequence_SET_ITEM(v, 0, PyLong_FromLong((long)st->st_mode)); -#if defined(HAVE_LARGEFILE_SUPPORT) | defined(MS_WINDOWS) +#if defined(HAVE_LARGEFILE_SUPPORT)
msg293883 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-05-17 19:55
> What is the type of st_ino? It's not off_t? The type of st_ino is unsigned long long as defined in sys/stat.h for x86 and off_t is defined in sys/types.h this way;, quoting: /* This historical accident means that we had a 32-bit off_t on 32-bit architectures. */ typedef __kernel_off_t off_t; There is also a related section in the bionic github repo [1] quoted here: ========================= This probably belongs in the NDK documentation rather than here, but these are the known ABI bugs in the 32-bit ABI: time_t is 32-bit. http://b/5819737. In the 64-bit ABI, time_t is 64-bit. off_t is 32-bit. There is off64_t, and in newer releases there is almost-complete support for _FILE_OFFSET_BITS. Unfortunately our stdio implementation uses 32-bit offsets and -- worse -- function pointers to functions that use 32-bit offsets, so there's no good way to implement the last few pieces http://b/24807045. In the 64-bit ABI, off_t is off64_t. sigset_t is too small on ARM and x86 (but correct on MIPS), so support for real-time signals is broken. http://b/5828899 In the 64-bit ABI, sigset_t is the correct size for every architecture. ========================= I will test your patch on x86 and x86_64 on Android and propose a patch if it works. [1] https://android.googlesource.com/platform/bionic.git/#32_bit-ABI-bugs
msg293919 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-05-18 11:39
With the patch proposed by Victor in the cross-compilation of posixmodule.c succeeds on android-21-x86 and android-21-x86_64. Following my suggestion in , I propose the following patch instead: diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c index e8c15a9473..d7ff3c78bd 100644 --- a/Modules/posixmodule.c +++ b/Modules/posixmodule.c @@ -1927,14 +1927,12 @@ _pystat_fromstructstat(STRUCT_STAT *st) return NULL; PyStructSequence_SET_ITEM(v, 0, PyLong_FromLong((long)st->st_mode)); -#if defined(HAVE_LARGEFILE_SUPPORT) | defined(MS_WINDOWS) Py_BUILD_ASSERT(sizeof(unsigned long long) >= sizeof(st->st_ino)); - PyStructSequence_SET_ITEM(v, 1, + if (sizeof(unsigned long) >= sizeof(st->st_ino)) + PyStructSequence_SET_ITEM(v, 1, PyLong_FromUnsignedLong(st->st_ino)); + else + PyStructSequence_SET_ITEM(v, 1, PyLong_FromUnsignedLongLong(st->st_ino)); -#else - Py_BUILD_ASSERT(sizeof(unsigned long) >= sizeof(st->st_ino)); - PyStructSequence_SET_ITEM(v, 1, PyLong_FromUnsignedLong(st->st_ino)); -#endif #ifdef MS_WINDOWS PyStructSequence_SET_ITEM(v, 2, PyLong_FromUnsignedLong(st->st_dev)); #else This patch removes the conditional on HAVE_LARGEFILE_SUPPORT (explicit is better than implicit) and fixes the problem for any operating system where the size of off_t and the size of st_ino are different. The compiler removes the dead branch of the <if (sizeof(unsigned long)...> conditional, so actually it compiles to the same binary as the other patch (gcc does this without any optimization configured on the command line, I didn't check this for clang). The same kind of fix could also be applied elsewhere in posixmodule.c where behind the use of HAVE_LARGEFILE_SUPPORT there is an incorrect assumption that the size of off_t is the same as the size of another type: * In _pystat_fromstructstat() - a little bit further down this patch - where the other type is st_size. * In os_DirEntry_inode_impl() where the other type is ino_t. Maybe this should be done in a new issue then ?
msg293920 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-05-18 11:45
> The same kind of fix could also be applied elsewhere in posixmodule.c By also adding a Py_BUILD_ASSERT() statement which is a welcome improvement on the previous code that allowed to pinpoint the problem on Android.
msg293924 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-18 14:51
Please propose the change as a PR.
msg293959 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-05-19 15:24
PR 1666 added. With Unified Headers introduced in android-ndk-r14, the size of off_t for 32 bits Android devices is 8 instead of 4 and the cross-compilation of Python on Android does not fail in this case, see .
msg293987 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-05-20 00:09
What is the advantage of compiling calls to both Long and LongLong converters? Isn’t it simpler to blindly call PyLong_FromUnsignedLongLong or similar?
msg294030 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-05-20 14:12
Good point. PyLong_FromLong() is faster than PyLong_FromLongLong(), IMO the fastest should be used when it is possible. PyLong_FromUnsignedLong() and PyLong_FromUnsignedLongLong() are almost identical, they implement the same algorithm. The first one uses unsigned long (on the stack and for a register) while the other uses unsigned long long, so the difference in performance may be in the order of nano seconds on current processors. Should only PyLong_FromUnsignedLongLong() be used then or should this be left as is for consistency with the handling of PyLong_FromLong() ? Another point (not related) is that the d_ino member of the DirEntry structure is of type ino_t which is unsigned and PR 1666 should be corrected to use unsigned instead in the last modifications made by the PR in os_DirEntry_inode_impl(). I will push this change when a decision has been taken on the point raised by Martin.
msg294034 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-20 15:23
I'm fine with always using unsigned long long. I don't think that it's even possible to measure the perf difference and if it's doable, I expect don't expect it to be significant... especially compared to the cost of the stat () syscall! Well, most computers are 64-bit. Even smartphones, no?
msg294135 - (view) Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-05-22 09:15
New changeset 50e86033de85294d87b7e942701d456342abde8e by xdegaye in branch 'master': bpo-29619: Do not use HAVE_LARGEFILE_SUPPORT for type conversions (GH-1666). https://github.com/python/cpython/commit/50e86033de85294d87b7e942701d456342abde8e
msg300412 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-17 12:33
New changeset ffbb6f7334ccf54f09dcc9e760766d861928f13e by Victor Stinner in branch '3.6': bpo-29619: Do not use HAVE_LARGEFILE_SUPPORT for type conversions (GH-1666) (#3102) https://github.com/python/cpython/commit/ffbb6f7334ccf54f09dcc9e760766d861928f13e
History
Date User Action Args
2022-04-11 14:58:43 admin set github: 73805
2017-08-17 12:33:09 vstinner set messages: +
2017-08-16 09:12:25 vstinner set pull_requests: + <pull%5Frequest3141>
2017-05-22 09:17:51 xdegaye set status: open -> closedresolution: fixed
2017-05-22 09:15:10 xdegaye set messages: +
2017-05-20 15:23:49 vstinner set messages: +
2017-05-20 14:12:02 xdegaye set messages: +
2017-05-20 00:09:46 martin.panter set nosy: + martin.pantermessages: +
2017-05-19 15:24:14 xdegaye set messages: +
2017-05-19 15:16:25 xdegaye set pull_requests: + <pull%5Frequest1761>
2017-05-18 14:51:24 vstinner set messages: +
2017-05-18 11:45:11 xdegaye set messages: +
2017-05-18 11:39:51 xdegaye set messages: +
2017-05-17 19:55:53 xdegaye set messages: +
2017-05-17 18:48:50 vstinner set status: closed -> openresolution: fixed -> (no value)messages: +
2017-05-17 16:11:47 xdegaye set messages: +
2017-04-23 17🔞28 xdegaye set messages: +
2017-04-22 22:48:43 vstinner set messages: +
2017-04-22 22:26:39 xdegaye set nosy: + xdegayemessages: +
2017-03-24 22:58:04 xiang.zhang set status: open -> closedstage: resolvedresolution: fixedversions: - Python 2.7, Python 3.5
2017-03-24 22:38:32 vstinner set messages: +
2017-03-24 22:37:58 vstinner set messages: +
2017-03-18 06:32:45 serhiy.storchaka set pull_requests: - <pull%5Frequest599>
2017-03-17 21:00:33 larry set pull_requests: + <pull%5Frequest599>
2017-03-09 16:41:36 vstinner set pull_requests: + <pull%5Frequest481>
2017-03-09 12:52:08 mba set messages: +
2017-03-08 09:24:49 serhiy.storchaka set nosy: + serhiy.storchakamessages: +
2017-03-08 08:08:00 vstinner set nosy: + vstinnermessages: +
2017-03-08 08:04:54 vstinner set pull_requests: + <pull%5Frequest458>
2017-03-08 07:36:06 xiang.zhang set nosy: + eryksun, steve.dowermessages: +
2017-02-22 12:55:43 xiang.zhang set nosy: + xiang.zhangversions: - Python 3.3, Python 3.4
2017-02-22 12:39:48 mba set components: - Library (Lib)
2017-02-22 12:24:02 mba create