Issue 15144: Possible integer overflow in operations with addresses and sizes. (original) (raw)

Created on 2012-06-22 20:02 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
align_operations.patch serhiy.storchaka,2012-06-22 20:02 review
align_operations2.patch serhiy.storchaka,2012-09-20 12:35 review
Messages (22)
msg163473 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-22 20:02
In unicodeobject.c and stringlib aligned addresses and sizes are used for optimization. pointer->integer and implicit integer->integer conversions may overflow or underflow on platforms with sizeof(size_t) != sizeof(void *) or sizeof(size_t) != sizeof(int). The proposed patch fixes such unsafe things in unicodeobject.c, stringlib and some other files. There are still a few unsafe places in libffi, but in this library Py_uintptr_t nor uintptr_t are not available.
msg164799 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-07 10:01
If we're worrying about undefined behaviour, it looks like recent optimizations have *introduced* new undefined behaviour in the form of strict aliasing violations. E.g., from ascii_decode: unsigned long value = *(const unsigned long *) _p; (here _p has type const char *). This should really be fixed; compilers are known to make optimizations based on the assumption that this sort of undefined behaviour doesn't occur.
msg164800 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-07 10:08
> This should really be fixed; compilers are known to make optimizations > based on the assumption that this sort of undefined behaviour doesn't > occur. Doesn't the compiler have all the necessary information here? It's not like a subroutine is called. (also, I'm not sure what optimization it could actually make)
msg164801 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-07 10:11
> (also, I'm not sure what optimization it could actually make) Me neither, but it doesn't seem safe to assume that no compiler will take advantage of this. I don't want to start guessing what compilers might or might not do; it would be much better simply to stick to valid C where possible.
msg164804 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-07 10:14
N.B. This could probably be fixed without affecting performance by using the usual union trick. (IIUC, that trick was technically still undefined behaviour for a while, but was eventually made legal by C99 + TC3.) As far as I know there aren't any instances of compilers causing problems with that construct.
msg164805 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-07-07 10:16
> N.B. This could probably be fixed without affecting performance by > using the usual union trick. How would it work? We would have to add various unions to the PyUnicode_Object definition?
msg164809 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-07 10:24
> How would it work? We would have to add various unions to the > PyUnicode_Object definition? No, you'd just need a temporary union defined in unicodeobject.c that would look something like: typedef union { unsigned long v; char s[SIZEOF_LONG]; } U; (with better choices of names). Python/dtoa.c does a similar thing to read / write the pieces of a C double using integers safely.
msg164810 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-07 10:26
I'll see if I can come up with a patch, and open a new issue for it (since I've successfully derailed this issue from its original topic)
msg164818 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-07-07 10:49
> If we're worrying about undefined behaviour, it looks like recent optimizations have *introduced* new undefined behaviour in the form of strict aliasing violations. E.g., from ascii_decode: > > unsigned long value = *(const unsigned long *) _p; > > (here _p has type const char *). I don't see what the undefined behavior. Can you specify exactly the item of the Standard, in which it is mentioned? > This should really be fixed; compilers are known to make optimizations based on the assumption that this sort of undefined behaviour doesn't occur. I don't know how else you can rewrite it, without destroying completely the effect of optimization. In any case, I don't think that the original patch introduces some new undefined behavior.
msg164821 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-07-07 11:09
> I don't see what the undefined behavior. Can you specify exactly the > item of the Standard, in which it is mentioned? It's C99 section 6.5, paragraph 7: "An object shall have its stored value accessed only by an lvalue expression that has one of the following types ...". It's the dereferencing of the pointer that's the problem: that's accessing a stored value of type 'char' by an lvalue expression that has type 'unsigned long'. > In any case, I don't think that the original patch introduces some new > undefined behavior. Ah; were the strict aliasing problems already there before the patch? I didn't check.
msg167480 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-08-05 10:39
> Ah; were the strict aliasing problems already there before the patch? I didn't check. Please open separate issue for this.
msg170568 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-16 18:47
Please, Mark or some other C expert, can you do a code review for the patch?
msg170620 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-17 19:09
Perhaps the three new macros should be made available in a .h file? (in which case they should be private, i.e. start with _).
msg170623 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-17 20:09
> Perhaps the three new macros should be made available in a .h file? > (in which case they should be private, i.e. start with _). Perhaps. There are similar macros in other files (Include/objimpl.h, Objects/obmalloc.c, Python/pyarena.c, Modules/expat/xmlparse.c).
msg170684 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-09-18 22:27
> Perhaps the three new macros should be made available in a .h file? Good idea. Maybe pymacros.h? These macros need to be documented (with a comment in the .h file)
msg170809 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-20 12:35
Well, here is a new patch. The five new macros moved to pymacros.h and used in more files.
msg170811 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-09-20 12:52
Apologies; I got distracted from the main point of this issue with the strict aliasing stuff, and then it fell off the to-do list. Unassigning; Antoine or Victor, do you want to take this?
msg170817 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-09-20 15:14
Mark, please open a new discussion, so we don't lose it and that was the place for discussion.
msg170828 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-20 18:51
Thanks for the patch! These macros will be useful.
msg170830 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-09-20 18:57
New changeset 99112b851b25 by Antoine Pitrou in branch 'default': Issue #15144: Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t. http://hg.python.org/cpython/rev/99112b851b25
msg170831 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-20 18:58
Committed in 3.3(.1).
msg170842 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-09-20 20:17
> Mark, please open a new discussion. Done: issue 15992.
History
Date User Action Args
2022-04-11 14:57:31 admin set github: 59349
2012-09-28 11:45:51 jcea set nosy: + jcea
2012-09-20 20:17:26 mark.dickinson set messages: +
2012-09-20 18:58:50 pitrou set status: open -> closedversions: + Python 3.3, - Python 3.4messages: + resolution: fixedstage: patch review -> resolved
2012-09-20 18:57:45 python-dev set nosy: + python-devmessages: +
2012-09-20 18:51:51 pitrou set messages: +
2012-09-20 15:14:30 serhiy.storchaka set messages: +
2012-09-20 12:52:12 mark.dickinson set assignee: mark.dickinson -> messages: +
2012-09-20 12:35:14 serhiy.storchaka set files: + align_operations2.patchmessages: +
2012-09-18 22:27:41 vstinner set nosy: + vstinnermessages: +
2012-09-17 20:09:53 serhiy.storchaka set messages: +
2012-09-17 19:09:26 pitrou set type: security -> behaviormessages: + versions: + Python 3.4, - Python 3.3
2012-09-16 18:47:02 serhiy.storchaka set messages: +
2012-08-05 10:40:15 serhiy.storchaka set keywords: + needs reviewpriority: normal -> low
2012-08-05 10:39:16 serhiy.storchaka set messages: + stage: patch review
2012-07-07 11:09:28 mark.dickinson set messages: +
2012-07-07 10:49:58 serhiy.storchaka set messages: +
2012-07-07 10:26:13 mark.dickinson set messages: +
2012-07-07 10:24:48 mark.dickinson set messages: +
2012-07-07 10:16:10 pitrou set messages: +
2012-07-07 10:14:25 mark.dickinson set messages: +
2012-07-07 10:11:33 mark.dickinson set messages: +
2012-07-07 10:08:07 pitrou set nosy: + pitroumessages: +
2012-07-07 10:01:18 mark.dickinson set messages: +
2012-06-23 12:05:25 mark.dickinson set assignee: mark.dickinson
2012-06-22 20:03:22 mark.dickinson set nosy: + mark.dickinson
2012-06-22 20:02:48 serhiy.storchaka create