msg163473 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-06-22 20:02 |
In unicodeobject.c and stringlib aligned addresses and sizes are used for optimization. pointer->integer and implicit integer->integer conversions may overflow or underflow on platforms with sizeof(size_t) != sizeof(void *) or sizeof(size_t) != sizeof(int). The proposed patch fixes such unsafe things in unicodeobject.c, stringlib and some other files. There are still a few unsafe places in libffi, but in this library Py_uintptr_t nor uintptr_t are not available. |
|
|
msg164799 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2012-07-07 10:01 |
If we're worrying about undefined behaviour, it looks like recent optimizations have *introduced* new undefined behaviour in the form of strict aliasing violations. E.g., from ascii_decode: unsigned long value = *(const unsigned long *) _p; (here _p has type const char *). This should really be fixed; compilers are known to make optimizations based on the assumption that this sort of undefined behaviour doesn't occur. |
|
|
msg164800 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-07-07 10:08 |
> This should really be fixed; compilers are known to make optimizations > based on the assumption that this sort of undefined behaviour doesn't > occur. Doesn't the compiler have all the necessary information here? It's not like a subroutine is called. (also, I'm not sure what optimization it could actually make) |
|
|
msg164801 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2012-07-07 10:11 |
> (also, I'm not sure what optimization it could actually make) Me neither, but it doesn't seem safe to assume that no compiler will take advantage of this. I don't want to start guessing what compilers might or might not do; it would be much better simply to stick to valid C where possible. |
|
|
msg164804 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2012-07-07 10:14 |
N.B. This could probably be fixed without affecting performance by using the usual union trick. (IIUC, that trick was technically still undefined behaviour for a while, but was eventually made legal by C99 + TC3.) As far as I know there aren't any instances of compilers causing problems with that construct. |
|
|
msg164805 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-07-07 10:16 |
> N.B. This could probably be fixed without affecting performance by > using the usual union trick. How would it work? We would have to add various unions to the PyUnicode_Object definition? |
|
|
msg164809 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2012-07-07 10:24 |
> How would it work? We would have to add various unions to the > PyUnicode_Object definition? No, you'd just need a temporary union defined in unicodeobject.c that would look something like: typedef union { unsigned long v; char s[SIZEOF_LONG]; } U; (with better choices of names). Python/dtoa.c does a similar thing to read / write the pieces of a C double using integers safely. |
|
|
msg164810 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2012-07-07 10:26 |
I'll see if I can come up with a patch, and open a new issue for it (since I've successfully derailed this issue from its original topic) |
|
|
msg164818 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-07-07 10:49 |
> If we're worrying about undefined behaviour, it looks like recent optimizations have *introduced* new undefined behaviour in the form of strict aliasing violations. E.g., from ascii_decode: > > unsigned long value = *(const unsigned long *) _p; > > (here _p has type const char *). I don't see what the undefined behavior. Can you specify exactly the item of the Standard, in which it is mentioned? > This should really be fixed; compilers are known to make optimizations based on the assumption that this sort of undefined behaviour doesn't occur. I don't know how else you can rewrite it, without destroying completely the effect of optimization. In any case, I don't think that the original patch introduces some new undefined behavior. |
|
|
msg164821 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2012-07-07 11:09 |
> I don't see what the undefined behavior. Can you specify exactly the > item of the Standard, in which it is mentioned? It's C99 section 6.5, paragraph 7: "An object shall have its stored value accessed only by an lvalue expression that has one of the following types ...". It's the dereferencing of the pointer that's the problem: that's accessing a stored value of type 'char' by an lvalue expression that has type 'unsigned long'. > In any case, I don't think that the original patch introduces some new > undefined behavior. Ah; were the strict aliasing problems already there before the patch? I didn't check. |
|
|
msg167480 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-08-05 10:39 |
> Ah; were the strict aliasing problems already there before the patch? I didn't check. Please open separate issue for this. |
|
|
msg170568 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-09-16 18:47 |
Please, Mark or some other C expert, can you do a code review for the patch? |
|
|
msg170620 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-09-17 19:09 |
Perhaps the three new macros should be made available in a .h file? (in which case they should be private, i.e. start with _). |
|
|
msg170623 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-09-17 20:09 |
> Perhaps the three new macros should be made available in a .h file? > (in which case they should be private, i.e. start with _). Perhaps. There are similar macros in other files (Include/objimpl.h, Objects/obmalloc.c, Python/pyarena.c, Modules/expat/xmlparse.c). |
|
|
msg170684 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2012-09-18 22:27 |
> Perhaps the three new macros should be made available in a .h file? Good idea. Maybe pymacros.h? These macros need to be documented (with a comment in the .h file) |
|
|
msg170809 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-09-20 12:35 |
Well, here is a new patch. The five new macros moved to pymacros.h and used in more files. |
|
|
msg170811 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2012-09-20 12:52 |
Apologies; I got distracted from the main point of this issue with the strict aliasing stuff, and then it fell off the to-do list. Unassigning; Antoine or Victor, do you want to take this? |
|
|
msg170817 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-09-20 15:14 |
Mark, please open a new discussion, so we don't lose it and that was the place for discussion. |
|
|
msg170828 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-09-20 18:51 |
Thanks for the patch! These macros will be useful. |
|
|
msg170830 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2012-09-20 18:57 |
New changeset 99112b851b25 by Antoine Pitrou in branch 'default': Issue #15144: Fix possible integer overflow when handling pointers as integer values, by using Py_uintptr_t instead of size_t. http://hg.python.org/cpython/rev/99112b851b25 |
|
|
msg170831 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-09-20 18:58 |
Committed in 3.3(.1). |
|
|
msg170842 - (view) |
Author: Mark Dickinson (mark.dickinson) *  |
Date: 2012-09-20 20:17 |
> Mark, please open a new discussion. Done: issue 15992. |
|
|