Issue 10044: small int optimization

Created on 2010-10-07 11:56 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (38)

msg118103

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-10-07 11:56

This is an experimental patch to optimize some operations on small ints. pystone is 5-10% faster, pybench 2-3% faster, and here are some relevant benchmarks from unladen swallow:

nbody

Min: 0.345136 -> 0.317502: 1.09x faster
Avg: 0.346827 -> 0.319561: 1.09x faster
Significant (t=79.50)
Stddev: 0.00140 -> 0.00198: 1.4084x larger

nqueens

Min: 0.339744 -> 0.313506: 1.08x faster
Avg: 0.342630 -> 0.315380: 1.09x faster
Significant (t=73.41)
Stddev: 0.00218 -> 0.00146: 1.4931x smaller

If the principle gets accepted, we could experiment with further optimizations such as dedicated opcodes for addition-with-int-constant, subscripting-with-int-constant, etc.

msg118109

Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)

Date: 2010-10-07 13:34

Arithmetic with void* pointers is not allowed by the Microsoft compilers. char* should be used instead.

msg118111

Author: Daniel Stutzbach (stutzbach) (Python committer)

Date: 2010-10-07 14:43

How does performance change if you adjust NSMALLPOSINTS and NSMALLNEGINTS, but make no other changes?

msg118113

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2010-10-07 15:49

Looks like a nice idea, at first glance, though I haven't looked at the code in detail. I like the use of the macros to abstract away the long implementation details.

"INPLACE_SUBTRACT_end", not "INPLACE_SUBSTRACT_end", please!

msg118114

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2010-10-07 15:53

+#define _PyLong_IS_SMALL_INT(v) \

+/* These macros purposedly avoid a cast to int, since it is most of time

Urk! This is nasty. :(

I don't think arbitrary comparisons of pointers give well-defined results, unless those pointers both happen to point into the same array. (Might be wrong; I don't have a copy of the C standard to hand.)

msg118118

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2010-10-07 16:13

Maybe we could consider adding an extra field to a PyLong giving its 'small_int' value for small values, and some flag value for non-small longs. An extra field wouldn't actually enlarge the size of a PyLong for small values---on a 64-bit machine, a value like 23L takes 28 bytes, for which 32 bytes will actually be allocated (since Python always allocates in multiples of 8 bytes, I believe).

msg118120

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-10-07 16:23

> Maybe we could consider adding an extra field to a PyLong giving its 'small_int' value for small values, and some flag value for non-small longs. An extra field wouldn't actually enlarge the size of a PyLong for small values---on a 64-bit machine, a value like 23L takes 28 bytes, for which 32 bytes will actually be allocated (since Python always allocates in multiples of 8 bytes, I believe).

I actually had a patch for that. It declared an ob_digit[2] array instead of ob_digit[1], and ob_digit[1] contained the small int. But the patch still used the pointer comparison approach for PyLong_IS_SMALL_INT, because I think it's faster (just two comparisons). So there didn't seem to be much point.

Also, the pointer addition trick for addition (see BINARY_ADD) is probably faster than the more intuitive method.

msg118121

Author: Daniel Stutzbach (stutzbach) (Python committer)

Date: 2010-10-07 16:27

> I don't think arbitrary comparisons of pointers give well-defined results, unless those pointers both happen to point into the same array. (Might be wrong; I don't have a copy of the C standard to hand.)

Technically arbitrary relational comparisons of pointers are undefined, but in practice Antoine's assumptions here are very modest. They boil down to:

v >= &array[0] && v < &array[array_len]

It is hard for me to imagine a system designed such that the expression could evaluate to true when v is not in the array.

I suppose a system could be designed where relational comparisons of unrelated data pointers causes a segmentation fault or similar, but that also seems unlikely to me.

msg118127

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-10-07 19:30

> Technically arbitrary relational comparisons of pointers are undefined, but in practice Antoine's assumptions here are very modest. They boil down to:
>
> v >= &array[0] && v < &array[array_len]

I can't say anything about the standard, but p > q looks like it should be the same as (p - q) > 0, which looks rather well-defined for pointers.

(At worst we could cast to _Py_uintptr_t; hopefully the compiler wouldn't produce any different code.)

msg118131

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-10-07 20:29

> How does performance change if you adjust NSMALLPOSINTS and NSMALLNEGINTS, but make no other changes?

It makes a very small difference (which is understandable since it doesn't cut down on code execution a lot).

msg118132

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2010-10-07 21:00

> Technically arbitrary relational comparisons of pointers are undefined, but in practice Antoine's assumptions here are very modest.

I disagree: there's a very real practical danger here. Namely, optimizing compilers are free to assume that code doesn't involve any undefined behaviour and optimize accordingly. gcc for one is known to make extensive use of this freedom. I wouldn't be at all surprised to find some current or future version of gcc optimizing an (p >= start_of_array) check to 'always true', on the basis that it will always be true in legal code.

Please don't introduce undefined behaviour here---it's asking for trouble.

msg118133

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2010-10-07 21:04

> I can't say anything about the standard, but p > q looks like it should be the same as (p - q) > 0

Yep.

> which looks rather well-defined for pointers.

Nope. It's only well-defined for pointers pointing into the same array (or to one past the end of an array). Otherwise it's undefined behaviour.

See section 6.5.6, paragraph 9, of

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

msg118135

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-10-07 21:12

> Nope. It's only well-defined for pointers pointing into the same array (or to one past the end of an array). Otherwise it's undefined behaviour.

How can the compiler tell whether two pointers are "into the same array"? That sounds like an undecidable criterion.

msg118136

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2010-10-07 21:14

> How can the compiler tell whether two pointers are "into the same array"? That sounds like an undecidable criterion.

It doesn't have to be able to tell---it's allowed to assume. :-)

msg118137

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-10-07 21:15

> > How can the compiler tell whether two pointers are "into the same array"? That sounds like an undecidable criterion.
>
> It doesn't have to be able to tell---it's allowed to assume. :-)

That isn't very clear or understandable.

msg118138

Author: Eric V. Smith (eric.smith) * (Python committer)

Date: 2010-10-07 21:18

In the bad old days of 386 segment:offset memory architectures this was a problem. You could have overlapping segments but pointers inside an object were always in the same segment, so the segment selectors never had to be inspected. Pointers to different objects could indeed have the same offset and would compare equal.

We should follow the standard here: no comparisons between pointers to different arrays (basically).

msg118139

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2010-10-07 21:22

See the example above: suppose that a compiler is looking at a (p >= q) comparison of pointers. Suppose furthermore that in a particular case that compiler is smart enough to figure out that q is a pointer to the start of an array. Then the compiler is permitted to assume that p also points into the same array, since if it didn't then the code would introduce undefined behaviour. And since q is the start of the array, and p is (by assumption) a pointer into the same array, p >= q will automatically be true, so the compiler is free to replace the expression with the integer '1' (i.e., true).

gcc does similar things with checks like (x + 1 > x): if x is a (signed) int, then gcc can and will optimize (x + 1 > x) to 'true', on the basis that x + 1 can never overflow, because such overflow would be undefined behaviour and therefore can't occur in a valid C program.

msg118140

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-10-07 21:29

> In the bad old days of 386 segment:offset memory architectures this was a problem. You could have overlapping segments but pointers inside an object were always in the same segment, so the segment selectors never had to be inspected. Pointers to different objects could indeed have the same offset and would compare equal.

AFAIK it caused lots of other problems much more annoying than pointer comparisons, and programmers had to juggle various addressing modes (near, far, etc.) by hand.

ISTM that all C programs nowadays assume some kind of flat, paged memory model, CPython included; I don't think we have to fear some i386-like segmentation model.

> We should follow the standard here: no comparisons between pointers to different arrays (basically).

Again, what is an "array" supposed to be? In memory there are no "arrays".

Let's say I have the following function in a module:

int ptr_compare(char *a, char *b) { return a > b; }

and another module, the following function

int foo() { static char x[5]; static char y[6]; return ptr_compare(x + 2, y + 3); }

How is the compiler supposed to know whether a and b belong to the same array when compiling ptr_compare?

Otherwise, we could cast to Py_uintptr_t.

msg118142

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2010-10-07 21:36

> How is the compiler supposed to know whether a and b belong to the same array when compiling ptr_compare?

It doesn't need to know. So long as the compiler can guarantee that its code will produce correct results in the case that a and b do both point to the same array, that's enough. In other words, when producing code for ptr_compare, the compiler is allowed to assume that a and b point into the same array, and act accordingly.

msg118143

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-10-07 21:38

> See the example above: suppose that a compiler is looking at a (p >= q) comparison of pointers. Suppose furthermore that in a particular case that compiler is smart enough to figure out that q is a pointer to the start of an array.

Which array? You can have arrays everywhere in memory, at any address, ending anywhere.

union {
    struct { char ch1; char arr1[2]; } s1;
    struct { char arr2[2]; char ch2; } s2;
    struct { char arr3[3]; } s3;
};

Which is an array and which is not? Is &ch1 an array? And arr1? And arr2 + 1? Why would the compiler choose one over another? And what about arrays defined in other modules?

msg118144

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-10-07 21:40

> In other words, when producing code for ptr_compare, the compiler is allowed to assume that a and b point into the same array, and act accordingly.

But this assumption doesn't buy anything, right? That is, there is no shortcut for computing "p - q" or "p > q" when both point into the same "array".

msg118145

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-10-07 21:47

For the record, a Py_uintptr_t version works and has the same performance. Would you agree to it or is there still some menacing oddity from the i386 days lurking around?

msg118146

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2010-10-07 22:03

> For the record, a Py_uintptr_t version works and has the same performance. Would you agree to it or is there still some menacing oddity from the i386 days lurking around?

Technically, it's still dodgy: as the gcc manual notes in:

http://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html#Arrays-and-pointers-implementation

"That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 6.5.6/8."

I can't see as much scope for problems with the uintptr_t version. But just because I can't anticipate the problems, it doesn't mean they don't exist.

It really would be better to avoid the undefined behaviour if at all possible.

msg124955

Author: Xuanji Li (xuanji) *

Date: 2010-12-31 07:00

fwiw I've always found this helpful for undefined behavior: http://blog.regehr.org/archives/213 and, just as it says "x+1 > x" will be optimized to a nop, by the same logic "v >= &array[0] && v < &array[array_len]" will also be optimized to a nop.

msg125006

Author: Meador Inge (meador.inge) * (Python committer)

Date: 2011-01-01 20:24

> How is the compiler supposed to know whether a and b belong to the same array when compiling ptr_compare?

I agree with Mark, it doesn't need to know. However, many compilers [1,2] support whole program optimization and could in theory figure the address out using that technique.

[1] GCC -flto - http://gcc.gnu.org/onlinedocs/gcc-4.5.2/gcc/Optimize-Options.html#Optimize-Options [2] VC++ LTCG - http://msdn.microsoft.com/en-us/library/xbf3tbeh.aspx

msg135024

Author: Steffen Daode Nurpmeso (sdaoden)

Date: 2011-05-03 10:45

Now me.

(http://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html#Arrays-and-pointers-implementation)

> When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined. That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 6.5.6/8.

Say - isn't that a joke about farts of armchair crouchers. All it says is that if you dereference garbage you get a crash. If you're concerned, use volatile, and go shoot the compiler programmer if she dares to optimize just anything. And on ARM, isn't the interrupt table at ((void*)(char[])0x0)?? La vie est une rose beside that. And all i need is atomic_compare_and_swap().

msg170940

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2012-09-22 00:41

If I understood correctly, the optimization proposed by Antoine was somehow rejected because _PyLong_IS_SMALL_INT() may be optimized "incorrectly".

If a compiler "miscompiles" this macro, can't we disable this optimization on this specific compiler, instead of missing an interesting optimization on all compilers? I bet that no compiler will do insane optimizations on such a test.

Where in the CPython source code do we directly reference small_ints? The common case is to write PyLong_FromLong(0). Could a compiler analyze PyLong_FromLong() and detect that the result is part of the small_ints array? Or that it is not part of small_ints?

This corner case seems very unlikely to me.

--

I tested Antoine's patch with GCC 4.6 and CFLAGS="-O3 -flto" (-flto: standard link-time optimizer): the test suite passes. I don't see any magic optimization here, sorry.

--

"fwiw I've always found this helpful for undefined behavior: http://blog.regehr.org/archives/213 and, just as it says "x+1 > x" will be optimized to a nop, by the same logic "v >= &array[0] && v < &array[array_len]" will also be optimized to a nop."

"x+1 > x" and "v >= &array[0]" are not the same checks. In the first test, x is used in both parts. I don't understand.

msg170972

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2012-09-22 08:34

Victor: so you want to deliberately introduce undefined behaviour for the sake of a minor optimization, while crossing your fingers and hoping that that doesn't cause issues with any existing or future compilers?

msg170977

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2012-09-22 09:16

Victor, the danger of this approach is that we can't verify how a particular compiler with certain options on a specific platform compiles source code with undefined behavior. If such a test were possible, if this aspect were guaranteed by the compiler vendor, or if it could be checked in the ./configure script, then such an approach would make some sense.

Regardless of _PyLong_IS_SMALL_INT, the rest of the patch looks a little cumbersome.

msg170986

Author: Martin v. Löwis (loewis) * (Python committer)

Date: 2012-09-22 11:11

"x+1 > x" and "v >= &array[0]" are not the same checks. In the first test, x is used in both parts. I don't understand.

The consequences of "undefined behavior" are really difficult to understand, it seems. A compiler which evaluates both relational expressions unconditionally to true would be conforming to the C language definition. The reason for that has been said several times before, but let me repeat them.

Mathematically, x+1 > x for any real number, so it seems natural to assume that this can be determined to be "true" statically. However, what if x+1 overflows? Shouldn't it then evaluate to "false"? Not if x is signed: then the behavior on overflow is undefined, and any result of the comparison would be conforming to the standard: the overflow may trap, the expression may evaluate to "false", or the expression may evaluate to "true", or your hard disk may get erased - all of these would be conforming to the C language definition.

For v >= &array[0], this is the same issue. If v points into the array, the expression is true. If v doesn't point into the array, the behavior is undefined, so the compiler may chose to give false, or to give true, or to actually compare the pointers as integers, or to erase your hard disk.

Of these possible behaviors, only two are plausible. For x+1>x, it may be that the compiler actually generates code to perform the addition and the comparison, or it may statically decide that the expression is always true. For v >= &array[0], the compiler may either generate code to perform the comparison, or statically decide that the expression is true. Either implementation would be correct - it's not that the compiler is optimizing "incorrectly".

In the specific case, a compiler might infer that _PyLong_IS_SMALL_INT is always true, which would give incorrect results in Python. However, it is the patch which is incorrect, not the compiler.

I recommend to close the issue as rejected.

msg170998

Author: Larry Hastings (larry) * (Python committer)

Date: 2012-09-22 14:11

I must be missing something--because I thought Python already depended on this apparently-undefined behavior. The small-block object allocator in Objects/obmalloc.c determines whether a pointer belongs to a particular arena using exactly this trick. I quote from the gigantic comment at the top of that file:

Let B be the arena base address associated with the pool,
B = arenas[(POOL)->arenaindex].address.  Then P belongs to
the arena if and only if

    B <= P < B + ARENA_SIZE

Subtracting B throughout, this is true iff

    0 <= P-B < ARENA_SIZE

This test is implemented as the following macro:

#define Py_ADDRESS_IN_RANGE(P, POOL)                        \
    ((arenaindex_temp = (POOL)->arenaindex) < maxarenas &&  \
     (uptr)(P) - arenas[arenaindex_temp].address < (uptr)ARENA_SIZE && \
     arenas[arenaindex_temp].address != 0)

Why is that permissible but _PyLong_IS_SMALL_INT is not?

msg171004

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2012-09-22 16:00

> Why is that permissible but _PyLong_IS_SMALL_INT is not?

Touché!

msg171005

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2012-09-22 16:30

Ah: nice catch, Larry.

I would say that the obmalloc case shouldn't be permissible; however, it's already there, and changing that would be an involved task that would also likely risk introducing new bugs. So I guess practicality beats purity on that one. I don't see that as an excuse for introducing new undefined behaviour though, especially for a small optimization.

msg171007

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2012-09-22 16:51

> I recommend to close the issue as rejected.

I think _PyLong_IS_SMALL_INT can be rewritten in a safe style, for example by checking several fields ((sdigit)(x)->ob_digit[0] < _MAX_SMALL_INT && Py_SIZE(x) <= 1) or a special flag. It is possible, however, that such a check will fully destroy the effect of the optimization. We need further research.

msg171014

Author: Martin v. Löwis (loewis) * (Python committer)

Date: 2012-09-22 22:42

> Why is that permissible but _PyLong_IS_SMALL_INT is not?

It's permissible primarily because it is already there. There are many places in Python which invoke undefined behavior (most notably wrt. signed integers); we should strive to eliminate them.

OTOH, obmalloc clearly makes a lot of assumptions on the hardware architecture (including the assumption that there is page-based virtual memory, etc). It's a memory allocator, that gives permission to make such assumptions. If it turns out that they break on some system, obmalloc cannot be used there - hence usage of obmalloc is still a compile-time option.

In addition, for the specific macros: it's easy to see that a compiler may optimize _PyLong_IS_SMALL_INT as true by simple static analysis. For Py_ADDRESS_IN_RANGE, the same static analysis is not possible, since the memory doesn't come from a declared array. It would require whole-program analysis to determine that .address always points to a memory block with ARENA_SIZE bytes. If a compiler manages to figure it out, obmalloc cannot be used on that system (and if that happens, I'll recommend to drop or revise obmalloc altogether, perhaps in favor of a system pool allocator).

msg171016

Author: Martin v. Löwis (loewis) * (Python committer)

Date: 2012-09-22 22:44

Quoting Serhiy Storchaka <report@bugs.python.org>:

> > I recommend to close the issue as rejected.
>
> I think _PyLong_IS_SMALL_INT can be rewritten in a safe style, for example by checking several fields ((sdigit)(x)->ob_digit[0] < _MAX_SMALL_INT && Py_SIZE(x) <= 1) or a special flag. It is possible, however, that such a check will fully destroy the effect of the optimization. We need further research.

Do we need to keep this issue open while this research is being carried out? This issue is already cluttered with the undefined-behavior discussion.

msg185626

Author: Mark Dickinson (mark.dickinson) * (Python committer)

Date: 2013-03-31 12:49

> Do we need to keep this issue open while this research is being carried out?

I'd say not.

msg259705

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2016-02-06 00:59

Issue #21955 is a spin-off of this issue.

History

Date                  User                Action  Args
2022-04-11 14:57:07   admin               set     github: 54253
2016-02-06 00:59:30   vstinner            set     messages: +
2013-03-31 12:49:48   mark.dickinson      set     status: open -> closed; resolution: rejected; messages: +
2012-09-22 22:44:44   loewis              set     messages: +
2012-09-22 22:42:55   loewis              set     messages: +
2012-09-22 16:51:10   serhiy.storchaka    set     messages: +
2012-09-22 16:30:32   mark.dickinson      set     messages: +
2012-09-22 16:00:27   serhiy.storchaka    set     messages: +
2012-09-22 14:11:25   larry               set     nosy: + larry; messages: +
2012-09-22 11:11:11   loewis              set     messages: +
2012-09-22 09:16:42   serhiy.storchaka    set     messages: +
2012-09-22 08:34:12   mark.dickinson      set     messages: +
2012-09-22 00:42:09   vstinner            set     nosy: + loewis
2012-09-22 00:41:59   vstinner            set     messages: +
2012-09-17 16:13:21   serhiy.storchaka    set     nosy: + serhiy.storchaka; components: + Interpreter Core; versions: + Python 3.4, - Python 3.3
2012-09-17 15:23:07   vstinner            set     nosy: + vstinner
2011-07-24 01:16:38   asvetlov            set     nosy: + asvetlov
2011-05-03 10:45:52   sdaoden             set     nosy: + sdaoden; messages: +
2011-05-02 19:57:18   pitrou              set     versions: + Python 3.3, - Python 3.2
2011-01-01 20:24:38   meador.inge         set     nosy: + meador.inge; messages: +
2010-12-31 07:00:36   xuanji              set     nosy: + xuanji; messages: +
2010-10-07 22:03:46   mark.dickinson      set     messages: +
2010-10-07 21:47:50   pitrou              set     messages: +
2010-10-07 21:40:56   pitrou              set     messages: +
2010-10-07 21:38:43   pitrou              set     messages: +
2010-10-07 21:36:27   mark.dickinson      set     messages: +
2010-10-07 21:29:37   pitrou              set     messages: +
2010-10-07 21:22:02   mark.dickinson      set     messages: +
2010-10-07 21:18:56   eric.smith          set     messages: +
2010-10-07 21:15:30   pitrou              set     messages: +
2010-10-07 21:14:36   mark.dickinson      set     messages: +
2010-10-07 21:12:24   pitrou              set     messages: +
2010-10-07 21:04:01   mark.dickinson      set     messages: +
2010-10-07 21:00:36   mark.dickinson      set     messages: +
2010-10-07 20:29:20   pitrou              set     messages: +
2010-10-07 19:30:54   pitrou              set     messages: +
2010-10-07 19:23:31   Aahz                set     nosy: - aahz, Aahz
2010-10-07 17:12:32   eric.smith          set     nosy: + eric.smith
2010-10-07 16:27:17   stutzbach           set     messages: +
2010-10-07 16:23:46   pitrou              set     messages: +
2010-10-07 16:13:51   mark.dickinson      set     messages: +
2010-10-07 15:53:47   mark.dickinson      set     messages: +
2010-10-07 15:49:06   mark.dickinson      set     messages: +
2010-10-07 14:43:59   stutzbach           set     messages: +
2010-10-07 13:34:24   amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: +
2010-10-07 11:56:27   pitrou              create