[Python-Dev] Optionally using GMP to implement long if available (original) (raw)

Mark Dickinson dickinsm at gmail.com
Tue Nov 4 21:59:11 CET 2008


On Tue, Nov 4, 2008 at 6:33 PM, Tim Peters <tim.peters at gmail.com> wrote:

Whenever two digits are multiplied, the code intends to cast (at least) one of them to "twodigits" first (if you ever see a spot where this doesn't happen, it's a bug). "twodigits" is typedef'ed to C long. C89 guarantees that a long holds at least 32 bits.

There are indeed one or two spots that seem to be missing a cast, for example the line "rem -= hi*n" in inplace_divrem1. I've fixed all those I found, in the issue 4258 patch.

And for 32x32 -> 64, can't this simply be replaced by "(uint64t)i * j", 1. That's C99, not C89, and therefore less portable.

Understood; my thought was to use uint32_t and uint64_t for digit and twodigits when available (with longs being stored in base 230), falling back to the current ushort/ulong/base 215 otherwise. On Unix, autoconf makes this easy by providing macros like 'AC_TYPE_INT32_T', which makes sure that int32_t is defined to be an integer type of exact width 32, when available.

2. On platforms that support it, this is at least 64x64->64 multiplication, potentially much more expensive than the 32x32->64 (or 31x31->62?) flavor you /intend/ to move to.

3. There's no way to know exactly what compilers will do with this short of staring at generated code.

Yes---I've done the staring for gcc, so I now have precisely one data point, which is that various flavours of gcc on x86/x86_64 are clever enough to turn

(uint64_t)my_uint32 * my_other_uint32

into the appropriate HW instruction. Unfortunately I don't have easy access to other compilers or platforms right now. :-( Am still working on the benchmarking, but I'm definitely seeing speedup on gcc/x86---between 10% and 100% depending on the operations.

FYI, in a previous life working in speech recognition, under Microsoft's Visual Studio 6 the only way we found to get at the Pentium's 32x32->64 HW ability efficiently was via using inline assembler.

Urk. We definitely don't want to go there. Though I guess this is how packages like gmp and GP/Pari manage.

C89). That's why it's impossible here to write portable code in C that's also efficient. Even what Python does now is vulnerable on the

But maybe it's possible to write portable code (by providing fallbacks) that turns out to be efficient on the majority of mainstream systems? The extent of the ifdef'ery in the patch is really rather small: one (compound) #ifdef in longintrepr.h for defining digit, twodigits, stwodigits etc, and a couple more for the places where digits are read and written in marshal.c.

I agree that very-long-integer optimizations probably don't really belong in Python, Depends in part on whether Python can attract as many obsessed maintainers and porters for such gonzo algorithms as GMP attracts ;-)

Well, you can count me in. :)

Mark



More information about the Python-Dev mailing list