[Python-Dev] Optionally using GMP to implement long if available (original) (raw)
Tim Peters tim.peters at gmail.com
Tue Nov 4 19:33:00 CET 2008
- Previous message: [Python-Dev] Optionally using GMP to implement long if available
- Next message: [Python-Dev] Optionally using GMP to implement long if available
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hey, Mark -- let's establish some background here first. It's a fact that the product of two N-bit integers can require 2N bits, and also a fact that lots of HW supports NxN->2N bit integer multiplication directly.
However, it's unfortunately also a fact that standard C has no corresponding concept: "" in C always means NxN->N multiplication (single-width product, with possibility of overflow). I don't know whether C99 improved this situation -- I know it was proposed to add some "double-width integer product" /functions/, but I don't know whether that was adopted. I do know that "" remained "single-width".
[Tim Peters]
Note that while 32x32->64 multiply is supported by x86 HW, that doesn't mean there's a uniform way to get at this HW capability across C compilers. In contrast, (at least) efficient HW 15x15->30 multiply is universally spelled in C via "i*j" :-)
['Mark Dickinson]
If we're talking about standards and portability, doesn't "i*j" fail on those (nonexistent?) platforms where the 'int' type is only 16-bits? Shouldn't this be something like "(long)i * j" instead?
Sorry, I should have made type declarations explicit. In Python, the basic long building block is "digit", which is typedef'ed to C unsigned short. C89 guarantees this holds at least 16 bits. Whenever two digits are multiplied, the code intends to cast (at least) one of them to "twodigits" first (if you ever see a spot where this doesn't happen, it's a bug). "twodigits" is typedef'ed to C long. C89 guarantees that a long holds at least 32 bits.
So C guarantees that we're doing (at least) 32x32->32 multiplication whenever you see code like
digit i, j;
twodigits k;
k = (twodigits)i * j;
In particular, the (at least) 32x32->32 C89 guarantees for that is at least 15x15->30, and the latter is all that longobject.c intends to rely on. Along with the cast to twodigits, this is achieved across all conforming C compilers simply by using infix "*". The code from 1990 still works fine, on everything from cell phones to archaic Cray boxes.
And for 32x32 -> 64, can't this simply be replaced by "(uint64t)i * j", where uint64t is as in C99? I'd hope that most compilers would manage to turn this into the appropriate 32x32-bit hardware multiply.
That's C99, not C89, and therefore less portable.
On platforms that support it, this is at least 64x64->64 multiplication, potentially much more expensive than the 32x32->64 (or 31x31->62?) flavor you /intend/ to move to.
There's no way to know exactly what compilers will do with this short of staring at generated code.
FYI, in a previous life working in speech recognition, under Microsoft's Visual Studio 6 the only way we found to get at the Pentium's 32x32->64 HW ability efficiently was via using inline assembler. For example, using various MSVC spellings of "64-bit int" instead for the inputs usually generated external calls to a long-winded C library "piece by piece" multiplication routine (which, at the time, emulated 64x64->128 multiplication, then threw away the high 64 bits).
Again, the fundamental problem here is the disconnect between what some HW is capable of and what C allows to express (at least through C89). That's why it's impossible here to write portable code in C that's also efficient. Even what Python does now is vulnerable on the efficiency count: on some weird platforms, "long" is 64 bits, and so multiplying a pair of twodigits incurs the expense of (usually non-native) 64x64->64 multiplication.
I agree that very-long-integer optimizations probably don't really belong in Python,
Depends in part on whether Python can attract as many obsessed maintainers and porters for such gonzo algorithms as GMP attracts ;-)
but this patch should also provide significant benefits for short and medium-sized integers. I guess I need to go away and do some benchmarking...
If you can /get at/ HW 32x32->64 int multiply, of course that would be faster.
- Previous message: [Python-Dev] Optionally using GMP to implement long if available
- Next message: [Python-Dev] Optionally using GMP to implement long if available
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]