
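```c
#include <stdint.h>

/* Returns the multiplicative inverse of x modulo 2^bits,
   or 0 for invalid parameters (x must be odd and 1 <= bits <= 64). */
uint64_t find_mod_mul_inverse(uint64_t x, uint64_t bits)
{
    if (bits == 0 || bits > 64 || (x & 1) == 0)
        return 0; // invalid parameters
    uint64_t mask;
    if (bits == 64)
        mask = UINT64_MAX;
    else
        mask = (1ULL << bits) - 1;
    x &= mask;
    // Invariant: state == (x * result) & mask.
    uint64_t result = 1, state = x;
    while (state != 1ULL)
    {
        // Lowest bit position (>= 1) where state still differs from 1.
        int ctz = __builtin_ctzll(state ^ 1);
        result |= 1ULL << ctz;
        state += x << ctz; // clears bit ctz of state, keeps the lower bits
        state &= mask;
    }
    return result;
}
```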
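A sketch of why the loop yields the inverse. This is my reading of the code, assuming the two shift lines in the loop are result |= 1ULL << ctz and state += x << ctz. Here b stands for bits and k for the value of ctz in one iteration:

```latex
\begin{aligned}
&\text{Invariant: } \mathit{state} \equiv x \cdot \mathit{result} \pmod{2^{b}},\ x \text{ odd}.\\
&k = \operatorname{ctz}(\mathit{state} \oplus 1) \ge 1
  \;\Rightarrow\; \mathit{state} \equiv 1 + 2^{k} \pmod{2^{k+1}}.\\
&\mathit{state}' = \mathit{state} + 2^{k} x \equiv x\,(\mathit{result} + 2^{k}) \pmod{2^{b}},\\
&2^{k} x \equiv 2^{k} \pmod{2^{k+1}}
  \;\Rightarrow\; \mathit{state}' \equiv 1 + 2^{k+1} \equiv 1 \pmod{2^{k+1}}.
\end{aligned}
```

Every bit set in result so far lies below k, so result | 2^k equals result + 2^k and the invariant holds. The next k is therefore strictly larger. The loop ends after at most b - 1 iterations with x * result ≡ 1 (mod 2^b).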

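Now consider the following steps.
From the two constants d and r we create three constants of the same bit width (32 bits here). Only odd numbers have a multiplicative inverse modulo 2^32, so d has to be odd. In addition, r has to be smaller than d:

```c
uint32_t mmi = (uint32_t)find_mod_mul_inverse(d, 32);
uint32_t s = r * mmi;                // wraps modulo 2^32
uint32_t u = (UINT32_MAX - r) / d;   // UINT32_MAX == 2^32 - 1
```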
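Below is a minimal C sketch of the constant setup. It assumes d is odd and r < d. The names rem_check_consts, make_rem_check_consts and inverse_mod_2_32 are hypothetical. The inverse is computed with Newton's iteration instead of find_mod_mul_inverse only so that the snippet compiles on its own. Both methods give the same value, because the inverse modulo 2^32 is unique.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three constants. */
typedef struct {
    uint32_t mmi; /* d^-1 mod 2^32 */
    uint32_t s;   /* r * mmi mod 2^32 */
    uint32_t u;   /* (2^32 - 1 - r) / d */
} rem_check_consts;

/* Inverse of odd d modulo 2^32 via Newton's iteration (a stand-in for
   find_mod_mul_inverse(d, 32)). d*d == 1 (mod 8) for odd d, so inv = d
   starts with 3 correct low bits; each step doubles them: 6, 12, 24, 48. */
static uint32_t inverse_mod_2_32(uint32_t d)
{
    uint32_t inv = d;
    for (int i = 0; i < 4; i++)
        inv *= 2u - d * inv;
    return inv;
}

static rem_check_consts make_rem_check_consts(uint32_t d, uint32_t r)
{
    assert((d & 1u) && r < d); /* preconditions of the method */
    rem_check_consts c;
    c.mmi = inverse_mod_2_32(d);
    c.s = r * c.mmi;              /* unsigned multiply wraps modulo 2^32 */
    c.u = (UINT32_MAX - r) / d;
    return c;
}
```

For d = 7 and r = 3 this gives mmi = 0xB6DB6DB7 (7 × 0xB6DB6DB7 = 5·2^32 + 1), s = 0x24924925 and u = 0x24924924.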
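The idea behind these constants is the following identity. It holds in wrapping 32-bit arithmetic (i.e. modulo 2^32) because x = d*(x/d) + x%d and d*mmi_of(d) == 1:

`mmi_of(d)*x == x/d + (x%d)*mmi_of(d)   (mod 2^32)`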
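The following sketch explains why comparing against u is equivalent to x % d == r. It assumes d is odd and r < d. In the sketch, m = mmi_of(d), all products are taken modulo 2^32, and q is notation introduced only for this argument:

```latex
\begin{aligned}
&x m - s \equiv (x - r)\, m \pmod{2^{32}}.\\
&x \bmod d = r \;\Rightarrow\; x - r = q d,\quad
  0 \le q = \lfloor x/d \rfloor \le \left\lfloor \tfrac{2^{32} - 1 - r}{d} \right\rfloor = u,\\
&\qquad (x - r)\, m = q\, d\, m \equiv q \le u.\\
&\text{Conversely, } y \mapsto (y - r)\, m \text{ is a bijection on } \mathbb{Z}/2^{32}
  \ (m \text{ odd}), \text{ and the } u + 1 \text{ values}\\
&\qquad x_q = r + q d \le 2^{32} - 1,\ q = 0, \dots, u, \text{ already map onto } \{0, \dots, u\}.
\end{aligned}
```

Because r < d, each x_q has remainder r. Therefore (x*mmi - s) mod 2^32 <= u holds exactly for those x with x mod d = r.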

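Now, after generating the constants, we can simply emit the following code instead of the former:

```c
#include <stdbool.h>
#include <stdint.h>

// true iff x % d == r (d odd, r < d), using the constants mmi, s, u above
bool check_remainder(uint32_t x)
{
    return (x * mmi - s) <= u;
}
```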
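One way to test the claim is a brute-force comparison against the plain % operator. The code below is a hypothetical test sketch; inv32, check_remainder_dr and count_mismatches are illustrative names. For each (d, r) it recomputes the three constants, again using Newton's iteration for the inverse. It then counts disagreements for small odd d, every r < d, small x, and the 65 values at the top of the uint32 range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: d^-1 mod 2^32 for odd d via Newton's iteration. */
static uint32_t inv32(uint32_t d)
{
    uint32_t inv = d;              /* correct to 3 bits for odd d */
    for (int i = 0; i < 4; i++)
        inv *= 2u - d * inv;       /* doubles the number of correct bits */
    return inv;
}

/* The proposed check with the three constants computed inline. */
static bool check_remainder_dr(uint32_t x, uint32_t d, uint32_t r)
{
    assert((d & 1u) && r < d);
    uint32_t mmi = inv32(d);
    uint32_t s = r * mmi;
    uint32_t u = (UINT32_MAX - r) / d;
    return x * mmi - s <= u;
}

/* Compares against x % d == r for odd d in [1, max_d], every r < d,
   every x in [0, max_x] (max_x must be < UINT32_MAX), and x in
   [UINT32_MAX - 64, UINT32_MAX]. Returns the number of disagreements. */
static unsigned long count_mismatches(uint32_t max_d, uint32_t max_x)
{
    unsigned long bad = 0;
    for (uint32_t d = 1; d <= max_d; d += 2)
        for (uint32_t r = 0; r < d; r++) {
            for (uint32_t x = 0; x <= max_x; x++)
                bad += check_remainder_dr(x, d, r) != (x % d == r);
            /* x wraps to 0 after UINT32_MAX, which ends the loop */
            for (uint32_t x = UINT32_MAX - 64; x != 0; x++)
                bad += check_remainder_dr(x, d, r) != (x % d == r);
        }
    return bad;
}
```

If count_mismatches(31, 1000) returns 0, the two checks agreed on every tested input. This is a spot check, not a proof.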

Anyway, I looked at the file IntegerDivision.cpp. It seems to me that this optimization is more effective than the one used there, since the check needs only one multiplication, one subtraction and one comparison. However, I have no experience with compiler development, so all I can offer is the idea. If further explanation is needed, just ask. I tested my method and it gives correct results.