Random Numbers in C (original) (raw)
ACiphers By Ritter Page
Random number generators (_pseudo-_random, of course) in C; mostly fast one-liners for "inline" use (to eliminate call / return overhead). If you just want to use these RNG's, it might be a good idea to start at the last message.
Contents
- 1999-01-12 George Marsaglia: "This posting ends with 17 lines of C code that provide eight different in-line random number generators, six for random 32-bit integers and two for uniform reals in (0,1) and (-1,1)."
- 1999-01-12 Charles Bond: "Do you know if any theoretical work has been done since Knuth's book to justify SWB?"
- 1999-01-12 George Marsaglia: "It hasn't been very many years since I invented the subtract-with-borrow method, and developed theory for establishing the periods."
- 1999-01-12 Charles Bond: "I did not mean to imply that Knuth's subtractive generator was *the same* as your subtract with borrow, only that it was *similar* (high speed, no multiplications)."
- 1999-01-12 Mike Oliver: "Could you give us a pointer to information about why these RNGs [lagged Fibonacci generators using xor] are unsatisfactory and what sort of test they tend to fail?"
- 1999-01-13 Eric Backus: "I have a small problem with the definition of LFIB4 and SWB. In an attempt to make these a single line of C code, they both use "++c" in the same expression as they use "c". A C compiler is free to rearrange the order in which it calculates the intermediate terms of these expressions, so the expressions can produce different results depending on the compiler."
- 1999-01-15 George Marsaglia: "The generators MWC, KISS, LFIB4 and SWB seem to pass all tests. By themselves, CONG and SHR3 do not, but using CONG+SHR3 provides one of the fastest combinations that satisfy the DIEHARD battery of tests."
- 1999-01-15 Dmitri Zaykin: "Shouldn't these be...."
- 1999-01-01 Radford Neal: "This doesn't work either."
- 1999-01-15 Jeff Stout: "The preprocessor is just doing a stupid text substituation, its the C compiler that is ambigous about the interpretation."
- 1999-01-01 Radford Neal: "...although one can indeed create lots of junk using the C pre-processor, one cannot, in general, use it to create in-line procedures."
- 1999-01-16 Dmitri Zaykin: "It is probably true that the posted random number generators are better implemented as regular functions."
- 1999-01-16 Dmitri Zaykin: "These should work...."
- 1999-01-17 Ramin Sina: "Could someone please post a short example code on how to use these in practice."
- 1999-01-18 Ed Hook: "No -- 'SHR3' and 'LFIB4' are OK, but your version of 'SWB' still invokes the dreaded undefined behavior."
- 1999-01-19 Orjan Johansen: "Surely in an expression of the form y=..., the evaluation of ... is guaranteed to happen before the assignment."
- 1999-01-19 Horst Kraemer: "Sorry. No sequence point is needed between reading an writing to an lvalue."
- 1999-01-17 Herman Rubin: "I think it should be clarified, and probably written out in some more detail. But the procedure call overhead would be comparable to the computing cost...."
- 1999-01-17 Duncan Murdoch: "I don't know if that's true or not, but I don't think it is really something to worry about."
- 1999-01-18 Herman Rubin: "It certainly is. The cost of one procedure call could far exceed the cost of a uniform, or even many nonuniform, random variables."
- 1999-01-01 Radford Neal: "Even on a CISC machine like a VAX, I doubt the C procedure call overhead would be more that twice the time for the actual work, and on modern RISC machines the ratio will tend to be less."
- 1999-01-19 Herman Rubin: "The code discussed was not in making a call for an array, but for a single step in the generation procedure."
- 1999-01-19 Duncan Murdoch: "I'm not sure what you have in mind."
- 1999-01-19 Herman Rubin: "Can we afford range checking?" "Basic random tools should be as a bit stream, not as floating numbers, for many reasons."
- 1999-01-20 Duncan Murdoch: "...all languages that I know (including C) provide ways of forcing the order of evaluation: just put the things that need to be done first in an earlier statement."
- 1999-01-17 David W. Noon: "I don't mind translating all this stuff into assembler, but anybody not using an Intel 80486 or better running a 32-bit version of OS/2 will not be able to use my code."
- 1999-01-18 Dmitri Zaykin: "Another option is to re-write macros as C++ member functions."
- 1999-01-18 David W. Noon: "The problem with C++ is that VMT calling mechanisms are almost invariably slower than the calling mechanisms used by C, PL/I, FORTRAN, Pascal, etc."
- 1999-01-18 Dmitri Zaykin: "Well, in this particular case all that does not apply, since there are no virtual functions in the code."
- 1999-01-19 Duncan Murdoch: "...but just as with any other aspect of the program, speed isn't as important as correctness."
- 1999-01-20 Dmitri Zaykin: "To check if C-macros for these random number generators do indeed always produce faster, and maybe 'less bloated' code than inlined C++ member functions, I did a little experiment...."
- 1999-01-21 Dmitri Zaykin:
- 1999-01-21 John E. Davis: "Using your code (t.cc and t.c), my conclusion is the opposite...."
- 1999-01-21 Dmitri Zaykin: "I used egcs-1.1.1 compiler. It is is an improvement over gcc-2.8.1."
- 1999-01-18 Mike McCarty:
- 1999-01-18 Herman Rubin: "Decent random number code cannot be portable."
- 1999-01-19 David W. Noon: "I meant portable across software platforms, not hardware."
- 1999-01-19 Herman Rubin: "It is by no means clear that HLL generated code will do this, as the machine instructions involved are different." "What was meant was the distance to the next one in a stream of random bits. In other words, generate a geometric (.5) random variable using only the number of bits required by information."
- 1999-01-19 David W. Noon: "C++ just isn't ideally suited to the task under current implementations of the language."
- 1999-01-20 Dmitri Zaykin: "Templates can be an alternative to C-macros in terms of genericity, not inlining."
- 1999-01-18 Mike McCarty: "There are no 'idiot-proof languages'."
- 1999-01-18 Herman Rubin: "A language has a built in inefficiency in every situation in which the optimal use of machine instructions cannot be addressed in that language."
- 1999-01-21 Mike McCarty: "The checking I had in mind costs *nothing* during the execution of the program."
- 1999-01-16 Dmitri Zaykin: "I see one more problem with the code."
- 1999-01-19 qscgz@my-dejanews.com: "DIEHARD-tests fail on...."
- 1999-01-20 George Marsaglia: "My offer of RNG's for C was an invitation to dance; I did not expect the Tarantella." "Numerous responses have led to improvements; the result is the listing below...."
Subject: Random numbers in C: Some suggestions. Date: Tue, 12 Jan 1999 09:37:37 -0500 From: George Marsaglia geo@stat.fsu.edu Message-ID: 369B5E30.65A55FD1@stat.fsu.edu Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis,sci.crypt,sci.physics Lines: 243
This posting ends with 17 lines of C code that provide eight different in-line random number generators, six for random 32-bit integers and two for uniform reals in (0,1) and (-1,1). Comments are interspersed with that code. Various combinations of the six in-line integer generators may put in C expressions to provide a wide variety of very fast, long period, well-tested RNG's. I invite comments, feedback, verifications and timings.
First, there is narrative giving background for this posting; you may want to skip it.
Narrative:
Having had experience in producing and testing for randomness in computers, I am frequently asked to suggest good random number generators (RNG's), test RNG's, or comment on existing RNG's. Many recent queries have been in two areas: (1) requests for implementations in C and (2) comments on generators with immense periods, particularly the Mersenne Twister.
This posting is mainly for category (1), for which I suggest a set of C implementations of RNG's I have developed. C implementations of my DIEHARD battery of tests will be discussed elsewhere, and Narasimhan's GUI version is expected to be released soon.
For (2), I merely repeat what I have said in response to various queries: the Mersenne Twister looks good, but it seems to be essentially a lagged Fibonacci RNG using the exclusive-or (xor) operation, and experience has shown that lagged Fibonacci generators using xor provide unsatisfactory 'randomness' unless the lags are very long, and even for those with very long lags, (and even for those using + or - rather than xor), many people (I among them) are inclined to be cautious about sequences based on such a simple operation as: each new integer is the xor, (or sum, or difference), of two earlier ones. To be sure, the resulting integers can be "twisted", but not, I think, as simply or as safely as combining, say by addition, with members of a sequence from a (shorter period) generator that has itself passed extensive tests of randomness.
I also reply that it does not take an immense program (as, for example, in posted listings of Twister) to produce a more satisfactory RNG with an immense period, and give this example, on which I will expand below: Inclusion of
#define SWB ( t[c+237]=(x=t[c+15])-(y=t[++c]+(x<y)) )
together with suitably initialized seeds in
static unsigned long x,y,t[256]; unsigned char c;
will allow you to put the string SWB in any C expression and it will provide, in about 100 nanosecs, a 32-bit random integer with period 2^7578. (Here and below, ^ means exponent, except in C expressions, where it means xor (exclusive-or).
Now for the (2) part, in which I suggest a number of C implementations and invite comment and feedback. Most of these were previously developed and tested via Fortran versions. I list eight RNG's, each of them by means of C's powerful #define device. This provides fast, compact implementation, allows insertion of the required random variable directly into an expression, and, finally, provides a good selection of RNG's for use individually or in combination. The latter makes it possible to further confirm what empirical results suggest: combining two or more RNG's provides better, (or no worse) randomness, and for encryption enthusiasts: combination generators are harder to "crack".
For those wishing to try these eight RNG's:
At the top of your C program, include these definitions and the static variables that follow. Everything past this line is either C code or comment.
#define UL unsigned long #define znew ((z=36969*(z&65535)+(z>>16))<<16) #define wnew ((w=18000*(w&65535)+(w>>16))&65535) #define MWC (znew+wnew) #define SHR3 (jsr=(jsr=(jsr=jsr^(jsr<<17))^(jsr>>13))^(jsr<<5)) #define CONG (jcong=69069jcong+1234567) #define KISS ((MWC^CONG)+SHR3) #define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[++c+178]) #define SWB (t[c+237]=(x=t[c+15])-(y=t[++c]+(x<y))) #define UNI (KISS2.328306e-10) #define VNI ((long) KISS)4.656613e-10 / Global static variables: */ static UL z=362436069, w=521288629, jsr=123456789, jcong=380116160; static UL t[256]; static UL x=0,y=0; static unsigned char c=0;
/* Random seeds must be used to reset z,w,jsr,jcong and the table t[256] Here is an example procedure, using KISS: */
void settable(UL i1,UL i2,UL i3,UL i4) { int i; z=i1;w=i2,jsr=i3; jcong=i4; for(i=0;i<256;i++) t[i]=KISS; }
/* End of C code; Only comments follow. Stick the above 17 lines in your simulation programs, initialize the table, and have a variety of promising RNG's at your disposal. */
/* You may want use more complicated names for the above simple 1-letter variable names: z,w,x,y,t,c, to avoid clashing with favorites in your code. */
/* Any one of KISS, MWC, LFIB4, SWB, SHR3, or CONG can be used in an expression to provide a random 32-bit integer, and UNI in an expression will provide a random uniform in (01), or VNI in (-1,1). For example, for int i, float v; i=(MWC>>24); will provide a random byte, while v=4.+3.*UNI; will provide a uniform v in the interval 4. to 7. For the super cautious, (KISS+SWB) in an expression would provide a random 32-bit integer from a sequence with period > 2^7700, and would only add some 300 nanoseconds to the computing time for that expression. */
/* The KISS generator, (Keep It Simple Stupid), is designed to combine the two multiply-with-carry generators in MWC with the 3-shift register SHR3 and the congruential generator CONG, using addition and exclusive-or. Period about 2^123. It is one of my favorite generators. */
/* The MWC generator concatenates two 16-bit multiply-with-carry generators, x(n)=36969x(n-1)+carry, y(n)=18000y(n-1)+carry mod 2^16, has period about 2^60 and seems to pass all tests of randomness. A favorite stand-alone generator---faster than KISS, which contains it.*/
/* SHR3 is a 3-shift-register generator with period 2^32-1. It uses y(n)=y(n-1)(I+L^17)(I+R^13)(I+L^5), with the y's viewed as binary vectors, L the 32x32 binary matrix that shifts a vector left 1, and R its transpose. SHR3 seems to pass all except the binary rank test, since 32 successive values, as binary vectors, must be linearly independent, while 32 successive truly random 32-bit integers, viewed as binary vectors, will be linearly independent only about 29% of the time. */
/* CONG is a congruential generator with the widely used 69069 as multiplier: x(n)=69069x(n-1)+1234567. It has period 2^32. The leading half of its 32 bits seem to pass all tests, but bits in the last half are too regular. */
/* LFIB4 is an extension of the class that I have previously defined as lagged Fibonacci generators: x(n)=x(n-r) op x(n-s), with the x's in a finite set over which there is a binary operation op, such as +,- on integers mod 2^32,
- on odd such integers, exclusive-or (xor) on binary vectors. Except for those using multiplication, lagged Fibonacci generators fail various tests of randomness, unless the lags are very long. To see if more than two lags would serve to overcome the problems of 2- lag generators using +,- or xor, I have developed the 4-lag generator LFIB4: x(n)=x(n-256)+x(n-179)+x(n-119)+x(n-55) mod 2^32. Its period is 2^31*(2^256-1), about 2^287, and it seems to pass all tests---in particular, those of the kind for which 2-lag generators using +,-,xor seem to fail. For even more confidence in its suitability, LFIB4 can be combined with KISS, with a resulting period of about 2^410: just use (KISS+LFIB4) in any C expression. */
/* SWB is a subtract-with-borrow generator that I developed to give a simple method for producing extremely long periods: x(n)=x(n-222)-x(n-237)-borrow mod 2^32. The 'borrow' is 0 unless set to 1 if computing x(n-1) caused overflow in 32-bit integer arithmetic. This generator has a very long period, 2^7098(2^480-1), about 2^7578. It seems to pass all tests of randomness, but, suspicious of a generator so simple and fast (62 nanosecs at 300MHz), I would suggest combining SWB with KISS, MWC, SHR3, or CONG. */
/* Finally, because many simulations call for uniform random variables in 0<v<1 or -1<v<1, I use #define statements that permit inclusion of such variates directly in expressions: using UNI will provide a uniform random real (float) in (0,1), while VNI will provide one in (-1,1). */
/* All of these: MWC, SHR3, CONG, KISS, LFIB4, SWB, UNI and VNI, permit direct insertion of the desired random quantity into an expression, avoiding the time and space costs of a function call. I call these in-line-define functions. To use them, static variables z,w,jsr and jcong should be assigned seed values other than their initial values. If LFIB4 or SWB are used, the static table t[256] must be initialized. A sample procedure follows. */
/* A note on timing: It is difficult to provide exact time costs for inclusion of one of these in-line-define functions in an expression. Times may differ widely for different compilers, as the C operations may be deeply nested and tricky. I suggest these rough comparisons, based on averaging ten runs of a routine that is essentially a long loop: for(i=1;i<10000000;i++) L=KISS; then with KISS replaced with SHR3, CONG,... or KISS+SWB, etc. The times on my home PC, a Pentium 300MHz, in nanoseconds: LFIB4=64; CONG=90; SWB=100; SHR3=110; KISS=209; KISS+LFIB4=252; KISS+SWB=310. */
Subject: Re: Random numbers in C: Some suggestions. Date: Tue, 12 Jan 1999 07:13:47 -0800 From: Charles Bond cbond@ix.netcom.com Message-ID: 369B66AB.6CDED9F8@ix.netcom.com References: 369B5E30.65A55FD1@stat.fsu.edu Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis,sci.crypt,sci.physics Lines: 253
Thanks for the post. I want to comment on the SWB routine. I've been using a similar routine in high speed simulations for years. Small departures from statistically correct randomness are not a problem for my application, but speed is. I believe Knuth briefly discussed the method with guarded approval -- constrained by the concern that there was no real theory behind it. Do you know if any theoretical work has been done since Knuth's book to justify SWB?
George Marsaglia wrote:
This posting ends with 17 lines of C code that provide eight different in-line random number generators, six for random 32-bit integers and two for uniform reals in (0,1) and (-1,1). Comments are interspersed with that code. Various combinations of the six in-line integer generators may put in C expressions to provide a wide variety of very fast, long period, well-tested RNG's. I invite comments, feedback, verifications and timings.
First, there is narrative giving background for this posting; you may want to skip it.
Narrative:
Having had experience in producing and testing for randomness in computers, I am frequently asked to suggest good random number generators (RNG's), test RNG's, or comment on existing RNG's. Many recent queries have been in two areas: (1) requests for implementations in C and (2) comments on generators with immense periods, particularly the Mersenne Twister.
This posting is mainly for category (1), for which I suggest a set of C implementations of RNG's I have developed. C implementations of my DIEHARD battery of tests will be discussed elsewhere, and Narasimhan's GUI version is expected to be released soon.
For (2), I merely repeat what I have said in response to various queries: the Mersenne Twister looks good, but it seems to be essentially a lagged Fibonacci RNG using the exclusive-or (xor) operation, and experience has shown that lagged Fibonacci generators using xor provide unsatisfactory 'randomness' unless the lags are very long, and even for those with very long lags, (and even for those using + or - rather than xor), many people (I among them) are inclined to be cautious about sequences based on such a simple operation as: each new integer is the xor, (or sum, or difference), of two earlier ones. To be sure, the resulting integers can be "twisted", but not, I think, as simply or as safely as combining, say by addition, with members of a sequence from a (shorter period) generator that has itself passed extensive tests of randomness.
I also reply that it does not take an immense program (as, for example, in posted listings of Twister) to produce a more satisfactory RNG with an immense period, and give this example, on which I will expand below: Inclusion of
#define SWB ( t[c+237]=(x=t[c+15])-(y=t[++c]+(x<y)) )
together with suitably initialized seeds in
static unsigned long x,y,t[256]; unsigned char c;
will allow you to put the string SWB in any C expression and it will provide, in about 100 nanosecs, a 32-bit random integer with period 2^7578. (Here and below, ^ means exponent, except in C expressions, where it means xor (exclusive-or).
Now for the (2) part, in which I suggest a number of C implementations and invite comment and feedback. Most of these were previously developed and tested via Fortran versions. I list eight RNG's, each of them by means of C's powerful #define device. This provides fast, compact implementation, allows insertion of the required random variable directly into an expression, and, finally, provides a good selection of RNG's for use individually or in combination. The latter makes it possible to further confirm what empirical results suggest: combining two or more RNG's provides better, (or no worse) randomness, and for encryption enthusiasts: combination generators are harder to "crack".
For those wishing to try these eight RNG's:
At the top of your C program, include these definitions and the static variables that follow. Everything past this line is either C code or comment.
#define UL unsigned long #define znew ((z=36969*(z&65535)+(z>>16))<<16) #define wnew ((w=18000*(w&65535)+(w>>16))&65535) #define MWC (znew+wnew) #define SHR3 (jsr=(jsr=(jsr=jsr^(jsr<<17))^(jsr>>13))^(jsr<<5)) #define CONG (jcong=69069jcong+1234567) #define KISS ((MWC^CONG)+SHR3) #define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[++c+178]) #define SWB (t[c+237]=(x=t[c+15])-(y=t[++c]+(x<y))) #define UNI (KISS2.328306e-10) #define VNI ((long) KISS)4.656613e-10 / Global static variables: */ static UL z=362436069, w=521288629, jsr=123456789, jcong=380116160; static UL t[256]; static UL x=0,y=0; static unsigned char c=0;
/* Random seeds must be used to reset z,w,jsr,jcong and the table t[256] Here is an example procedure, using KISS: */
void settable(UL i1,UL i2,UL i3,UL i4) { int i; z=i1;w=i2,jsr=i3; jcong=i4; for(i=0;i<256;i++) t[i]=KISS; }
/* End of C code; Only comments follow. Stick the above 17 lines in your simulation programs, initialize the table, and have a variety of promising RNG's at your disposal. */
/* You may want use more complicated names for the above simple 1-letter variable names: z,w,x,y,t,c, to avoid clashing with favorites in your code. */
/* Any one of KISS, MWC, LFIB4, SWB, SHR3, or CONG can be used in an expression to provide a random 32-bit integer, and UNI in an expression will provide a random uniform in (01), or VNI in (-1,1). For example, for int i, float v; i=(MWC>>24); will provide a random byte, while v=4.+3.*UNI; will provide a uniform v in the interval 4. to 7. For the super cautious, (KISS+SWB) in an expression would provide a random 32-bit integer from a sequence with period > 2^7700, and would only add some 300 nanoseconds to the computing time for that expression. */
/* The KISS generator, (Keep It Simple Stupid), is designed to combine the two multiply-with-carry generators in MWC with the 3-shift register SHR3 and the congruential generator CONG, using addition and exclusive-or. Period about 2^123. It is one of my favorite generators. */
/* The MWC generator concatenates two 16-bit multiply-with-carry generators, x(n)=36969x(n-1)+carry, y(n)=18000y(n-1)+carry mod 2^16, has period about 2^60 and seems to pass all tests of randomness. A favorite stand-alone generator---faster than KISS, which contains it.*/
/* SHR3 is a 3-shift-register generator with period 2^32-1. It uses y(n)=y(n-1)(I+L^17)(I+R^13)(I+L^5), with the y's viewed as binary vectors, L the 32x32 binary matrix that shifts a vector left 1, and R its transpose. SHR3 seems to pass all except the binary rank test, since 32 successive values, as binary vectors, must be linearly independent, while 32 successive truly random 32-bit integers, viewed as binary vectors, will be linearly independent only about 29% of the time. */
/* CONG is a congruential generator with the widely used 69069 as multiplier: x(n)=69069x(n-1)+1234567. It has period 2^32. The leading half of its 32 bits seem to pass all tests, but bits in the last half are too regular. */
/* LFIB4 is an extension of the class that I have previously defined as lagged Fibonacci generators: x(n)=x(n-r) op x(n-s), with the x's in a finite set over which there is a binary operation op, such as +,- on integers mod 2^32,
- on odd such integers, exclusive-or (xor) on binary vectors. Except for those using multiplication, lagged Fibonacci generators fail various tests of randomness, unless the lags are very long. To see if more than two lags would serve to overcome the problems of 2- lag generators using +,- or xor, I have developed the 4-lag generator LFIB4: x(n)=x(n-256)+x(n-179)+x(n-119)+x(n-55) mod 2^32. Its period is 2^31*(2^256-1), about 2^287, and it seems to pass all tests---in particular, those of the kind for which 2-lag generators using +,-,xor seem to fail. For even more confidence in its suitability, LFIB4 can be combined with KISS, with a resulting period of about 2^410: just use (KISS+LFIB4) in any C expression. */
/* SWB is a subtract-with-borrow generator that I developed to give a simple method for producing extremely long periods: x(n)=x(n-222)-x(n-237)-borrow mod 2^32. The 'borrow' is 0 unless set to 1 if computing x(n-1) caused overflow in 32-bit integer arithmetic. This generator has a very long period, 2^7098(2^480-1), about 2^7578. It seems to pass all tests of randomness, but, suspicious of a generator so simple and fast (62 nanosecs at 300MHz), I would suggest combining SWB with KISS, MWC, SHR3, or CONG. */
/* Finally, because many simulations call for uniform random variables in 0<v<1 or -1<v<1, I use #define statements that permit inclusion of such variates directly in expressions: using UNI will provide a uniform random real (float) in (0,1), while VNI will provide one in (-1,1). */
/* All of these: MWC, SHR3, CONG, KISS, LFIB4, SWB, UNI and VNI, permit direct insertion of the desired random quantity into an expression, avoiding the time and space costs of a function call. I call these in-line-define functions. To use them, static variables z,w,jsr and jcong should be assigned seed values other than their initial values. If LFIB4 or SWB are used, the static table t[256] must be initialized. A sample procedure follows. */
/* A note on timing: It is difficult to provide exact time costs for inclusion of one of these in-line-define functions in an expression. Times may differ widely for different compilers, as the C operations may be deeply nested and tricky. I suggest these rough comparisons, based on averaging ten runs of a routine that is essentially a long loop: for(i=1;i<10000000;i++) L=KISS; then with KISS replaced with SHR3, CONG,... or KISS+SWB, etc. The times on my home PC, a Pentium 300MHz, in nanoseconds: LFIB4=64; CONG=90; SWB=100; SHR3=110; KISS=209; KISS+LFIB4=252; KISS+SWB=310. */
Subject: Re: Random numbers in C: Some suggestions. Date: Tue, 12 Jan 1999 13:56:41 -0500 From: George Marsaglia geo@stat.fsu.edu Message-ID: 369B9AE9.52C98810@stat.fsu.edu References: 369B66AB.6CDED9F8@ix.netcom.com Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis,sci.crypt,sci.physics Lines: 59
Charles Bond wrote:
Thanks for the post. I want to comment on the SWB routine. I've been using a similar routine in high speed simulations for years. ...
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& Many years? It hasn't been very many years since I invented the subtract-with-borrow method, and developed theory for establishing the periods. &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
. I believe Knuth briefly discussed the method with guarded approval -- constrained by the concern that there was no real theory behind it. Do you know if any theoretical work has been done since Knuth's book to justify SWB?
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& This business of providing 'theoretical support' for RNG's tends to be overdone---perhaps because of the influence of Knuth. His marvelous expository and poweful mathematical skills have justifiably made him the leading authority. Many people do not understand that such theory is based on the simple fact that congruential random numbers "fall mainly in the planes", that is, points whose coordinates are succesive elements from the sequence lie on a lattice with a huge unit-cell volume, (m^(n-1) in n-dimensions), compared to the unit-cell volume of 1 for truly random integers. So the "lattice test", as I called it, applies to congruential generators, although the ideas have been extended to the near-lattice-like patterns of certain other kinds of generators. But there seem to be no such lattice-like patterns for many other kinds of generators, and even if there were, it is an easy matter to destroy such patterns by combining with generators having disparate mathematical structures.
The quote from my Keynote Address at the 1984 Computer Science and Statistics: Symposium on the Interface, still applies:
"A random number generator is like sex: When it's good, its wonderful; And when it's bad, it's still pretty good."
Add to that, in line with my recommendations on combination generators;
"And if it's bad, try a twosome or threesome."
George Marsaglia
Subject: Re: Random numbers in C: Some suggestions. Date: Tue, 12 Jan 1999 16:09:29 -0800 From: Charles Bond cbond@ix.netcom.com Message-ID: 369BE439.92E0E011@ix.netcom.com References: 369B9AE9.52C98810@stat.fsu.edu Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis,sci.crypt,sci.physics Lines: 69
For the record, I did not mean to imply that Knuth's subtractive generator was the same as your subtract with borrow, only that it was similar (high speed, no multiplications). I gladly acknowledge your claim on it. But you seem a little skeptical of it yourself, and I was just curious.
Regards,
C. Bond
George Marsaglia wrote:
Charles Bond wrote:
Thanks for the post. I want to comment on the SWB routine. I've been using a similar routine in high speed simulations for years. ...
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& Many years? It hasn't been very many years since I invented the subtract-with-borrow method, and developed theory for establishing the periods. &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
. I believe Knuth briefly discussed the method with guarded approval -- constrained by the concern that there was no real theory behind it. Do you know if any theoretical work has been done since Knuth's book to justify SWB?
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&& This business of providing 'theoretical support' for RNG's tends to be overdone---perhaps because of the influence of Knuth. His marvelous expository and poweful mathematical skills have justifiably made him the leading authority. Many people do not understand that such theory is based on the simple fact that congruential random numbers "fall mainly in the planes", that is, points whose coordinates are succesive elements from the sequence lie on a lattice with a huge unit-cell volume, (m^(n-1) in n-dimensions), compared to the unit-cell volume of 1 for truly random integers. So the "lattice test", as I called it, applies to congruential generators, although the ideas have been extended to the near-lattice-like patterns of certain other kinds of generators. But there seem to be no such lattice-like patterns for many other kinds of generators, and even if there were, it is an easy matter to destroy such patterns by combining with generators having disparate mathematical structures.
The quote from my Keynote Address at the 1984 Computer Science and Statistics: Symposium on the Interface, still applies:
"A random number generator is like sex: When it's good, its wonderful; And when it's bad, it's still pretty good."
Add to that, in line with my recommendations on combination generators;
"And if it's bad, try a twosome or threesome."
George Marsaglia
Subject: Re: Random numbers in C: Some suggestions. Date: Tue, 12 Jan 1999 15:54:10 -0800 From: Mike Oliver oliver@math.ucla.edu Message-ID: 369BE0A2.CA197F7B@math.ucla.edu References: 369B5E30.65A55FD1@stat.fsu.edu Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis,sci.crypt,sci.physics Lines: 17
George Marsaglia wrote:
[...] and experience has shown that lagged Fibonacci generators using xor provide unsatisfactory 'randomness' unless the lags are very long, and even for those with very long lags, (and even for those using + or - rather than xor),
Could you give us a pointer to information about why these RNGs are unsatisfactory and what sort of test they tend to fail?
-- Disclaimer: I could be wrong -- but I'm not. (Eagles, "Victim of Love")
Finger for PGP public key, or visit http://www.math.ucla.edu/~oliver. 1500 bits, fingerprint AE AE 4F F8 EA EA A6 FB E9 36 5F 9E EA D0 F8 B9
Subject: Re: Random numbers in C: Some suggestions. Date: 13 Jan 1999 12:21:37 -0800 From: Eric Backus ericb@labejb.lsid.hp.com Message-ID: tun23mnctq.fsf@labejb.lsid.hp.com References: 369B5E30.65A55FD1@stat.fsu.edu Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis,sci.crypt,sci.physics Lines: 65
George Marsaglia geo@stat.fsu.edu writes:
This posting ends with 17 lines of C code that provide eight different in-line random number generators, six for random 32-bit integers and two for uniform reals in (0,1) and (-1,1). Comments are interspersed with that code. Various combinations of the six in-line integer generators may put in C expressions to provide a wide variety of very fast, long period, well-tested RNG's. I invite comments, feedback, verifications and timings.
#define UL unsigned long #define znew ((z=36969*(z&65535)+(z>>16))<<16) #define wnew ((w=18000*(w&65535)+(w>>16))&65535) #define MWC (znew+wnew) #define SHR3 (jsr=(jsr=(jsr=jsr^(jsr<<17))^(jsr>>13))^(jsr<<5)) #define CONG (jcong=69069jcong+1234567) #define KISS ((MWC^CONG)+SHR3) #define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[++c+178]) #define SWB (t[c+237]=(x=t[c+15])-(y=t[++c]+(x<y))) #define UNI (KISS2.328306e-10) #define VNI ((long) KISS)4.656613e-10 / Global static variables: */ static UL z=362436069, w=521288629, jsr=123456789, jcong=380116160; static UL t[256]; static UL x=0,y=0; static unsigned char c=0;
/* Random seeds must be used to reset z,w,jsr,jcong and the table t[256] Here is an example procedure, using KISS: */
void settable(UL i1,UL i2,UL i3,UL i4) { int i; z=i1;w=i2,jsr=i3; jcong=i4; for(i=0;i<256;i++) t[i]=KISS; }
Thank you for providing this extremely useful code. (I'd like to make use of it, however I see no copyright notice, can I assume you are making it free for anyone to use?)
I have a small problem with the definition of LFIB4 and SWB. In an attempt to make these a single line of C code, they both use "++c" in the same expression as they use "c". A C compiler is free to rearrange the order in which it calculates the intermediate terms of these expressions, so the expressions can produce different results depending on the compiler.
I might propose alternate expressions using the "," operator in order to remove any ambiguity. With a good compiler, these expressions probably won't be any slower than your original ones:
#define LFIB4_ALT (t[c]=t[c]+t[c+58]+t[c+119]+t[c+179],t[c++]) #define SWB_ALT (t[c+237]=(x=t[c+15])-(y=t[c+1]+(x<y)),t[c++ +237])
However, these are uglier and harder to understand than your original expressions, and of course I might have made a mistake in interpreting where the c++ should go. Any comments?
-- Eric Backus eric_backus@hp.com http://labejb.lsid.hp.com/ (425) 335-2495
Subject: Random numbers for C: Improvements. Date: Fri, 15 Jan 1999 11:41:47 -0500 From: George Marsaglia geo@stat.fsu.edu Message-ID: 369F6FCA.74C7C041@stat.fsu.edu References: 369B5E30.65A55FD1@stat.fsu.edu Newsgroups: sci.stat.math,sci.math,sci.math.numer-analysis Lines: 93
As I hoped, several suggestions have led to improvements in the code for RNG's I proposed for use in C. (See the thread "Random numbers for C: Some suggestions" in previous postings.) The improved code is listed below.
A question of copyright has also been raised. Unlike DIEHARD, there is no copyright on the code below. You are free to use it in any way you want, but you may wish to acknowledge the source, as a courtesy.
To avoid being cited by the style police, some have suggested using typedef rather than #define in order to replace unsigned long by UL.
Others have pointed out that one cannot be certain of the way that a compiler will evaluate terms in a sum, so using ++c in a term is dangerous. They have offered a version using the comma operator, which ensures that the table index is incremented properly. See LFIB4 and SWB below.
In my earlier work, done in Fortran, I had implemented two 16-bit multiply-with-carry generators, say z and w, as 32-bit integers, with the carry in the top 16 bits, the output in the bottom 16. They were combined by (z<<16)+w. (In Fortran, ishft(z,16)+w.) Such a combination seemed to pass all tests. In the above- mentioned post, I used (z<<16)+(w&65525), and that does not pass all tests. So (z<<16)+w seems preferable; it is used below, providing a MWC that seems to pass all tests.
The generators MWC, KISS, LFIB4 and SWB seem to pass all tests. By themselves, CONG and SHR3 do not, but using CONG+SHR3 provides one of the fastest combinations that satisfy the DIEHARD battery of tests.
Of course, one cannot have absolute confidence in any generator. The choices LFIB4 and SWB have immense periods, are very fast, and pass all tests in DIEHARD, but I am hesitant to rely on them alone---primarily because they come from such simple mod 2^32 arithmetic: four adds in LFIB4 or one subtract-with-borrow in SWB.
The code below provides a variety of in-line generators that seem promising by themselves, and even more so in combination. With them, one may feel fairly confident that combinations will produce results consistent with the underlying probability theory in your applications.
All combinations seem to support the supplemented quote from my 1984 Keynote Address:
A random number generator is like sex;
When it's good, it's wonderful,
And when it's bad, it's still pretty good.
And when it's bad, try a twosome or threesome.
The C code follows; you may want to snip and save from here--------------------------------------------------:
#define znew (z=36969*(z&65535)+(z>>16)) #define wnew (w=18000*(w&65535)+(w>>16)) #define MWC ( (znew<<16)+wnew ) #define SHR3 (jsr=(jsr=(jsr=jsr^(jsr<<17))^(jsr>>13))^(jsr<<5)) #define CONG (jcong=69069jcong+1234567) #define KISS ((MWC^CONG)+SHR3) #define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[c+178],t[++c]) #define SWB (t[c+237]=(x=t[c+15])-(y=t[c]+(x<y)),t[++c]) #define UNI (KISS2.328306e-10) #define VNI ((long) KISS)*4.656613e-10 typedef unsigned long UL;
/* Global static variables: / static UL z=362436069, w=521288629, jsr=123456789, jcong=380116160; static UL t[256]; static UL x=0,y=0; static unsigned char c=0; / Random seeds must be used to reset z,w,jsr,jcong and the table t[256]. Here is an example procedure, using KISS: */
void settable(UL i1,UL i2,UL i3,UL i4) { int i; z=i1;w=i2,jsr=i3; jcong=i4; for(i=0;i<256;i++) t[i]=KISS; }
/* Any one of KISS, MWC, LFIB4, SWB, SHR3, or CONG can be used in an expression to provide a random 32-bit integer, while UNI provides a real in (0,1) and VNI a real in (-1,1). */
Subject: Re: Random numbers for C: Improvements. Date: 15 Jan 1999 14:02:47 -0500 From: zaykin@statgen.ncsu.edu (Dmitri Zaykin) Message-ID: 790f49x60.fsf@poole.statgen.ncsu.edu References: 369F6FCA.74C7C041@stat.fsu.edu Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis Lines: 11
George Marsaglia geo@stat.fsu.edu writes:
#define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[c+178],t[++c]) #define SWB (t[c+237]=(x=t[c+15])-(y=t[c]+(x<y)),t[++c])
Shouldn't these be
#define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[c+179], ++c) #define SWB (t[c+237]=(x=t[c+15])-(y=t[c+1]+(x<y)), ++c)
Dmitri
Subject: Re: Random numbers for C: Improvements. Date: 15 Jan 99 20:16:13 GMT From: radford@cs.toronto.edu (Radford Neal) Message-ID: 99Jan15.151547edt.785@neuron.ai.toronto.edu References: 790f49x60.fsf@poole.statgen.ncsu.edu Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 28
George Marsaglia geo@stat.fsu.edu writes:
#define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[c+178],t[++c]) #define SWB (t[c+237]=(x=t[c+15])-(y=t[c]+(x<y)),t[++c])
In article 790f49x60.fsf@poole.statgen.ncsu.edu, Dmitri Zaykin zaykin@statgen.ncsu.edu wrote:
Shouldn't these be
#define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[c+179], ++c) #define SWB (t[c+237]=(x=t[c+15])-(y=t[c+1]+(x<y)), ++c)
This doesn't work either. I believe that it is undefined whether the comparison x<y uses the new or the old value of x. The SHR3 macro in the original source also suffers from this flaw.
I think one needs to face up to an unpleasant truth: The #define facility of C was poorly designed, and is incapable in general of supporting the definition of "in-line" procedures. It is far better to simply write ordinary C procedures, and accept the fairly small procedure call overhead.
Radford Neal
Subject: Re: Random numbers for C: Improvements. Date: Fri, 15 Jan 1999 17:34:36 -0600 From: "Jeff Stout" jstout@ncon.com Message-ID: QfQn2.10695$bf6.2024@news1.giganews.com References: 99Jan15.151547edt.785@neuron.ai.toronto.edu Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 27
Radford Neal wrote in message 99Jan15.151547edt.785@neuron.ai.toronto.edu...
I think one needs to face up to an unpleasant truth: The #define facility of C was poorly designed, and is incapable in general of supporting the definition of "in-line" procedures. It is far better to simply write ordinary C procedures, and accept the fairly small procedure call overhead.
Radford Neal
This is not a poor design of the macro facility of C, but a built in limitation of the C language itself. The preprocessor is just doing a stupid text substituation, its the C compiler that is ambigous about the interpretation
The C macro language can support "in-line" procedures and a lot of other junk only true software wienies would ever use.
Jeff Stout
Subject: Re: Random numbers for C: Improvements. Date: 16 Jan 99 04:43:12 GMT From: radford@cs.toronto.edu (Radford Neal) Message-ID: 99Jan15.233424edt.6005@cortex.ai.toronto.edu References: QfQn2.10695$bf6.2024@news1.giganews.com Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 88
Radford Neal wrote in message 99Jan15.151547edt.785@neuron.ai.toronto.edu...
I think one needs to face up to an unpleasant truth: The #define facility of C was poorly designed, and is incapable in general of supporting the definition of "in-line" procedures. It is far better to simply write ordinary C procedures, and accept the fairly small procedure call overhead.
In article QfQn2.10695$bf6.2024@news1.giganews.com, Jeff Stout jstout@ncon.com wrote:
This is not a poor design of the macro facility of C, but a built in limitation of the C language itself. The preprocessor is just doing a stupid text substituation, its the C compiler that is ambigous about the interpretation
The C macro language can support "in-line" procedures and a lot of other junk only true software wienies would ever use.
In saying that the #define facility of C was poorly designed, I of course meant that it does not do what is needed to write good programs. The fact that it does stupid text substitation is a big part of this.
In the case under discussion, the only reason (I hope!) that obscure and semantically undefined constructs were being used was in order to try to define an in-line procedure using this inadequate facility.
Unfortunately, although one can indeed create lots of junk using the C pre-processor, one cannot, in general, use it to create in-line procedures. One can't write a macro equivalent to the following, for instance:
int first_zero (int *a) { int i; for (i = 0; a[i]!=0; i++) ; return i; }
Furthermore, in many cases where one CAN write in-line procedures using C macros, 99% of the people who try to do so get it wrong. For example, the following procedure CAN be written as a macro:
void badexample(int a) { while (f(a)==0) ; g(a); }
but NOT as follows:
#define badexample(a) while (f(a)==0) ; g(a);
nor as follows:
#define badexample(a) { while (f(a)==0) ; g(a); }
The following gets closer:
#define badexample(a) do { while (f(a)==0) ; g(a); } while (0)
Note: NO semicolon at the end. The necessity of this trick is left as an exercise for the reader.
Really, though, one needs to define the macro as follows:
#define badexample(a) do { int fy7653_4xq = a;
while (f(fy7653_4xq)==0) ;
g(fy7653_4xq);
} while (0)
Here, fy7853_4xq is assumed to be sufficiently obscure that it is unlikely to conflict with the name of a variable that occurs in a.
Not illustrated in this example, but commonly overlooked, is the frequent necessity to enclose references to the parameters of the macro in parentheses.
Failure to understand these arcane necessities will result in macros that behave just like the procedure they are trying to mimic around 95% of the times they are called, and which produce completely obscure bugs the other 5% of the time.
Radford Neal
Subject: Re: Random numbers for C: Improvements. Date: 16 Jan 1999 16:14:45 -0500 From: zaykin@statgen.ncsu.edu (Dmitri Zaykin) Message-ID: 7ogny6htm.fsf@poole.statgen.ncsu.edu References: QfQn2.10695$bf6.2024@news1.giganews.com Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 15
"Jeff Stout" jstout@ncon.com writes:
The C macro language can support "in-line" procedures and a lot of other junk only true software wienies would ever use.
It is probably true that the posted random number generators are better implemented as regular functions. C compilers should attempt to inline "short" functions into their callers when certain optimization flags are enabled.
Also, the new C standard will support the "inline" keyword. Some compilers (e.g. gcc called without "-ansi" switch) already recognize it.
Dmitri
Subject: Re: Random numbers for C: Improvements. Date: 16 Jan 1999 15:19:24 -0500 From: zaykin@statgen.ncsu.edu (Dmitri Zaykin) Message-ID: 7pv8f55tf.fsf@poole.statgen.ncsu.edu References: 99Jan15.151547edt.785@neuron.ai.toronto.edu Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 29
These should work (aside the possibility of clashing with local names)
#define SHR3 (jsr=jsr^(jsr<<17), jsr=jsr^(jsr>>13), jsr=jsr^(jsr<<5)) #define LFIB4 (t[c] = t[c]+t[c+58]+t[c+119]+t[c+179], t[++c]) #define SWB (x = t[c+15], t[c+237] = x-(y=t[c+1]+(x<y)), t[++c])
Dmitri
radford@cs.toronto.edu (Radford Neal) writes:
George Marsaglia geo@stat.fsu.edu writes:
#define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[c+178],t[++c]) #define SWB (t[c+237]=(x=t[c+15])-(y=t[c]+(x<y)),t[++c])
In article 790f49x60.fsf@poole.statgen.ncsu.edu, Dmitri Zaykin zaykin@statgen.ncsu.edu wrote:
Shouldn't these be
#define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[c+179], ++c) #define SWB (t[c+237]=(x=t[c+15])-(y=t[c+1]+(x<y)), ++c)
This doesn't work either. I believe that it is undefined whether the comparison x<y uses the new or the old value of x. The SHR3 macro in the original source also suffers from this flaw.
Subject: Re: Random numbers for C: Improvements. Date: 17 Jan 1999 08:06:29 PST From: Ramin Sina rsina@concentric.net Message-ID: 36A21A2C.AD5145D1@concentric.net References: 7pv8f55tf.fsf@poole.statgen.ncsu.edu Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 34
Dmitri Zaykin wrote:
These should work (aside the possibility of clashing with local names)
#define SHR3 (jsr=jsr^(jsr<<17), jsr=jsr^(jsr>>13), jsr=jsr^(jsr<<5)) #define LFIB4 (t[c] = t[c]+t[c+58]+t[c+119]+t[c+179], t[++c]) #define SWB (x = t[c+15], t[c+237] = x-(y=t[c+1]+(x<y)), t[++c])
Dmitri
radford@cs.toronto.edu (Radford Neal) writes:
George Marsaglia geo@stat.fsu.edu writes:
#define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[c+178],t[++c]) #define SWB (t[c+237]=(x=t[c+15])-(y=t[c]+(x<y)),t[++c])
Could someone please post a short example code on how to use these in practice. I am lost in the discussion in this thread, but I would like very much to understand how to use these for example to generate 1000 uniform random numbers between betwen A and B ( say 5 and 10) .
Thanks very much, Ramin Sina
--
Ramin Sina rsina@concentric.net
Subject: Re: Random numbers for C: Improvements. Date: 18 Jan 1999 17:34:52 GMT From: hook@nas.nasa.gov (Ed Hook) Message-ID: 77vrbs$pmj$1@sun500.nas.nasa.gov References: 7pv8f55tf.fsf@poole.statgen.ncsu.edu Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 25
In article 7pv8f55tf.fsf@poole.statgen.ncsu.edu, zaykin@statgen.ncsu.edu (Dmitri Zaykin) writes: |> These should work (aside the possibility of clashing with local names) |> |> #define SHR3 (jsr=jsr^(jsr<<17), jsr=jsr^(jsr>>13), jsr=jsr^(jsr<<5)) |> #define LFIB4 (t[c] = t[c]+t[c+58]+t[c+119]+t[c+179], t[++c]) |> #define SWB (x = t[c+15], t[c+237] = x-(y=t[c+1]+(x<y)), t[++c]) |> |> Dmitri |>
No -- 'SHR3' and 'LFIB4' are OK, but your version of 'SWB' still invokes the dreaded undefined behavior. In the second calculation, there's no sequence point to separate the use of 'y' to determine the value of 'x<y' and the assignment to 'y'. One might argue that this is not particularly relevant, since we're trying to simulate randomness anyhow ... except that, once undefined behavior rears its ugly head, the code is allowed to simply dump core and die ...
-- Ed Hook | Copula eam, se non posit MRJ Technology Solutions, Inc. | acceptera jocularum. NAS, NASA Ames Research Center | I can barely speak for myself, much Internet: hook@nas.nasa.gov | less for my employer
Subject: Re: Random numbers for C: Improvements. Date: 19 Jan 1999 13:10:13 GMT From: orjanjo@math.ntnu.no (Orjan Johansen) Message-ID: 78207l$1l8$1@due.unit.no References: 77vrbs$pmj$1@sun500.nas.nasa.gov Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 21
In article 77vrbs$pmj$1@sun500.nas.nasa.gov, Ed Hook hook@nas.nasa.gov wrote:
In article 7pv8f55tf.fsf@poole.statgen.ncsu.edu, zaykin@statgen.ncsu.edu (Dmitri Zaykin) writes: [snip] |> #define SWB (x = t[c+15], t[c+237] = x-(y=t[c+1]+(x<y)), t[++c]) [snip] No -- 'SHR3' and 'LFIB4' are OK, but your version of 'SWB' still invokes the dreaded undefined behavior. In the second calculation, there's no sequence point to separate the use of 'y' to determine the value of 'x<y' and the assignment to 'y'. [snip]
I am not a programmer by trade, but I have a little knowledge of C. Surely in an expression of the form y=..., the evaluation of ... is guaranteed to happen before the assignment. As far as I understand, what is not guaranteed is that any assignments in ... would happen before the one to y.
Greetings, �rjan.
Subject: Re: Random numbers for C: Improvements. Date: Tue, 19 Jan 1999 18:29:34 GMT From: horst.kraemer@snafu.de (Horst Kraemer) Message-ID: 36a4cb5e.39101741@news.snafu.de References: 77vrbs$pmj$1@sun500.nas.nasa.gov Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 39
On 18 Jan 1999 17:34:52 GMT, hook@nas.nasa.gov (Ed Hook) wrote:
In article 7pv8f55tf.fsf@poole.statgen.ncsu.edu, zaykin@statgen.ncsu.edu (Dmitri Zaykin) writes: |> These should work (aside the possibility of clashing with local names) |> |> #define SHR3 (jsr=jsr^(jsr<<17), jsr=jsr^(jsr>>13), jsr=jsr^(jsr<<5)) |> #define LFIB4 (t[c] = t[c]+t[c+58]+t[c+119]+t[c+179], t[++c]) |> #define SWB (x = t[c+15], t[c+237] = x-(y=t[c+1]+(x<y)), t[++c]) |> |> Dmitri |>
No -- 'SHR3' and 'LFIB4' are OK, but your version of 'SWB' still invokes the dreaded undefined behavior. In the second calculation, there's no sequence point to separate the use of 'y' to determine the value of 'x<y' and the assignment to 'y'.
Sorry. No sequence point is needed between reading an writing to an lvalue.
According to your theory
y = y - 1
would produce undefined behaviour, too ....
What produces undefined behaviour in C is modifying an lvalue more than once between the last and next sequence point. There is only one modification of y in the expression
t[c+237] = x-(y=t[c+1]+(x<y))
Regards Horst
Subject: Re: Random numbers for C: Improvements. Date: 17 Jan 1999 13:22:21 -0500 From: hrubin@b.stat.purdue.edu (Herman Rubin) Message-ID: 77t9ot$15ha@b.stat.purdue.edu References: 99Jan15.151547edt.785@neuron.ai.toronto.edu Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 49
In article 99Jan15.151547edt.785@neuron.ai.toronto.edu, Radford Neal radford@cs.toronto.edu wrote:
George Marsaglia geo@stat.fsu.edu writes:
#define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[c+178],t[++c]) #define SWB (t[c+237]=(x=t[c+15])-(y=t[c]+(x<y)),t[++c])
In article 790f49x60.fsf@poole.statgen.ncsu.edu, Dmitri Zaykin zaykin@statgen.ncsu.edu wrote:
Shouldn't these be
#define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[c+179], ++c) #define SWB (t[c+237]=(x=t[c+15])-(y=t[c+1]+(x<y)), ++c)
This doesn't work either. I believe that it is undefined whether the comparison x<y uses the new or the old value of x. The SHR3 macro in the original source also suffers from this flaw.
I think one needs to face up to an unpleasant truth: The #define facility of C was poorly designed, and is incapable in general of supporting the definition of "in-line" procedures. It is far better to simply write ordinary C procedures, and accept the fairly small procedure call overhead.
I think it should be clarified, and probably written out in some more detail. But the procedure call overhead would be comparable to the computing cost; C, and all other languages, have such great built-in inefficiencies that what is needed is something written from the standpoint of mathematics and efficiency.
But even the use of a comparison is a costly operation, if the result is to be used promptly. Any form of random, pseudo-random, or quasi-random numbers should be done in good integer form, not present now on many machines because of bad design coming from bad languages which have thrown out the natural properties of computers, and attempts to keep stupid people from making mistakes. An idiot-proof language is only for idiots.
-- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
Subject: Re: Random numbers for C: Improvements. Date: Sun, 17 Jan 1999 19:10:01 GMT From: dmurdoch@pair.com (Duncan Murdoch) Message-ID: 36a23371.10418803@newshost.uwo.ca References: 77t9ot$15ha@b.stat.purdue.edu Newsgroups: sci.stat.math Lines: 27
On 17 Jan 1999 13:22:21 -0500, hrubin@b.stat.purdue.edu (Herman Rubin) wrote:
But the procedure call overhead would be comparable to the computing cost;
I don't know if that's true or not, but I don't think it is really something to worry about. The uniform value coming out of these macros is unlikely to be the end of the computation; you'll almost certainly do far more costly things with it after it's generated. So halving or doubling the speed of the generator won't have nearly so much impact on the overall calculation.
On the other hand, if it's programmed in a way that is sometimes incorrect in hard to detect ways, it might invalidate the whole calculation. That seems like a much higher cost to pay.
An idiot-proof language is only for idiots.
No, if such a thing existed, it would also reduce the occasional lapses of non-idiots. Programmers are fallible; if tools can prevent some common errors (even if it means doubling the computation time), then those tools are worth using.
I'd rather not be the first one to finish, if my answer is wrong!
Duncan Murdoch
Subject: Re: Random numbers for C: Improvements. Date: 18 Jan 1999 08:56:33 -0500 From: hrubin@b.stat.purdue.edu (Herman Rubin) Message-ID: 77veih$11ts@b.stat.purdue.edu References: 36a23371.10418803@newshost.uwo.ca Newsgroups: sci.stat.math Lines: 63
In article 36a23371.10418803@newshost.uwo.ca, Duncan Murdoch dmurdoch@pair.com wrote:
On 17 Jan 1999 13:22:21 -0500, hrubin@b.stat.purdue.edu (Herman Rubin) wrote:
But the procedure call overhead would be comparable to the computing cost;
I don't know if that's true or not, but I don't think it is really something to worry about. The uniform value coming out of these macros is unlikely to be the end of the computation; you'll almost certainly do far more costly things with it after it's generated. So halving or doubling the speed of the generator won't have nearly so much impact on the overall calculation.
It certainly is. The cost of one procedure call could far exceed the cost of a uniform, or even many nonuniform, random variables. If procedure calls have to be made in the generation, I would not be surprised to have a factor of 10 or more. BTW, I would not use any pseudo-random procedure in any case; XORing the numbers with physical random numbers, which need not be as good as one wants the final result to be, should be done, even if those numbers should be reused, and this should be carefully done.
The fundamental output of a random number generator should be a bit stream; this can be converted easily to anything, but the converse is false.
On the other hand, if it's programmed in a way that is sometimes incorrect in hard to detect ways, it might invalidate the whole calculation. That seems like a much higher cost to pay.
This is why it is important that those who understand the mathematics of the algorithm do it; I do not believe it pays to have a library routine written by one person. HLL algorithms fail in rather odd ways, and algorithms for generating random numbers, unless they have huge errors, can only be checked by direct proof of correctness.
An idiot-proof language is only for idiots.
No, if such a thing existed, it would also reduce the occasional lapses of non-idiots. Programmers are fallible; if tools can prevent some common errors (even if it means doubling the computation time), then those tools are worth using.
On the contrary, these tools can cause the programmer to make errors which cannot be easily detected. And it does not mean just a factor of two, but far more. The current design of computers, driven by the languages, has increased the time to do multiple precision arithmetic by a factor of possibly 100 relative to what it would have been if "old" architecture was in place.
I'd rather not be the first one to finish, if my answer is wrong!
Duncan Murdoch
-- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
Subject: Re: Random numbers for C: Improvements. Date: 18 Jan 99 18:38:33 GMT From: radford@cs.toronto.edu (Radford Neal) Message-ID: 99Jan18.133644edt.6004@cortex.ai.toronto.edu References: 77veih$11ts@b.stat.purdue.edu Newsgroups: sci.stat.math Lines: 60
In article 77veih$11ts@b.stat.purdue.edu, Herman Rubin hrubin@b.stat.purdue.edu wrote:
... The cost of one procedure call could far exceed the cost of a uniform, or even many nonuniform, random variables. If procedure calls have to be made in the generation, I would not be surprised to have a factor of 10 or more.
This might have been the case with some old languages such as Algol 60, but not with C as implemented on modern machines. Even on a CISC machine like a VAX, I doubt the C procedure call overhead would be more that twice the time for the actual work, and on modern RISC machines the ratio will tend to be less.
BTW, I would not use any pseudo-random procedure in any case; XORing the numbers with physical random numbers, which need not be as good as one wants the final result to be, should be done, even if those numbers should be reused, and this should be carefully done.
I'd agree with this. I just makes the work inside the procedure greater, however, reducing the significance of the procedure call overhead.
An idiot-proof language is only for idiots.
No, if such a thing existed, it would also reduce the occasional lapses of non-idiots. Programmers are fallible; if tools can prevent some common errors (even if it means doubling the computation time), then those tools are worth using.
On the contrary, these tools can cause the programmer to make errors which cannot be easily detected.
Although such errors caused by use of high-level languages are possible, they are minor compared to the numerous ways one can go wrong when writing assembly-language programs (eg, failing to be completely consistent in conventions for whether procedures do or do not save register contents).
The current design of computers, driven by the languages, has increased the time to do multiple precision arithmetic by a factor of possibly 100 relative to what it would have been if "old" architecture was in place.
I find this hard to believe. It's true that high level languages provide no way to access operations such as 32-bit by 32-bit to 64-bit multiplication, but that would just force you to implement multiple precision arithmetic with 16-bit chunks rather than 32-bit chunks, for a slowdown of "only" a factor of four. Even accounting for additional general inefficiencies, it's hard to see how it can be any more than a factor of 10 slower. If you're claiming that the hardware itself would be designed differently if people still wrote in assembler, I'd say that if anything the effect goes the other way. The tendency with RISC machines has been to simplify the hardware (making it faster) at the expense of funny machine languages, which compilers can cope with, but which are less convenient for humans writing assembly code.
Radford Neal
Subject: Re: Random numbers for C: Improvements. Date: 19 Jan 1999 10:25:10 -0500 From: hrubin@b.stat.purdue.edu (Herman Rubin) Message-ID: 78284m$m0e@b.stat.purdue.edu References: 99Jan18.133644edt.6004@cortex.ai.toronto.edu Newsgroups: sci.stat.math Lines: 119
In article 99Jan18.133644edt.6004@cortex.ai.toronto.edu, Radford Neal radford@cs.toronto.edu wrote:
In article 77veih$11ts@b.stat.purdue.edu, Herman Rubin hrubin@b.stat.purdue.edu wrote:
... The cost of one procedure call could far exceed the cost of a uniform, or even many nonuniform, random variables. If procedure calls have to be made in the generation, I would not be surprised to have a factor of 10 or more.
This might have been the case with some old languages such as Algol 60, but not with C as implemented on modern machines. Even on a CISC machine like a VAX, I doubt the C procedure call overhead would be more that twice the time for the actual work, and on modern RISC machines the ratio will tend to be less.
There are more problems than that; the question is, whether procedure calls have to be made in the internal computation of the pseudo-random numbers. In this case, they would require a huge number of register save-restores. The example considered using a subroutine call for the generation of each random variable.
The code discussed was not in making a call for an array, but for a single step in the generation procedure.
BTW, I would not use any pseudo-random procedure in any case; XORing the numbers with physical random numbers, which need not be as good as one wants the final result to be, should be done, even if those numbers should be reused, and this should be carefully done.
I'd agree with this. I just makes the work inside the procedure greater, however, reducing the significance of the procedure call overhead.
This is best reduced by using buffers.
An idiot-proof language is only for idiots.
No, if such a thing existed, it would also reduce the occasional lapses of non-idiots. Programmers are fallible; if tools can prevent some common errors (even if it means doubling the computation time), then those tools are worth using.
On the contrary, these tools can cause the programmer to make errors which cannot be easily detected.
Although such errors caused by use of high-level languages are possible, they are minor compared to the numerous ways one can go wrong when writing assembly-language programs (eg, failing to be completely consistent in conventions for whether procedures do or do not save register contents).
It depends on how much is done; saving dozens of registers is certainly expensive. This is the order of magnitude, as the number of registers on many machines runs from 32 to 256, and possibly more. What should be used very often instead of calls, where transfer of control is needed, are jumps with return, or even quite liberal use of gotos.
The current design of computers, driven by the languages, has increased the time to do multiple precision arithmetic by a factor of possibly 100 relative to what it would have been if "old" architecture was in place.
I find this hard to believe. It's true that high level languages provide no way to access operations such as 32-bit by 32-bit to 64-bit multiplication, but that would just force you to implement multiple precision arithmetic with 16-bit chunks rather than 32-bit chunks, for a slowdown of "only" a factor of four.
It is 16-bit chunks instead of the size chunks the floating processors can handle, and also the much smaller parallelism and far slower multipliers, as well as the problem of handling the much larger number of registers. What will fit in registers for floating is likely to involve a large number of memory references in fixed. There is also likely to be lots of shifting, to extract the smaller chunks.
I did look into it in detail on the Cray, and I needed to use 20 instructions to get a product of two 48-bit numbers.
Even accounting for additional
general inefficiencies, it's hard to see how it can be any more than a factor of 10 slower. If you're claiming that the hardware itself would be designed differently if people still wrote in assembler, I'd say that if anything the effect goes the other way.
At this stage, this would not be the case. It would probably have been best if, at the time that it was decided to go beyond Fortran, the versatility of the machine instructions at the time had been put into the HLL's. Among those omitted was that of replacements producing lists (NOT structs) of results, and the ability to directly use the same hardware entities directly with different "type" operations. The simplest procedures, from the standpoint of computational complexity, to generate non-uniform random variables from uniform output use bit operations, not floating arithmetic.
The tendency with
RISC machines has been to simplify the hardware (making it faster) at the expense of funny machine languages, which compilers can cope with, but which are less convenient for humans writing assembly code.
If we had real RISC, it would not have floating point, as floating can be fairly efficiently done with fixed point. Instead, we would have fixed-point operations yielding multiple results, and flexible hardware. Forced normalization floating point arithmetic is quite difficult to use for multiple precision, or much of anything else. The IEEE committee was notified of these problems.
Radford Neal
-- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
Subject: Re: Random numbers for C: Improvements. Date: Tue, 19 Jan 1999 03:23:26 GMT From: dmurdoch@pair.com (Duncan Murdoch) Message-ID: 36a5f74f.11285673@newshost.uwo.ca References: 77veih$11ts@b.stat.purdue.edu Newsgroups: sci.stat.math Lines: 41
On 18 Jan 1999 08:56:33 -0500, hrubin@b.stat.purdue.edu (Herman Rubin) wrote:
In article 36a23371.10418803@newshost.uwo.ca, Duncan Murdoch dmurdoch@pair.com wrote:
On 17 Jan 1999 13:22:21 -0500, hrubin@b.stat.purdue.edu (Herman Rubin) wrote:
On the other hand, if it's programmed in a way that is sometimes incorrect in hard to detect ways, it might invalidate the whole calculation. That seems like a much higher cost to pay.
This is why it is important that those who understand the mathematics of the algorithm do it; I do not believe it pays to have a library routine written by one person. HLL algorithms fail in rather odd ways, and algorithms for generating random numbers, unless they have huge errors, can only be checked by direct proof of correctness.
Implementations of high-level languages often have bugs in them, but the language definitions are mostly specific about what will be calculated from a given input. That's also true about machine languages, but it's certainly easier to verify when the language is closer to the original mathematical specification.
An idiot-proof language is only for idiots.
No, if such a thing existed, it would also reduce the occasional lapses of non-idiots. Programmers are fallible; if tools can prevent some common errors (even if it means doubling the computation time), then those tools are worth using.
On the contrary, these tools can cause the programmer to make errors which cannot be easily detected.
I'm not sure what you have in mind. What I had in mind are tools like strong type checking and optional range checking (which you don't get in machine languages). What sort of tools are you thinking of? Can you give examples where a tool designed to prevent errors actually introduces them?
Duncan Murdoch
Subject: Re: Random numbers for C: Improvements. Date: 19 Jan 1999 11:27:03 -0500 From: hrubin@b.stat.purdue.edu (Herman Rubin) Message-ID: 782bon$qjg@b.stat.purdue.edu References: 36a5f74f.11285673@newshost.uwo.ca Newsgroups: sci.stat.math Lines: 121
In article 36a5f74f.11285673@newshost.uwo.ca, Duncan Murdoch dmurdoch@pair.com wrote:
On 18 Jan 1999 08:56:33 -0500, hrubin@b.stat.purdue.edu (Herman Rubin) wrote:
In article 36a23371.10418803@newshost.uwo.ca, Duncan Murdoch dmurdoch@pair.com wrote:
On 17 Jan 1999 13:22:21 -0500, hrubin@b.stat.purdue.edu (Herman Rubin) wrote:
On the other hand, if it's programmed in a way that is sometimes incorrect in hard to detect ways, it might invalidate the whole calculation. That seems like a much higher cost to pay.
This is why it is important that those who understand the mathematics of the algorithm do it; I do not believe it pays to have a library routine written by one person. HLL algorithms fail in rather odd ways, and algorithms for generating random numbers, unless they have huge errors, can only be checked by direct proof of correctness.
Implementations of high-level languages often have bugs in them, but the language definitions are mostly specific about what will be calculated from a given input. That's also true about machine languages, but it's certainly easier to verify when the language is closer to the original mathematical specification.
The problems are rather that the HLLs are not designed with what the programmer can think of.
An idiot-proof language is only for idiots.
No, if such a thing existed, it would also reduce the occasional lapses of non-idiots. Programmers are fallible; if tools can prevent some common errors (even if it means doubling the computation time), then those tools are worth using.
On the contrary, these tools can cause the programmer to make errors which cannot be easily detected.
I'm not sure what you have in mind. What I had in mind are tools like strong type checking and optional range checking (which you don't get in machine languages).
Can we afford range checking? And range checking could be added to a machine language program; making it mandatory at least triples the number of instructions needed for a memory reference, and one of those is a conditional transfer. If we want it, it should be put in as a hardware interrupt in a more complicated instruction, where the efficiency loss will only be in the additional arguments.
Basic random tools should be as a bit stream, not as floating numbers, for many reasons. If given a type, it should be integer, or fixed point "real", not available in any of the common languages. Floating point CANNOT do the job without major complications. Efficient conversion is not available in most languages, and is made worse with forced normalization. Without that, efficient conversion is much easier, and multiple precision is much easier. This hardware defect was promoted by the languages; the operations needed did not occur in the languages, and thus those who looked at the programs being written did not recognize that they were considered useful.
Can we afford range checking? And range checking could be added to a machine language program; making it mandatory at least triples the number of instructions needed for a memory reference, and one of those is a conditional transfer. If we want it, it should be put in as a hardware interrupt in a more complicated instruction, where the efficiency loss will only be in the additional arguments.
What sort of tools are you thinking of?
Badly needed tools are more operations, not functions. The power operation is a good example; even the possibility of using types in C++ does not do the job. Good Fortran compilers expanded small integers at compile time, and some even used this plus the square root operation when the exponent was an integer plus .5. The C power function always uses ln and exp. Also, pack and unpack are needed, not what occurs in the languages. And even the frexp function in C shows the inefficiency of not having a list of results; the insistence of a store may more than double the cost of using the operation.
One can have very complicated instructions, and still the hardware efficiency of RISC, by putting much of the decoding in the unit where the instruction is transferred. There were machines where the first four bids were all that the instruction decoder looked at, while certain other parts of the CPU looked at other bits in parallel with this.
Can
you give examples where a tool designed to prevent errors actually introduces them?
Can we afford range checking? And range checking could be added to a machine language program; making it mandatory at least triples the number of instructions needed for a memory reference, and one of those is a conditional transfer. If we want it, it should be put in as a hardware interrupt in a more complicated instruction, where the efficiency loss will only be in the additional arguments.
One good example of this occurs in calls to procedures generating random numbers. As these calls do not have any arguments, or the arguments are fixed locations, an "optimizing" compiler will take them out of a loop!
The elaborate unnatural precedence structure in languages such as C leads to errors. I have run into this with the precedence of "+" and "<<" as "<<" is used for multiplication by a power of 2.
It was a real problem to get C to respect the users' parentheses. Failure to do so can cause major loss of precision. Some modifications of C allow binary floating point numbers; it is rare that one should use a decimal floating point number in a subroutine. This has only recently been introduced in C; it was possible to do this with an illegal cast, but on little-endian machines, this was not easy.
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
Subject: Re: Random numbers for C: Improvements. Date: Wed, 20 Jan 1999 02:31:55 GMT From: dmurdoch@pair.com (Duncan Murdoch) Message-ID: 36a63ccd.18681450@newshost.uwo.ca References: 782bon$qjg@b.stat.purdue.edu Newsgroups: sci.stat.math Lines: 43
On 19 Jan 1999 11:27:03 -0500, hrubin@b.stat.purdue.edu (Herman Rubin) wrote:
I'm not sure what you have in mind. What I had in mind are tools like strong type checking and optional range checking (which you don't get in machine languages).
Can we afford range checking? And range checking could be added to a machine language program; making it mandatory at least triples the number of instructions needed for a memory reference, and one of those is a conditional transfer. If we want it, it should be put in as a hardware interrupt in a more complicated instruction, where the efficiency loss will only be in the additional arguments.
I think Java is the only language I've come across that makes range checking mandatory. Most implementations make it optional: you supply the same source code, the compiler produces different machine code depending on how you set the option. That could possibly be done in an assembler, but I've never heard of anyone trying it.
The normal use for range checking is in debugging. If you have to change your source code between debugging and running, there's a big chance of introducing new bugs.
The elaborate unnatural precedence structure in languages such as C leads to errors. I have run into this with the precedence of "+" and "<<" as "<<" is used for multiplication by a power of 2.
Someone else will have to argue for C. I think it's hardly better than assembler, myself.
It was a real problem to get C to respect the users' parentheses.
I think it's pretty common for compilers to rearrange expressions to make them more convenient to calculate, under the incorrect assumption that mathematically equivalent expressions will generate equal results. But all languages that I know (including C) provide ways of forcing the order of evaluation: just put the things that need to be done first in an earlier statement. This is exactly the way you do it in assembler.
Duncan Murdoch
Subject: Re: Random numbers for C: Improvements. Date: Sun, 17 Jan 1999 14:33:32 -0500 From: dwnoon@compuserve.com (David W. Noon) Message-ID: cUwZzZKf1ety-pn2-vB2YCWNn5fgR@mfs-pci-bqh-vty163.as.wcom.net References: 77t9ot$15ha@b.stat.purdue.edu Newsgroups: sci.math.num-analysis,sci.stat.math,sci.math Lines: 62
On Sun, 17 Jan 1999 18:22:21, hrubin@b.stat.purdue.edu (Herman Rubin) wrote:
In article 99Jan15.151547edt.785@neuron.ai.toronto.edu, Radford Neal radford@cs.toronto.edu wrote:
George Marsaglia geo@stat.fsu.edu writes:
#define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[c+178],t[++c]) #define SWB (t[c+237]=(x=t[c+15])-(y=t[c]+(x<y)),t[++c])
In article 790f49x60.fsf@poole.statgen.ncsu.edu, Dmitri Zaykin zaykin@statgen.ncsu.edu wrote:
Shouldn't these be
#define LFIB4 (t[c]=t[c]+t[c+58]+t[c+119]+t[c+179], ++c) #define SWB (t[c+237]=(x=t[c+15])-(y=t[c+1]+(x<y)), ++c)
This doesn't work either. I believe that it is undefined whether the comparison x<y uses the new or the old value of x. The SHR3 macro in the original source also suffers from this flaw.
I think one needs to face up to an unpleasant truth: The #define facility of C was poorly designed, and is incapable in general of supporting the definition of "in-line" procedures. It is far better to simply write ordinary C procedures, and accept the fairly small procedure call overhead.
I think it should be clarified, and probably written out in some more detail. But the procedure call overhead would be comparable to the computing cost; C, and all other languages, have such great built-in inefficiencies that what is needed is something written from the standpoint of mathematics and efficiency.
But even the use of a comparison is a costly operation, if the result is to be used promptly. Any form of random, pseudo-random, or quasi-random numbers should be done in good integer form, not present now on many machines because of bad design coming from bad languages which have thrown out the natural properties of computers, and attempts to keep stupid people from making mistakes. An idiot-proof language is only for idiots.
This is very admirable, but it is basically saying that the code should be in assembler. There goes portability.
I don't mind translating all this stuff into assembler, but anybody not using an Intel 80486 or better running a 32-bit version of OS/2 will not be able to use my code.
The only other language I know of that has sufficiently powerful macro facilities to force inlining like C but with better syntax, is portable across many platforms, and produces efficient object code is PL/I. If anybody besides me can use such code, I will produce it too/instead.
So, who wants it?
Regards
Dave
Subject: Re: Random numbers for C: Improvements. Date: 18 Jan 1999 00:16:24 GMT From: zaykin@statgen.ncsu.edu (Dmitri Zaykin) Message-ID: 5tFXN12Xwb3.7WA43126hM3916@statgen.ncsu.edu References: cUwZzZKf1ety-pn2-vB2YCWNn5fgR@mfs-pci-bqh-vty163.as.wcom.net Newsgroups: sci.math.num-analysis,sci.stat.math,sci.math Lines: 78
dwnoon@compuserve.com (David W. Noon) writes:
The only other language I know of that has sufficiently powerful macro facilities to force inlining like C but with better syntax, is portable across many platforms, and produces efficient object code is PL/I. If anybody besides me can use such code, I will produce it too/instead.
Another option is to re-write macros as C++ member functions. If all the code is in the body of the class or "inline" keyword is explicitly given, the compiler will attempt to inline the functions.
Below is my version of it and an example of usage. I've put "//" comments in places where I thought changes to the C code are necessary. There are additional advantages over C-macros. (1) No global variables are introduced. (2) It is easy to run several "independent" random sequences by creating several variables of Rnd type and seeding them differently (something that would not be straightforward to do in C).
Dmitri
#include <limits.h> // ULONG_MAX and UCHAR_MAX defined there
class Rnd { Rnd() {} typedef unsigned long Unlo; Unlo z, w, jsr, jcong, t[UCHAR_MAX+1], x, y; unsigned char c; Unlo znew() { return (z = 36969UL*(z & 65535UL)+(z >> 16)); } // +UL Unlo wnew() { return (w = 18000UL*(w & 65535UL)+(w >> 16)); } // +UL public: Rnd (Unlo i1, Unlo i2, Unlo i3, Unlo i4) : z(i1), w(i2), jsr(i3), jcong(i4), x(0), y(0), c(0) { for(int i=0; i<UCHAR_MAX+1; i++) t[i] = Kiss(); } Unlo Mwc() { return (znew() << 16) + wnew(); } Unlo Shr3 () { // was: jsr=(jsr=(jsr=jsr^(jsr<<17))^(jsr>>13))^(jsr<<5) jsr=jsr^(jsr<<17); jsr=jsr^(jsr>>13); return (jsr=jsr^(jsr<<5)); } Unlo Cong() { return (jcong = 69069ULjcong + 1234567UL); } // +UL Unlo Kiss() { return (Mwc() ^ Cong()) + Shr3(); } Unlo Swb () { // was: t[c+237]=(x=t[c+15])-(y=t[c]+(x<y)),t[++c] x = t[(c+15) & 0xFF]; t[(c+237) & 0xFF] = x - (y = t[(c+1) & 0xFF] + (x < y)); return t[++c]; } Unlo Lfib4() { // was: t[c]=t[c]+t[c+58]+t[c+119]+t[c+178],t[++c] t[c]=t[c]+t[(c+58) & 0xFF]+t[(c+119) & 0xFF]+t[(c+179) & 0xFF]; return t[++c]; } double Uni() { return Kiss() * 2.328306e-10; } double Vni() { return long(Kiss()) * 4.656613e-10; } double operator () () { return Uni(); } Unlo operator () (Unlo n) { return n == 1 ? 0 : Kiss() / (ULONG_MAX/n + 1); } double operator () (double Min, double Max) { return Min+Uni()(Max-Min); } };
// example of usage
#include <time.h> #include <iostream.h>
int main() { unsigned i, seed=time(0); Rnd rn (seed, 2seed, 3seed, 4*seed);
for(i=0; i<5; i++) cout << rn(5) << endl; // [0, 1, 2, 3, 4]
for(i=0; i<5; i++) cout << rn() << endl; // (0, ..., 1)
for(i=0; i<5; i++) cout << rn.Vni() << endl; // (-1, ..., 1)
for(i=0; i<5; i++) cout << rn(10, 20) << endl; // (10, ..., 20)
for(i=0; i<5; i++) cout << rn.Lfib4() << endl; // trying Lfib4
for(i=0; i<5; i++) cout << rn.Swb() << endl; // trying Swb
}
Subject: Re: Random numbers for C: Improvements. Date: Mon, 18 Jan 1999 17:07:26 -0500 From: dwnoon@compuserve.com (David W. Noon) Message-ID: <cUwZzZKf1ety-pn2-NxnsgFONxOGO@localhost> References: 5tFXN12Xwb3.7WA43126hM3916@statgen.ncsu.edu Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis Lines: 97
On Mon, 18 Jan 1999 00:16:24, zaykin@statgen.ncsu.edu (Dmitri Zaykin) wrote:
dwnoon@compuserve.com (David W. Noon) writes:
The only other language I know of that has sufficiently powerful macro facilities to force inlining like C but with better syntax, is portable across many platforms, and produces efficient object code is PL/I. If anybody besides me can use such code, I will produce it too/instead.
Another option is to re-write macros as C++ member functions. If all the code is in the body of the class or "inline" keyword is explicitly given, the compiler will attempt to inline the functions.
Below is my version of it and an example of usage. I've put "//" comments in places where I thought changes to the C code are necessary. There are additional advantages over C-macros. (1) No global variables are introduced. (2) It is easy to run several "independent" random sequences by creating several variables of Rnd type and seeding them differently (something that would not be straightforward to do in C).
Dmitri
#include <limits.h> // ULONG_MAX and UCHAR_MAX defined there
class Rnd { Rnd() {} typedef unsigned long Unlo; Unlo z, w, jsr, jcong, t[UCHAR_MAX+1], x, y; unsigned char c; Unlo znew() { return (z = 36969UL*(z & 65535UL)+(z >> 16)); } // +UL Unlo wnew() { return (w = 18000UL*(w & 65535UL)+(w >> 16)); } // +UL public: Rnd (Unlo i1, Unlo i2, Unlo i3, Unlo i4) : z(i1), w(i2), jsr(i3), jcong(i4), x(0), y(0), c(0) { for(int i=0; i<UCHAR_MAX+1; i++) t[i] = Kiss(); } Unlo Mwc() { return (znew() << 16) + wnew(); } Unlo Shr3 () { // was: jsr=(jsr=(jsr=jsr^(jsr<<17))^(jsr>>13))^(jsr<<5) jsr=jsr^(jsr<<17); jsr=jsr^(jsr>>13); return (jsr=jsr^(jsr<<5)); } Unlo Cong() { return (jcong = 69069ULjcong + 1234567UL); } // +UL Unlo Kiss() { return (Mwc() ^ Cong()) + Shr3(); } Unlo Swb () { // was: t[c+237]=(x=t[c+15])-(y=t[c]+(x<y)),t[++c] x = t[(c+15) & 0xFF]; t[(c+237) & 0xFF] = x - (y = t[(c+1) & 0xFF] + (x < y)); return t[++c]; } Unlo Lfib4() { // was: t[c]=t[c]+t[c+58]+t[c+119]+t[c+178],t[++c] t[c]=t[c]+t[(c+58) & 0xFF]+t[(c+119) & 0xFF]+t[(c+179) & 0xFF]; return t[++c]; } double Uni() { return Kiss() * 2.328306e-10; } double Vni() { return long(Kiss()) * 4.656613e-10; } double operator () () { return Uni(); } Unlo operator () (Unlo n) { return n == 1 ? 0 : Kiss() / (ULONG_MAX/n + 1); } double operator () (double Min, double Max) { return Min+Uni()(Max-Min); } };
// example of usage
#include <time.h> #include <iostream.h>
int main() { unsigned i, seed=time(0); Rnd rn (seed, 2seed, 3seed, 4*seed);
for(i=0; i<5; i++) cout << rn(5) << endl; // [0, 1, 2, 3, 4] for(i=0; i<5; i++) cout << rn() << endl; // (0, ..., 1) for(i=0; i<5; i++) cout << rn.Vni() << endl; // (-1, ..., 1) for(i=0; i<5; i++) cout << rn(10, 20) << endl; // (10, ..., 20) for(i=0; i<5; i++) cout << rn.Lfib4() << endl; // trying Lfib4 for(i=0; i<5; i++) cout << rn.Swb() << endl; // trying Swb
}
The problem with C++ is that VMT calling mechanisms are almost invariably slower than the calling mechanisms used by C, PL/I, FORTRAN, Pascal, etc. [You have an extra parameter (this) to pass, even if you don't use it, and you have to do a VMT lookup to find the method.] So, while you do have a way to inline with some elegance, it can be a chore to get to the inlined code from the main program.
If you want to use C++, use a template instead of a class. It's much faster, usually, but still usually slower than C, PL/I or FORTRAN -- and way slower than assembler. For PRNG speed is important, as PRN's are usually generated in bulk.
Regards
Dave
Subject: Re: Random numbers for C: Improvements. Date: 18 Jan 1999 17:57:49 -0500 From: zaykin@statgen.ncsu.edu (Dmitri Zaykin) Message-ID: 7sod842aa.fsf@poole.statgen.ncsu.edu References: <cUwZzZKf1ety-pn2-NxnsgFONxOGO@localhost> Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis Lines: 18
dwnoon@compuserve.com (David W. Noon) writes:
The problem with C++ is that VMT calling mechanisms are almost invariably slower than the calling mechanisms used by C, PL/I, FORTRAN, Pascal, etc. [You have an extra parameter (this) to pass, even if you don't use it, and you have to do a VMT lookup to find the method.]
Well, in this particular case all that does not apply, since there are no virtual functions in the code. As C++ FAQ says, "The compiler creates a v-table for each class that has at least one virtual function."
(http://www.cerfnet.com/~mpcline/c++-faq-lite/virtual-functions.html)
Regular member functions and overloaded operators are resolved at compile time (in this case, inlined), so it gets as good as C-macros.
Dmitri
Subject: Re: Random numbers for C: Improvements. Date: Tue, 19 Jan 1999 03:39:07 GMT From: dmurdoch@pair.com (Duncan Murdoch) Message-ID: 36a3fc77.12606540@newshost.uwo.ca References: <cUwZzZKf1ety-pn2-NxnsgFONxOGO@localhost> Newsgroups: sci.stat.math Lines: 20
On Mon, 18 Jan 1999 17:07:26 -0500, dwnoon@compuserve.com (David W. Noon) wrote:
If you want to use C++, use a template instead of a class. It's much faster, usually, but still usually slower than C, PL/I or FORTRAN -- and way slower than assembler. For PRNG speed is important, as PRN's are usually generated in bulk.
...but just as with any other aspect of the program, speed isn't as important as correctness.
If I can write tricky routines in assembler that are 10 times faster than clear ones, then in a given length of computational time my Monte Carlo integration will get an extra half digit of precision.
On the other hand, if I mess up the implementation because I was just a little bit too clever for my own good, it might be that none of the digits are right.
Duncan Murdoch
Subject: Re: Random numbers for C: Improvements. Date: 20 Jan 1999 21:32:39 -0500 From: zaykin@statgen.ncsu.edu (Dmitri Zaykin) Message-ID: 7sod5s6d4.fsf@poole.statgen.ncsu.edu References: <cUwZzZKf1ety-pn2-NxnsgFONxOGO@localhost> Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis Lines: 131
To check if C-macros for these random number generators do indeed always produce faster, and maybe "less bloated" code than inlined C++ member functions, I did a little experiment with timing/code size using the GNU compiler (egcs-1.1.1 g++/gcc) on Solaris 2.5.1. With this compiler, it is clearly not the case.
(1) Code size (conclusion: C++ code smaller)
inlined member functions C-macros C-macros g++ -Winline -O2 -s g++ -Winline -O2 -s gcc -Winline -O2 -s 8908 9820 9532
(2) Timing in 10 experiments (conclusion: C++ code faster)
inlined member functions C-macros C-macros g++ -Winline -O2 -s g++ -Winline -O2 -s gcc -Winline -O2 -s 11330000 15030000 14500000 11330000 15040000 14470000 11340000 15030000 14490000 11340000 15040000 14520000 11340000 15030000 14500000 11320000 15030000 14490000 11340000 15040000 14480000 11340000 15030000 14510000 11340000 15030000 14500000 11340000 15030000 14500000
//----------------------------------------------------- // C++ code:
#include <stdio.h> #include <time.h> #include <limits.h>
class Rnd { Rnd() {} typedef unsigned long Unlo; typedef unsigned char Uc; Unlo z, w, jsr, jcong, t[UCHAR_MAX+1], x, y, a, b; unsigned char c; Unlo znew() { return (z = 36969UL*(z & 65535UL)+(z >> 16)); } Unlo wnew() { return (w = 18000UL*(w & 65535UL)+(w >> 16)); } public: Rnd (Unlo i1, Unlo i2, Unlo i3, Unlo i4, Unlo i5, Unlo i6) : z(i1), w(i2), jsr(i3), jcong(i4), x(0), y(0), a(i5), b(i6), c(0) { for(int i=0; i<UCHAR_MAX+1; i++) t[i] = Kiss(); } Unlo Mwc() { return (znew() << 16) + wnew(); } Unlo Shr3 () { jsr=jsr^(jsr<<17); jsr=jsr^(jsr>>13); return (jsr=jsr^(jsr<<5)); } Unlo Cong() { return (jcong = 69069ULjcong + 1234567UL); } Unlo Kiss() { return (Mwc() ^ Cong()) + Shr3(); } Unlo Swb () { x = t[(Uc)(c+15)]; t[(Uc)(c+237)] = x - (y = t[(Uc)(c+1)] + (x < y)); return t[++c]; } Unlo Lfib4() { t[c]=t[c]+t[(Uc)(c+58)]+t[(Uc)(c+119)]+t[(Uc)(c+179)]; return t[++c]; } Unlo Fib() { b=a+b; return (a=b-a); } double Uni() { return Kiss() * 2.328306e-10; } double Vni() { return long(Kiss()) * 4.656613e-10; } double operator () () { return Uni(); } Unlo operator () (Unlo n) { return n == 1 ? 0 : Kiss() / (ULONG_MAX/n + 1); } double operator () (double Min, double Max) { return Min+Uni()(Max-Min); } };
int main() { unsigned long i, xx=0, seed=time(0); long spent; Rnd rn (seed, 2seed, 3seed, 4seed, 5seed, 6*seed);
spent=clock();
for(i=0; i<77777777; i++) xx += (rn.Kiss() + rn.Swb());
printf ("%ld \t", clock()-spent);
printf("\n");
}
/*****************************************************************/ /* C-macros */
#include <stdio.h> #include <time.h> #include <limits.h>
#define znew (z=36969UL*(z&65535UL)+(z>>16)) #define wnew (w=18000UL*(w&65535UL)+(w>>16)) #define MWC ((znew<<16)+wnew ) #define SHR3 (jsr^=(jsr<<17), jsr^=(jsr>>13), jsr^=(jsr<<5)) #define CONG (jcong=69069ULjcong+1234567UL) #define FIB ((b=a+b),(a=b-a)) #define KISS ((MWC^CONG)+SHR3) #define UC (unsigned char) #define LFIB4 (c++,t[c]=t[c]+t[UC(c+58)]+t[UC(c+119)]+t[UC(c+178)]) #define SWB (x = t[UC(c+15)], t[UC(c+237)] = x-(y=t[UC(c+1)]+(x<y)), t[++c]) #define UNI (KISS2.328306e-10) #define VNI ((long) KISS)*4.656613e-10 typedef unsigned long Un; static Un z=362436069UL, w=521288629UL, jsr=123456789UL, jcong=380116160UL; static Un a=224466889UL, b=7584631UL, t[256]; static Un x=0,y=0; static unsigned char c=0; void settable(Un i1,Un i2,Un i3,Un i4,Un i5, Un i6) { int i; z=i1;w=i2,jsr=i3; jcong=i4; a=i5; b=i6; for(i=0;i<256;i=i+1) t[i]=KISS; }
int main() { unsigned long i, xx=0, seed=time(0); long spent;
settable (seed, 2*seed, 3*seed, 4*seed, 5*seed, 6*seed);
spent=clock();
for(i=0; i<77777777; i++) xx += (KISS + SWB);
printf ("%ld \t", spent=clock()-spent);
printf("\n");
return 0;
}
Subject: Re: Random numbers for C: Improvements. Date: 21 Jan 1999 04:40:53 GMT From: zaykin@statgen.ncsu.edu (Dmitri Zaykin) Message-ID: 124123g26m11754.1356cjx442854@statgen.ncsu.edu References: <cUwZzZKf1ety-pn2-NxnsgFONxOGO@localhost> Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis Lines: 131
To check if C-macros for these random number generators do indeed always produce faster, and maybe "less bloated" code than inlined C++ member functions, I did a little experiment with timing/code size using the GNU compiler (egcs-1.1.1 g++/gcc) on Solaris 2.5.1. With this compiler, it is clearly not the case.
(1) Executable size (conclusion: C++ code smaller)
inlined member functions C-macros C-macros g++ -Winline -O2 -s g++ -Winline -O2 -s gcc -Winline -O2 -s 8908 9820 9532
(2) Timing in 10 experiments (conclusion: C++ code faster)
inlined member functions C-macros C-macros g++ -Winline -O2 -s g++ -Winline -O2 -s gcc -Winline -O2 -s 11330000 15030000 14500000 11330000 15040000 14470000 11340000 15030000 14490000 11340000 15040000 14520000 11340000 15030000 14500000 11320000 15030000 14490000 11340000 15040000 14480000 11340000 15030000 14510000 11340000 15030000 14500000 11340000 15030000 14500000
//----------------------------------------------------- // C++ code:
#include <stdio.h> #include <time.h> #include <limits.h>
class Rnd { Rnd() {} typedef unsigned long Unlo; typedef unsigned char Uc; Unlo z, w, jsr, jcong, t[UCHAR_MAX+1], x, y, a, b; unsigned char c; Unlo znew() { return (z = 36969UL*(z & 65535UL)+(z >> 16)); } Unlo wnew() { return (w = 18000UL*(w & 65535UL)+(w >> 16)); } public: Rnd (Unlo i1, Unlo i2, Unlo i3, Unlo i4, Unlo i5, Unlo i6) : z(i1), w(i2), jsr(i3), jcong(i4), x(0), y(0), a(i5), b(i6), c(0) { for(int i=0; i<UCHAR_MAX+1; i++) t[i] = Kiss(); } Unlo Mwc() { return (znew() << 16) + wnew(); } Unlo Shr3 () { jsr=jsr^(jsr<<17); jsr=jsr^(jsr>>13); return (jsr=jsr^(jsr<<5)); } Unlo Cong() { return (jcong = 69069ULjcong + 1234567UL); } Unlo Kiss() { return (Mwc() ^ Cong()) + Shr3(); } Unlo Swb () { x = t[(Uc)(c+15)]; t[(Uc)(c+237)] = x - (y = t[(Uc)(c+1)] + (x < y)); return t[++c]; } Unlo Lfib4() { t[c]=t[c]+t[(Uc)(c+58)]+t[(Uc)(c+119)]+t[(Uc)(c+179)]; return t[++c]; } Unlo Fib() { b=a+b; return (a=b-a); } double Uni() { return Kiss() * 2.328306e-10; } double Vni() { return long(Kiss()) * 4.656613e-10; } double operator () () { return Uni(); } Unlo operator () (Unlo n) { return n == 1 ? 0 : Kiss() / (ULONG_MAX/n + 1); } double operator () (double Min, double Max) { return Min+Uni()(Max-Min); } };
int main() { unsigned long i, xx=0, seed=time(0); long spent; Rnd rn (seed, 2seed, 3seed, 4seed, 5seed, 6*seed);
spent=clock();
for(i=0; i<77777777; i++) xx += (rn.Kiss() + rn.Swb());
printf ("%ld \t", clock()-spent);
printf("\n");
}
/*****************************************************************/ /* C-macros */
#include <stdio.h> #include <time.h> #include <limits.h>
#define znew (z=36969UL*(z&65535UL)+(z>>16)) #define wnew (w=18000UL*(w&65535UL)+(w>>16)) #define MWC ((znew<<16)+wnew ) #define SHR3 (jsr^=(jsr<<17), jsr^=(jsr>>13), jsr^=(jsr<<5)) #define CONG (jcong=69069ULjcong+1234567UL) #define FIB ((b=a+b),(a=b-a)) #define KISS ((MWC^CONG)+SHR3) #define UC (unsigned char) #define LFIB4 (c++,t[c]=t[c]+t[UC(c+58)]+t[UC(c+119)]+t[UC(c+178)]) #define SWB (x = t[UC(c+15)], t[UC(c+237)] = x-(y=t[UC(c+1)]+(x<y)), t[++c]) #define UNI (KISS2.328306e-10) #define VNI ((long) KISS)*4.656613e-10 typedef unsigned long Un; static Un z=362436069UL, w=521288629UL, jsr=123456789UL, jcong=380116160UL; static Un a=224466889UL, b=7584631UL, t[256]; static Un x=0,y=0; static unsigned char c=0; void settable(Un i1,Un i2,Un i3,Un i4,Un i5, Un i6) { int i; z=i1;w=i2,jsr=i3; jcong=i4; a=i5; b=i6; for(i=0;i<256;i=i+1) t[i]=KISS; }
int main() { unsigned long i, xx=0, seed=time(0); long spent;
settable (seed, 2*seed, 3*seed, 4*seed, 5*seed, 6*seed);
spent=clock();
for(i=0; i<77777777; i++) xx += (KISS + SWB);
printf ("%ld \t", clock()-spent);
printf("\n");
return 0;
}
User-Agent: slrn/0.9.5.4 (UNIX)
Subject: Re: Random numbers for C: Improvements. Date: 21 Jan 1999 18:01:49 GMT From: davis@space.mit.edu (John E. Davis) Message-ID: slrn7aeqsa.bkc.davis@aluche.mit.edu References: 124123g26m11754.1356cjx442854@statgen.ncsu.edu Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis Lines: 38
On 21 Jan 1999 04:40:53 GMT, Dmitri Zaykin zaykin@statgen.ncsu.edu wrote:
(1) Executable size (conclusion: C++ code smaller) [...] (2) Timing in 10 experiments (conclusion: C++ code faster)
Using your code (t.cc and t.c), my conclusion is the opposite:
Script started on Thu Jan 21 12:54:02 1999
$ gcc -Winline -O2 -s t.c
$ ls -l a.out
-rwxr-xr-x 1 davis asc 8752 Jan 21 12:54 a.out*
$ ./a.out
21700000
$ g++ -Winline -O2 -s t.cc
t.cc: In method Rnd::Rnd(long unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigned int)': t.cc:29: warning: can't inline call to
long unsigned int Rnd::Kiss()'
t.cc:20: warning: called from here
$ ls -l a.out
-rwxr-xr-x 1 davis asc 8880 Jan 21 12:55 a.out*
$ ./a.out
25250000
$ uname -a
SunOS wiwaxia 5.6 Generic_105181-08 sun4u sparc SUNW,Ultra-2
$ gcc -v
Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.6/2.8.1/specs
gcc version 2.8.1
$ g++ -v
Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.6/2.8.1/specs
gcc version 2.8.1
$ exit
exit
script done on Thu Jan 21 12:56:40 1999
John E. Davis Center for Space Research/AXAF Science Center 617-258-8119 One Hampshire St., Building NE80-6019 http://space.mit.edu/~davis Cambridge, MA 02139-4307
Subject: Re: Random numbers for C: Improvements. Date: 21 Jan 1999 16:19:55 -0500 From: zaykin@statgen.ncsu.edu (Dmitri Zaykin) Message-ID: 7btjsuxvo.fsf@poole.statgen.ncsu.edu References: slrn7aeqsa.bkc.davis@aluche.mit.edu Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis Lines: 11
davis@space.mit.edu (John E. Davis) writes:
Using your code (t.cc and t.c), my conclusion is the opposite:
Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.6/2.8.1/specs gcc version 2.8.1
I used egcs-1.1.1 compiler. It is is an improvement over gcc-2.8.1
http://egcs.cygnus.com/egcs-1.1/egcs-1.1.1.html
Dmitri
Subject: Re: Random numbers for C: Improvements. Date: 18 Jan 1999 21:12:35 GMT From: jmccarty@sun1307.spd.dsccc.com (Mike McCarty) Message-ID: 780843$ikj$1@relay1.dsccc.com References: cUwZzZKf1ety-pn2-vB2YCWNn5fgR@mfs-pci-bqh-vty163.as.wcom.net Newsgroups: sci.math.num-analysis,sci.stat.math,sci.math Lines: 23
In article cUwZzZKf1ety-pn2-vB2YCWNn5fgR@mfs-pci-bqh-vty163.as.wcom.net, David W. Noon dwnoon@compuserve.com wrote:
[snip]
)This is very admirable, but it is basically saying that the code )should be in assembler. There goes portability. ) )I don't mind translating all this stuff into assembler, but anybody )not using an Intel 80486 or better running a 32-bit version of OS/2 )will not be able to use my code. ) )The only other language I know of that has sufficiently powerful macro )facilities to force inlining like C but with better syntax, is )portable across many platforms, and produces efficient object code is )PL/I. If anybody besides me can use such code, I will produce it
You haven't heard of C++?
char *p="char *p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);} This message made from 100% recycled bits. I don't speak for Alcatel <- They make me say that.
Subject: Re: Random numbers for C: Improvements. Date: 18 Jan 1999 19:27:13 -0500 From: hrubin@odds.stat.purdue.edu (Herman Rubin) Message-ID: 780jh1$11ii@odds.stat.purdue.edu References: 780843$ikj$1@relay1.dsccc.com Newsgroups: sci.math.num-analysis,sci.stat.math,sci.math Lines: 45
In article 780843$ikj$1@relay1.dsccc.com, Mike McCarty jmccarty@sun1307.spd.dsccc.com wrote:
In article cUwZzZKf1ety-pn2-vB2YCWNn5fgR@mfs-pci-bqh-vty163.as.wcom.net, David W. Noon dwnoon@compuserve.com wrote:
[snip]
)This is very admirable, but it is basically saying that the code )should be in assembler. There goes portability.
Decent random number code cannot be portable. Anything which involves bit handling, as this should, suffers from the problem. And besides the problem of portability of code, there is the portability of results. The order of generating two non-uniform random numbers, or the change of size of a buffer, can do this with ease.
)I don't mind translating all this stuff into assembler, but anybody )not using an Intel 80486 or better running a 32-bit version of OS/2 )will not be able to use my code.
)The only other language I know of that has sufficiently powerful macro )facilities to force inlining like C but with better syntax, is )portable across many platforms, and produces efficient object code is )PL/I. If anybody besides me can use such code, I will produce it
I doubt that PL/I can do the job. But I would welcome trying it.
Let me give you an important tool; this is to generate the distance to the next bit in a stream of random bits, move the pointer, and interrupt if the stream becomes empty. This is a TOOL; for some purposes, it will be used more frequently than additions.
To illustrate what one can do with it, suppose we want to generate random variables with density 4x(1-x) on (0,1). Generate two random variables A and B as above. Let K be A/2, rounded up, and let N be K+B. Then change the N-th bit to the right of the binary point of a uniform (0,1) random variable X to the opposite of the K-th to get the result. BTW, I believe you can see that to do this with a floating point representation of X is quite difficult.
-- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
Subject: Re: Random numbers for C: Improvements. Date: Tue, 19 Jan 1999 18:09:39 -0500 From: dwnoon@compuserve.com (David W. Noon) Message-ID: <cUwZzZKf1ety-pn2-deRYWonQPOXL@localhost> References: 780jh1$11ii@odds.stat.purdue.edu Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis Lines: 86
On Tue, 19 Jan 1999 00:27:13, hrubin@odds.stat.purdue.edu (Herman Rubin) wrote:
In article cUwZzZKf1ety-pn2-vB2YCWNn5fgR@mfs-pci-bqh-vty163.as.wcom.net, David W. Noon dwnoon@compuserve.com wrote:
[snip]
)This is very admirable, but it is basically saying that the code )should be in assembler. There goes portability.
Decent random number code cannot be portable. Anything which involves bit handling, as this should, suffers from the problem. And besides the problem of portability of code, there is the portability of results. The order of generating two non-uniform random numbers, or the change of size of a buffer, can do this with ease.
I meant portable across software platforms, not hardware.
I would hope your code executes on any 32-bit Intel system the same way, whether that system is running Windows NT, OS/2, Linux, any UNIX variant or whatever.
)I don't mind translating all this stuff into assembler, but anybody )not using an Intel 80486 or better running a 32-bit version of OS/2 )will not be able to use my code.
)The only other language I know of that has sufficiently powerful macro )facilities to force inlining like C but with better syntax, is )portable across many platforms, and produces efficient object code is )PL/I. If anybody besides me can use such code, I will produce it
I doubt that PL/I can do the job. But I would welcome trying it.
It can certainly handle the code that you have posted so far, and with ease.
Let me give you an important tool; this is to generate the distance to the next bit in a stream of random bits,
Surely the distance to the next bit is always 1 position. Did you mean something else?
I think you meant the next "interesting" bit, but I don't know what your definition of "interesting" is.
move the pointer, and interrupt if the stream becomes empty. This is a TOOL; for some purposes, it will be used more frequently than additions.
But I live outside North America. I'm not allowed to use such tools. [At least I think you are alluding to cryptography.]
To illustrate what one can do with it, suppose we want to generate random variables with density 4x(1-x) on (0,1). Generate two random variables A and B as above.
You mean using the code you posted, or some equivalent?
Let K be A/2, rounded up, and
Rounded up to an integer? Since the extrema of your distribution were 0 and 1, this would make the integer K almost always 1, and very occasionally 0 (when A = 0).
let N be K+B.
Is N also to be an integer? If not, what does "N-th bit to the right" mean? If so, since B is potentially fractional is N rounded, forced up or truncated to an integer value?
Then change the N-th bit to the right of the binary point of a uniform (0,1) random variable X to the opposite of the K-th to get the result. BTW, I believe you can see that to do this with a floating point representation of X is quite difficult.
Yes, it means denormalizing the floating point number, possibly causing underflow if X is very small.
Do you want all of this implemented too? The little-endian byte arrangement of the Intel could make it rather ugly, but it is possible.
Regards
Dave
Subject: Re: Random numbers for C: Improvements. Date: 19 Jan 1999 20:30:00 -0500 From: hrubin@odds.stat.purdue.edu (Herman Rubin) Message-ID: 783bio$12vk@odds.stat.purdue.edu References: <cUwZzZKf1ety-pn2-deRYWonQPOXL@localhost> Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis Lines: 117
In article <cUwZzZKf1ety-pn2-deRYWonQPOXL@localhost>, David W. Noon dwnoon@compuserve.com wrote:
On Tue, 19 Jan 1999 00:27:13, hrubin@odds.stat.purdue.edu (Herman Rubin) wrote:
In article cUwZzZKf1ety-pn2-vB2YCWNn5fgR@mfs-pci-bqh-vty163.as.wcom.net, David W. Noon dwnoon@compuserve.com wrote:
[snip]
)This is very admirable, but it is basically saying that the code )should be in assembler. There goes portability.
Decent random number code cannot be portable. Anything which involves bit handling, as this should, suffers from the problem. And besides the problem of portability of code, there is the portability of results. The order of generating two non-uniform random numbers, or the change of size of a buffer, can do this with ease.
I meant portable across software platforms, not hardware.
I would hope your code executes on any 32-bit Intel system the same way, whether that system is running Windows NT, OS/2, Linux, any UNIX variant or whatever.
It is by no means clear that HLL generated code will do this, as the machine instructions involved are different. I cannot see that there is any more problem with the current assemblers, or with more intelligently designed ones. The design changes should be for efficiency on the part of the programmer, not the machine.
)I don't mind translating all this stuff into assembler, but anybody )not using an Intel 80486 or better running a 32-bit version of OS/2 )will not be able to use my code.
)The only other language I know of that has sufficiently powerful macro )facilities to force inlining like C but with better syntax, is )portable across many platforms, and produces efficient object code is )PL/I. If anybody besides me can use such code, I will produce it
I doubt that PL/I can do the job. But I would welcome trying it.
It can certainly handle the code that you have posted so far, and with ease.
Let me give you an important tool; this is to generate the distance to the next bit in a stream of random bits,
What was meant was the distance to the next one in a stream of random bits. In other words, generate a geometric (.5) random variable using only the number of bits required by information.
Surely the distance to the next bit is always 1 position. Did you mean something else?
I think you meant the next "interesting" bit, but I don't know what your definition of "interesting" is.
move the pointer, and interrupt if the stream becomes empty. This is a TOOL; for some purposes, it will be used more frequently than additions.
But I live outside North America. I'm not allowed to use such tools. [At least I think you are alluding to cryptography.]
What does this have to do with cryptography? There are many uses for this instruction in the computer literature, although none of the others I have seen would use it that much. It is used for locating the next record, where the positions of records are stored in single bits.
To illustrate what one can do with it, suppose we want to generate random variables with density 4x(1-x) on (0,1). Generate two random variables A and B as above.
You mean using the code you posted, or some equivalent?
Let K be A/2, rounded up, and
Rounded up to an integer? Since the extrema of your distribution were 0 and 1, this would make the integer K almost always 1, and very occasionally 0 (when A = 0).
With the correction, A and B are positive integers; there is no real problem.
let N be K+B.
Is N also to be an integer? If not, what does "N-th bit to the right" mean? If so, since B is potentially fractional is N rounded, forced up or truncated to an integer value?
Then change the N-th bit to the right of the binary point of a uniform (0,1) random variable X to the opposite of the K-th to get the result. BTW, I believe you can see that to do this with a floating point representation of X is quite difficult.
Yes, it means denormalizing the floating point number, possibly causing underflow if X is very small.
In fact, it cannot, unless the mantissa of the floating point number is exceptionally long. On most machines, it will have to be done in the integer registers, anyhow. And they may not be long enough; but one does not always need that accurate random numbers. At worst, a bit past the end of the register will be targeted to be 1, and the mantissa bits all cleared.
Do you want all of this implemented too? The little-endian byte arrangement of the Intel could make it rather ugly, but it is possible.
It will not be as bad as that, but infinite precision methods are not well suited to short registers.
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
Subject: Re: Random numbers for C: Improvements. Date: Tue, 19 Jan 1999 18:09:38 -0500 From: dwnoon@compuserve.com (David W. Noon) Message-ID: <cUwZzZKf1ety-pn2-FJ7g8dchLbd1@localhost> References: 780843$ikj$1@relay1.dsccc.com Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis Lines: 48
On Mon, 18 Jan 1999 21:12:35, jmccarty@sun1307.spd.dsccc.com (Mike McCarty) wrote:
In article cUwZzZKf1ety-pn2-vB2YCWNn5fgR@mfs-pci-bqh-vty163.as.wcom.net, David W. Noon dwnoon@compuserve.com wrote:
[snip]
)This is very admirable, but it is basically saying that the code )should be in assembler. There goes portability. ) )I don't mind translating all this stuff into assembler, but anybody )not using an Intel 80486 or better running a 32-bit version of OS/2 )will not be able to use my code. ) )The only other language I know of that has sufficiently powerful macro )facilities to force inlining like C but with better syntax, is )portable across many platforms, and produces efficient object code is )PL/I. If anybody besides me can use such code, I will produce it
You haven't heard of C++?
Of course I have. I earn my living writing C++, including the reprehensible MFC when the client demands it.
C++ just isn't ideally suited to the task under current implementations of the language. I have already suggested (in another post in this thread) templates as a substitute for method calls, to try and get some speed improvement, but generally C++ compilers produce code that is too bloated and too slow for high-volume number crunching. [You might note the expression "produces efficient object code" in my original post, which you have quoted above.]
You might ask yourself why "low level" coders use their C/C++ compilers in C mode. It is definitely not for more elegant syntax!
FORTRAN doesn't have a macro facility in present implementations. Neither does Pascal or ALGOL 68. I have compilers for all of these, but they won't do slick inlining.
So, what does that leave? COBOL? No, it lacks macros too.
That leaves the systems programmer's two faithful friends: PL/I and assembler.
Regards
Dave
Subject: Re: Random numbers for C: Improvements. Date: 20 Jan 1999 07:49:11 GMT From: zaykin@statgen.ncsu.edu (Dmitri Zaykin) Message-ID: 7841pn$1ha$1@uni00nw.unity.ncsu.edu References: <cUwZzZKf1ety-pn2-FJ7g8dchLbd1@localhost> Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis Lines: 15
In sci.stat.math David W. Noon dwnoon@compuserve.com wrote:
I have already suggested (in another post in this thread) templates as a substitute for method calls, to try and get some speed improvement (...)
There are no calls for inlined methods.
Templates can be an alternative to C-macros in terms of genericity, not inlining. C++ templates do not have to be inlined. They do not offer anything in terms of speed except as a substitute for C++ inheritance (again, as a faster tool for doing generic stuff) which is not an issue here.
Dmitri
Subject: Re: Random numbers for C: Improvements. Date: 18 Jan 1999 21:11:32 GMT From: jmccarty@sun1307.spd.dsccc.com (Mike McCarty) Message-ID: 780824$ie4$1@relay1.dsccc.com References: 77t9ot$15ha@b.stat.purdue.edu Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 51
In article 77t9ot$15ha@b.stat.purdue.edu, Herman Rubin hrubin@b.stat.purdue.edu wrote:
[snip]
)I think it should be clarified, and probably written out in some )more detail. But the procedure call overhead would be comparable )to the computing cost; C, and all other languages, have such great )built-in inefficiencies that what is needed is something written )from the standpoint of mathematics and efficiency.
What do you mean by this? No language has built in inefficiencies. I defy you to find anywhere in the ANSI/ISO C definition any statement about something being no more than so-and-so efficient.
Is assembler a language?
The efficiency of emitted code is a matter of quality of implementation, not of language. I wrote a compiler several years ago, intended for sale (but the project was cancelled) which automatically inlined procedures (or functions) which were called only once. The first target language for sale (it was to have different front ends) was C. C++ has a specific means for requesting that a function be expanded inline in every invocation.
Now some processors are more efficient than others, but I don't think you mean that.
)But even the use of a comparison is a costly operation, if the )result is to be used promptly. Any form of random, pseudo-random, )or quasi-random numbers should be done in good integer form,
An amazing statement, given that there are machines available today which do floating arithmetic faster than integer.
)not present now on many machines because of bad design coming )from bad languages which have thrown out the natural properties )of computers, and attempts to keep stupid people from making )mistakes. An idiot-proof language is only for idiots.
This last statement makes no sense to me at all. There are no "idiot-proof languages". Some languages have more checking built into them, but this in no way should affect the quality of the generated code.
Mike
char *p="char *p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);} This message made from 100% recycled bits. I don't speak for Alcatel <- They make me say that.
Subject: Re: Random numbers for C: Improvements. Date: 18 Jan 1999 19:14:30 -0500 From: hrubin@odds.stat.purdue.edu (Herman Rubin) Message-ID: 780ip6$11h8@odds.stat.purdue.edu References: 780824$ie4$1@relay1.dsccc.com Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 88
In article 780824$ie4$1@relay1.dsccc.com, Mike McCarty jmccarty@sun1307.spd.dsccc.com wrote:
In article 77t9ot$15ha@b.stat.purdue.edu, Herman Rubin hrubin@b.stat.purdue.edu wrote:
[snip]
)I think it should be clarified, and probably written out in some )more detail. But the procedure call overhead would be comparable )to the computing cost; C, and all other languages, have such great )built-in inefficiencies that what is needed is something written )from the standpoint of mathematics and efficiency.
What do you mean by this? No language has built in inefficiencies. I defy you to find anywhere in the ANSI/ISO C definition any statement about something being no more than so-and-so efficient.
A language has a built in inefficiency in every situation in which the optimal use of machine instructions cannot be addressed in that language. It has inefficiencies in all situations where workarounds need to be made.
Is assembler a language?
Currently, a very difficult one to use. This is because the computer people have deliberately tried to make it so. There is no reason why a user cannot reconfigure the "mnemonics" (they are generally atrocious) so that they can be easily written.
The efficiency of emitted code is a matter of quality of implementation, not of language. I wrote a compiler several years ago, intended for sale (but the project was cancelled) which automatically inlined procedures (or functions) which were called only once. The first target language for sale (it was to have different front ends) was C. C++ has a specific means for requesting that a function be expanded inline in every invocation.
More than this is needed. The implementations do not take into account what the intelligent programmer knows; they will not allow this input.
Now some processors are more efficient than others, but I don't think you mean that.
No, I do not.
)But even the use of a comparison is a costly operation, if the )result is to be used promptly. Any form of random, pseudo-random, )or quasi-random numbers should be done in good integer form,
An amazing statement, given that there are machines available today which do floating arithmetic faster than integer.
This is a design catastrophe, partly due to the bad languages. There is no such thing as floating point architecture, but architecture which does a combination of operations for badly designed floating point arithmetic. With almost no more cost, this arithmetic could be made available for fixed point. BTW, there is great need for fixed point non-integer arithmetic. And if increased precision is needed, it is necessary to emulate fixed point with floating, which is quite clumsy.
In addition, the fundamental random input should be a bit stream. Try using such. The hardware to do it in reasonable time is not present on any machine.
)not present now on many machines because of bad design coming )from bad languages which have thrown out the natural properties )of computers, and attempts to keep stupid people from making )mistakes. An idiot-proof language is only for idiots.
This last statement makes no sense to me at all. There are no "idiot-proof languages". Some languages have more checking built into them, but this in no way should affect the quality of the generated code.
I suggest that this checking be made strictly voluntary. I know what I want the computer to do, and how to use machine instructions to do it, IF I can get a list of the machine instructions. The language has no such capability.
Checking is extremely expensive. It is now largely a matter of hardware; user hardware-style interrupts are needed.
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
Subject: Re: Random numbers for C: Improvements. Date: 21 Jan 1999 19:07:19 GMT From: jmccarty@sun1307.spd.dsccc.com (Mike McCarty) Message-ID: 787tt7$gr$1@relay1.dsccc.com References: 780ip6$11h8@odds.stat.purdue.edu Newsgroups: sci.math,sci.math.num-analysis,sci.stat.math Lines: 34
In article 780ip6$11h8@odds.stat.purdue.edu, Herman Rubin hrubin@odds.stat.purdue.edu wrote: )In article 780824$ie4$1@relay1.dsccc.com, )Mike McCarty jmccarty@sun1307.spd.dsccc.com wrote: )>In article 77t9ot$15ha@b.stat.purdue.edu, )>Herman Rubin hrubin@b.stat.purdue.edu wrote:
[snip]
Much of what you wrote here struck me as unsubstantiated opinion. None of it appeared to be worth attempting to argue over.
)>This last statement makes no sense to me at all. There are no )>"idiot-proof languages". Some languages have more checking built into )>them, but this in no way should affect the quality of the generated )>code. ) )I suggest that this checking be made strictly voluntary. I know )what I want the computer to do, and how to use machine instructions )to do it, IF I can get a list of the machine instructions. The )language has no such capability. ) )Checking is extremely expensive. It is now largely a matter of )hardware; user hardware-style interrupts are needed.
The checking I had in mind costs nothing during the execution of the program.
Mike
char *p="char *p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);} This message made from 100% recycled bits. I don't speak for Alcatel <- They make me say that.
Subject: Re: Random numbers for C: Improvements. Date: 16 Jan 1999 17:56:05 -0500 From: zaykin@statgen.ncsu.edu (Dmitri Zaykin) Message-ID: 7k8ym6d4q.fsf@poole.statgen.ncsu.edu References: 369F6FCA.74C7C041@stat.fsu.edu Newsgroups: sci.stat.math,sci.math,sci.math.num-analysis Lines: 12
I see one more problem with the code.
The table is declared as "t[256]". Then expressions like t[++c] are safe assuming that the unsigned char c "wraps around" and becomes zero after 255.
However, expressions like c+237 evaluate to integer and have to be casted to the unsigned char explicitly, e.g. "t[(unsigned char)(c+237)]", should be used instead of t[c+237] (the later is accessing beyond the array bounds for c>18).
Dmitri
Subject: Re: Random numbers for C (and assembly) Date: Tue, 19 Jan 1999 11:10:17 GMT From: qscgz@my-dejanews.com Message-ID: 781p6l$lrv$1@nnrp1.dejanews.com References: 369F6FCA.74C7C041@stat.fsu.edu Newsgroups: sci.stat.math,sci.math,sci.math.numer-analysis Lines: 58
George Marsaglia wrote about these PRNGs:
MWC:((z=36969*(z&65535)+(z>>16))<<16)+((w=18000*(w&65535)+(w>>16))&65535) SHR3:(jsr=(jsr=(jsr=jsr^(jsr<<17))^(jsr>>13))^(jsr<<5)) CONG:(jcong=69069*jcong+1234567) LFIB4:(t[c]=t[c]+t[c+58]+t[c+119]+t[c+178],t[++c]) SWB:(t[c+237]=(x=t[c+15])-(y=t[c]+(x<y)),t[++c])
DIEHARD-tests fail on :
MWC (z only) : none SHR3 : brank,opso,oqso,dna,count LFIB4: none CONG (16 highbits) : opso,oqso,dna SWB: bday,brank,count
In x86-assembly :( using: a for eax , c for ecx , _ for sbb , # for adc )
bytes required | MWC(8):ax=36969 axmwc[4] ax+mwc[2] dx#0 mwc[4]=ax mwc[2]=dx ax=18000 ax[mwc] ax+mwc[6] dx#0 [mwc]=ax mwc[6]=dx SHR3(4):a=[shr3] c=a a< < 17 a^c c=a a>>13 a^c c=a a< < 5 a^c [shr3]=a CONG(4):a=69069 a*[cong] a+1234567 [cong]=a LFIB4(1032):c=lfib4[4] a=lfib4[4c+8] cl+58 a+lfib4[4c+8] cl+61 a+lfib4[4c+8] cl+60 a+lfib4[4c+8] cl+77 lfib4[4c+8]=a cl+1 lfib4[4]=c [lfib4]=a SWB(1032):c=swb[4] cl+15 a=swb[4c+8] cl-14 a_swb[4c+8] cl+236 swb[4c+8]=a cl-236 swb[4]=c [swb]=a
I estimate: 29,8,12,12,10 cyles on a P5 for MWC,SHR3,CONG,LFIB4,SWB 13,7,7,8,6 cycles on a P6 (P-II)
optimized with some calculations in parallel: 23,6,11,8,5 cycles on a P5 7,4,3,7,4 cycles on a P6
(these are only untested estimates)
for compound generators add the cycles of the parts.
( on a P6, the LFIB4 and SWB code should be changed (no cl), to avoid partial register stalls (slow). )
what is the fastest PRNG , that passes all tests ? I think that 1 or 2 bytes per cycle are possible. So, you could call a small assembly routine to fill an array with ~500 random-bytes in 1microsec. and then use it in C, if you prefer C.
Subject: Random numbers for C: The END? Date: Wed, 20 Jan 1999 10:55:14 -0500 From: George Marsaglia geo@stat.fsu.edu Message-ID: 36A5FC62.17C9CC33@stat.fsu.edu Newsgroups: sci.stat.math,sci.math Lines: 301
My offer of RNG's for C was an invitation to dance; I did not expect the Tarantella. I hope this post will stop the music, or at least slow it to a stately dance for language chauvinists and software police---under a different heading.
In response to a number of requests for good RNG's in C, and mindful of the desirability of having a variety of methods readily available, I offered several. They were implemented as in-line functions using the #define feature of C.
Numerous responses have led to improvements; the result is the listing below, with comments describing the generators.
I thank all the experts who contributed suggestions, either directly to me or as part of the numerous threads.
It seems necessary to use a (circular) table in order to get extremely long periods for some RNG's. Each new number is some combination of the previous r numbers, kept in the circular table. The circular table has to keep at least the last r, but possible more than r, numbers.
For speed, an 8-bit index seems best for accessing members of the table---at least for Fortran, where an 8-bit integer is readily available via integer*1, and arithmetic on the index is automatically mod 256 (least-absolute-residue).
Having little experience with C, I got out my little (but BIG) Kernighan and Ritchie book to see if there were an 8-bit integer type. I found none, but I did find char and unsigned char: one byte. Furthemore, K&R said arithmetic on characters was ok. That, and a study of the #define examples, led me to propose #define's for in-line generators LFIB4 and SWB, with monster periods. But it turned out that char arithmetic jumps "out of character", other than for simple cases such as c++ or c+=1. So, for safety, the index arithmetic below is kept in character by the UC definition.
Another improvement on the original version takes advantage of the comma operator, which, to my chagrin, I had not seen in K&R. It is there, but only with an example of (expression,expression). From the advice of contributors, I found that the comma operator allows (expression,...,expression,expression) with the last expression determining the value. That makes it much easier to create in-line functions via #define (see SHR3, LFIB4, SWB and FIB below).
The improved #define's are listed below, with a function to initialize the table and a main program that calls each of the in-line functions one million times and then compares the result to what I got with a DOS version of gcc. That main program can serve as a test to see if your system produces the same results as mine. _________________________________________ |If you run the program below, your output| | should be seven lines, each a 0 (zero).|
Some readers of the threads are not much interested in the philosophical aspects of computer languages, but want to know: what is the use of this stuff? Here are simple examples of the use of the in-line functions: Include the #define's in your program, with the accompanying static variable declarations, and a procedure, such as the example, for initializing the static variable (seeds) and the table.
Then any one of those in-line functions, inserted in a C expression, will provide a random 32-bit integer, or a random float if UNI or VNI is used. For example, KISS&255; would provide a random byte, while 5.+2.*UNI; would provide a random real (float) from 5 to 7. Or 1+MWC%10; would provide the proverbial "take a number from 1 to 10", (but with not quite, but virtually, equal probabilities). More generally, something such as 1+KISS%n; would provide a practical uniform random choice from 1 to n, if n is not too big.
A key point is: a wide variety of very fast, high- quality, easy-to-use RNG's are available by means of the nine in-line functions below, used individually or in combination.
The comments after the main test program describe the generators. These descriptions are much as in the first post, for those who missed them. Some of the generators (KISS, MWC, LFIB4) seem to pass all tests of randomness, particularly the DIEHARD battery of tests, and combining virtually any two or more of them should provide fast, reliable, long period generators. (CONG or FIB alone and CONG+FIB are suspect, but quite useful in combinations.)
Serious users of random numbers may want to run their simulations with several different generators, to see if they get consistent results. These #define's may make it easy to do. Bonne chance, George Marsaglia
The C code follows---------------------------------:
#include <stdio.h> #define znew (z=36969*(z&65535)+(z>>16)) #define wnew (w=18000*(w&65535)+(w>>16)) #define MWC ((znew<<16)+wnew ) #define SHR3 (jsr^=(jsr<<17), jsr^=(jsr>>13), jsr^=(jsr<<5)) #define CONG (jcong=69069jcong+1234567) #define FIB ((b=a+b),(a=b-a)) #define KISS ((MWC^CONG)+SHR3) #define LFIB4 (c++,t[c]=t[c]+t[UC(c+58)]+t[UC(c+119)]+t[UC(c+178)]) #define SWB (c++,bro=(x<y),t[c]=(x=t[UC(c+34)])-(y=t[UC(c+19)]+bro)) #define UNI (KISS2.328306e-10) #define VNI ((long) KISS)*4.656613e-10 #define UC (unsigned char) /a cast operation/ typedef unsigned long UL;
/* Global static variables: / static UL z=362436069, w=521288629, jsr=123456789, jcong=380116160; static UL a=224466889, b=7584631, t[256]; / Use random seeds to reset z,w,jsr,jcong,a,b, and the table t[256]*/
static UL x=0,y=0,bro; static unsigned char c=0;
/* Example procedure to set the table, using KISS: */ void settable(UL i1,UL i2,UL i3,UL i4,UL i5, UL i6) { int i; z=i1;w=i2,jsr=i3; jcong=i4; a=i5; b=i6; for(i=0;i<256;i=i+1) t[i]=KISS; }
/* This is a test main program. It should compile and print 7 0's. */ int main(void){ int i; UL k; settable(12345,65435,34221,12345,9983651,95746118);
for(i=1;i<1000001;i++){k=LFIB4;} printf("%u\n", k-1064612766U); for(i=1;i<1000001;i++){k=SWB ;} printf("%u\n", k- 627749721U); for(i=1;i<1000001;i++){k=KISS ;} printf("%u\n", k-1372460312U); for(i=1;i<1000001;i++){k=CONG ;} printf("%u\n", k-1529210297U); for(i=1;i<1000001;i++){k=SHR3 ;} printf("%u\n", k-2642725982U); for(i=1;i<1000001;i++){k=MWC ;} printf("%u\n", k- 904977562U); for(i=1;i<1000001;i++){k=FIB ;} printf("%u\n", k-3519793928U); } /*----------------------------------------------------- Write your own calling program and try one or more of the above, singly or in combination, when you run a simulation. You may want to change the simple 1-letter names, to avoid conflict with your own choices. */
/* All that follows is comment, mostly from the initial post. You may want to remove it */
/* Any one of KISS, MWC, FIB, LFIB4, SWB, SHR3, or CONG can be used in an expression to provide a random 32-bit integer.
The KISS generator, (Keep It Simple Stupid), is designed to combine the two multiply-with-carry generators in MWC with the 3-shift register SHR3 and the congruential generator CONG, using addition and exclusive-or. Period about 2^123. It is one of my favorite generators.
The MWC generator concatenates two 16-bit multiply- with-carry generators, x(n)=36969x(n-1)+carry, y(n)=18000y(n-1)+carry mod 2^16, has period about 2^60 and seems to pass all tests of randomness. A favorite stand-alone generator---faster than KISS, which contains it.
FIB is the classical Fibonacci sequence x(n)=x(n-1)+x(n-2),but taken modulo 2^32. Its period is 3*2^31 if one of its two seeds is odd and not 1 mod 8. It has little worth as a RNG by itself, but provides a simple and fast component for use in combination generators.
SHR3 is a 3-shift-register generator with period 2^32-1. It uses y(n)=y(n-1)(I+L^17)(I+R^13)(I+L^5), with the y's viewed as binary vectors, L the 32x32 binary matrix that shifts a vector left 1, and R its transpose. SHR3 seems to pass all except those related to the binary rank test, since 32 successive values, as binary vectors, must be linearly independent, while 32 successive truly random 32-bit integers, viewed as binary vectors, will be linearly independent only about 29% of the time.
CONG is a congruential generator with the widely used 69069 multiplier: x(n)=69069x(n-1)+1234567. It has period 2^32. The leading half of its 32 bits seem to pass tests, but bits in the last half are too regular.
LFIB4 is an extension of what I have previously defined as a lagged Fibonacci generator: x(n)=x(n-r) op x(n-s), with the x's in a finite set over which there is a binary operation op, such as +,- on integers mod 2^32, * on odd such integers, exclusive-or(xor) on binary vectors. Except for those using multiplication, lagged Fibonacci generators fail various tests of randomness, unless the lags are very long. (See SWB below). To see if more than two lags would serve to overcome the problems of 2-lag generators using +,- or xor, I have developed the 4-lag generator LFIB4 using addition: x(n)=x(n-256)+x(n-179)+x(n-119)+x(n-55) mod 2^32. Its period is 2^31*(2^256-1), about 2^287, and it seems to pass all tests---in particular, those of the kind for which 2-lag generators using +,-,xor seem to fail. For even more confidence in its suitability, LFIB4 can be combined with KISS, with a resulting period of about 2^410: just use (KISS+LFIB4) in any C expression.
SWB is a subtract-with-borrow generator that I developed to give a simple method for producing extremely long periods: x(n)=x(n-222)-x(n-237)- borrow mod 2^32. The 'borrow' is 0, or set to 1 if computing x(n-1) caused overflow in 32-bit integer arithmetic. This generator has a very long period, 2^7098(2^480-1), about 2^7578. It seems to pass all tests of randomness, except for the Birthday Spacings test, which it fails badly, as do all lagged Fibonacci generators using +,- or xor. I would suggest combining SWB with KISS, MWC, SHR3, or CONG. KISS+SWB has period >2^7700 and is highly recommended. Subtract-with-borrow has the same local behaviour as lagged Fibonacci using +,-,xor---the borrow merely provides a much longer period. SWB fails the birthday spacings test, as do all lagged Fibonacci and other generators that merely combine two previous values by means of =,- or xor. Those failures are for a particular case: m=512 birthdays in a year of n=2^24 days. There are choices of m and n for which lags >1000 will also fail the test. A reasonable precaution is to always combine a 2-lag Fibonacci or SWB generator with another kind of generator, unless the generator uses *, for which a very satisfactory sequence of odd 32-bit integers results.
The classical Fibonacci sequence mod 2^32 from FIB fails several tests. It is not suitable for use by itself, but is quite suitable for combining with other generators.
The last half of the bits of CONG are too regular, and it fails tests for which those bits play a significant role. CONG+FIB will also have too much regularity in trailing bits, as each does. But keep in mind that it is a rare application for which the trailing bits play a significant role. CONG is one of the most widely used generators of the last 30 years, as it was the system generator for VAX and was incorporated in several popular software packages, all seemingly without complaint.
Finally, because many simulations call for uniform random variables in 0<x<1 or -1<x<1, I use #define statements that permit inclusion of such variates directly in expressions: using UNI will provide a uniform random real (float) in (0,1), while VNI will provide one in (-1,1).
All of these: MWC, SHR3, CONG, KISS, LFIB4, SWB, FIB UNI and VNI, permit direct insertion of the desired random quantity into an expression, avoiding the time and space costs of a function call. I call these in-line-define functions. To use them, static variables z,w,jsr,jcong,a and b should be assigned seed values other than their initial values. If LFIB4 or SWB are used, the static table t[256] must be initialized.
A note on timing: It is difficult to provide exact time costs for inclusion of one of these in-line- define functions in an expression. Times may differ widely for different compilers, as the C operations may be deeply nested and tricky. I suggest these rough comparisons, based on averaging ten runs of a routine that is essentially a long loop: for(i=1;i<10000000;i++) L=KISS; then with KISS replaced with SHR3, CONG,... or KISS+SWB, etc. The times on my home PC, a Pentium 300MHz, in nanoseconds: FIB 49;LFIB4 77;SWB 80;CONG 80;SHR3 84;MWC 93;KISS 157; VNI 417;UNI 450; */
Terry Ritter, hiscurrent address, and histop page.
Last updated: 1999-02-18