Birthday Attack Calculations (original) (raw)

ACiphers By Ritter Page

How can we relate the number of elements in a population and the number of random samples needed before we expect to find a duplicate or match?


Contents


Subject: Birthday Attack calculations. Date: Wed, 30 Dec 1998 23:16:02 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: 368ab32b.57279906@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 7

How does one calculate the exact number of documents that need to be hashed to ensure a collision? I know that it is approximately 1.2*2^(M/2), but what is the exact formula or procedure for calculating the number?

Thanks Fred Van Andel


Subject: Re: Birthday Attack calculations. Date: 30 Dec 1998 23:59:16 GMT From: djohn37050@aol.com (DJohn37050) Message-ID: 19981230185916.28531.00004724@ng39.aol.com References: 368ab32b.57279906@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 3

To ensure a collision, the exact number is n+1. THe birthday attack in probabilistic. See the HAC.


Subject: Re: Birthday Attack calculations. Date: Thu, 31 Dec 1998 03:46:01 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: 368af32e.73668929@news.intergate.bc.ca References: 19981230185916.28531.00004724@ng39.aol.com Newsgroups: sci.crypt Lines: 10

djohn37050@aol.com (DJohn37050) wrote:

To ensure a collision, the exact number is n+1. THe birthday attack in probabilistic. See the HAC.

I should have stated for 50% probibality, the formula given is for that value. My aploogies.

Fred Van Andel


Subject: Re: Birthday Attack calculations. Date: 31 Dec 98 04:55:33 +0000 From: "Adam Atkinson" ghira@mistral.co.uk Message-ID: 563.669T1659T2953896@mistral.co.uk References: 368af32e.73668929@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 28

On 31-Dec-98 03:46:01, Fred Van Andel said:

To ensure a collision, the exact number is n+1. THe birthday attack in probabilistic. See the HAC.

I should have stated for 50% probibality, the formula given is for that value. My aploogies.

well, doing it "by hand" in perl would look something like:

$n=43949268; #$n=365; #gives 23, which is right

$p=1;

for($i=1;$p>0.5;$i++) { pāˆ—=(p*=(pāˆ—=(n-$i)/$n; } print $i."\n";

-- Adam Atkinson (ghira@mistral.co.uk) Verbing weirds language. (Calvin)


Subject: Re: Birthday Attack calculations. Date: Sat, 2 Jan 1999 14:41:56 +0000 From: David Broughton David@ddina.demon.co.uk Message-ID: $7zENAA0Ajj2EwNz@ddina.demon.co.uk References: 368ab32b.57279906@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 36

In article 368ab32b.57279906@news.intergate.bc.ca, Fred Van Andel fred_vanandel@bigfoot.com writes

How does one calculate the exact number of documents that need to be hashed to ensure a collision? I know that it is approximately 1.2*2^(M/2), but what is the exact formula or procedure for calculating the number?

Try this formula:

w = n^g + 0.29 - e

where: w = the number of documents to get 50% probability of a collision n = number of different hash codes, all equiprobable g = 0.5 + 1/(6.13 * ln(n)) ln() is the natural logarithm function e is a small error that can be ignored in practice, usually < +- 1.

This is an empirical formula that I worked out many years ago and filed away in a notebook.

The exact formula is the value of w in this equation:

( product from k = 1 to k = w-1 of (n-k)/n ) = 0.5

but this is not a practical calculation for large n.

As you can see, w is a bit larger than the square root of n. For n = 10^6, for example, w = 1177.68 (e = -0.197).

If your formula is meant to be 1.2 * n^0.5, then w = 1200 for n = 10^6 which is getting there.

-- David Broughton


Subject: Re: Birthday Attack calculations. Date: Sat, 2 Jan 1999 19:11:25 GMT From: ppearson@netcom.com (Peter Pearson) Message-ID: ppearsonF4y5B1.Jyv@netcom.com References: $7zENAA0Ajj2EwNz@ddina.demon.co.uk Newsgroups: sci.crypt Lines: 38

In article $7zENAA0Ajj2EwNz@ddina.demon.co.uk, David Broughton David@ddina.demon.co.uk wrote:

In article 368ab32b.57279906@news.intergate.bc.ca, Fred Van Andel fred_vanandel@bigfoot.com writes

How does one calculate the exact number of documents that need to be hashed to ensure a collision? I know that it is approximately 1.2*2^(M/2), but what is the exact formula or procedure for calculating the number?

[snip]

The exact formula is the value of w in this equation:

( product from k = 1 to k = w-1 of (n-k)/n ) = 0.5

but this is not a practical calculation for large n.

To derive your own approximation formula from the above, exact formula, rewrite it as follows:

        n (n-1) (n-2) ... (n+1-w)      n!

product = -------------------------- = ---------- n^w (n-w)! n^w

Then, use Stirling's approximation to make the factorials more manageable. Stirling's approximation (see, for example, Knuth, Art of Computer Programming, Volume 1) is:

ln( n! ) = (n+1/2) ln(n) - n + ln( 2pi )/2 + 1/(12n) - ...

You'll have to experiment with the number of terms required to get meaningful results. Overimplifying to nln(n)-n gives conspicuously nonsensical results. If memory serves, (n+1/2) ln(n) - n + ln( 2pi )/2 is a good level of approximation, and one needs the approximation ln(1+x) = x (for small x) as well.


Subject: Re: Birthday Attack calculations. Date: Sat, 02 Jan 1999 22:46:05 GMT From: ritter@io.com (Terry Ritter) Message-ID: 368ea19e.8344667@news.io.com References: 368ab32b.57279906@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 40

On Wed, 30 Dec 1998 23:16:02 GMT, in 368ab32b.57279906@news.intergate.bc.ca, in sci.crypt fred_vanandel@bigfoot.com (Fred Van Andel) wrote:

How does one calculate the exact number of documents that need to be hashed to ensure a collision? I know that it is approximately 1.2*2^(M/2), but what is the exact formula or procedure for calculating the number?

[I should have stated for 50% probibality, the formula given is for that value.]

I like:

s(N,p) = (1 + SQRT(1 - 8N ln(p))) / 2

where s is the expected number of samples needed, N the size of the population being sampled, and p the given probability.

For N = 10**6 and p = 0.5 (so ln(p) = -0.693) we get 1177.9 instead of the usual handwave SQRT(N) = 1000, or the stated approximation, 1.2 * SQRT(N) = 1200.

For N = 2**20 and p = 0.5, we get 1206.2 instead of 1.2 * 1024 = 1228.8 for the stated approximation.

The formula basically comes out of my article on population estimation:

http://www.io.com/~ritter/ARTS/BIRTHDAY.HTM

It might be interesting to know of an application where this sort of precision is useful.


Terry Ritter ritter@io.com http://www.io.com/~ritter/ Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: Birthday Attack calculations. Date: Wed, 06 Jan 1999 06:01:50 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: 3693fa55.17551686@news.intergate.bc.ca References: 368ea19e.8344667@news.io.com Newsgroups: sci.crypt Lines: 17

I would like to thank those who spent time in answering my question. It has proven to be very helpful.

Terry Ritter wrote:

It might be interesting to know of an application where this sort of precision is useful.

The reason for the needed precision is that I am designing a variable length hash function and I want to test it for resistance to collision. Since I don't have the resources to do a full test for a 256 bit hash I am going to test 8, 16, 24, 32 and maybe 40 bit hashes and search for trends. If the algorithm is resistant to collision in the smaller sizes and there is no trend away from the "proper" value then due to the nature of the algorithm I can be quite confidant that the larger hashes are also resistant to collisions.

Fred Van Andel


Subject: Re: Birthday Attack calculations. Date: Wed, 06 Jan 1999 12:54:24 GMT From: dscott@networkusa.net Message-ID: 76vme0$avl$1@nnrp1.dejanews.com References: 3693fa55.17551686@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 36

In article 3693fa55.17551686@news.intergate.bc.ca, fred_vanandel@bigfoot.com wrote:

I would like to thank those who spent time in answering my question. It has proven to be very helpful.

Terry Ritter wrote:

It might be interesting to know of an application where this sort of precision is useful.

The reason for the needed precision is that I am designing a variable length hash function and I want to test it for resistance to collision. Since I don't have the resources to do a full test for a 256 bit hash I am going to test 8, 16, 24, 32 and maybe 40 bit hashes and search for trends. If the algorithm is resistant to collision in the smaller sizes and there is no trend away from the "proper" value then due to the nature of the algorithm I can be quite confidant that the larger hashes are also resistant to collisions.

Fred Van Andel

This is a very good idea. Since only if hash is any good will it have the distribution predicticed by the bithday collision method. However funny things do happen and though it is a very god idea to check at the small lengths where more statisitics can be made. You should always run tests on the final lenght your going to use or you might get surprised.

david scott

-- http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip http://members.xoom.com/ecil/index.htm

-----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own


Subject: Re: Birthday Attack calculations. Date: Thu, 07 Jan 1999 03:53:13 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: 36942dfe.1363360@news.intergate.bc.ca References: 76vme0$avl$1@nnrp1.dejanews.com Newsgroups: sci.crypt Lines: 32

fred_vanandel@bigfoot.com wrote:

The reason for the needed precision is that I am designing a variable length hash function and I want to test it for resistance to collision. Since I don't have the resources to do a full test for a 256 bit hash I am going to test 8, 16, 24, 32 and maybe 40 bit hashes and search for trends. If the algorithm is resistant to collision in the smaller sizes and there is no trend away from the "proper" value then due to the nature of the algorithm I can be quite confidant that the larger hashes are also resistant to collisions.

dscott@networkusa.net replied:

This is a very good idea. Since only if hash is any good will it have the distribution predicticed by the bithday collision method. However funny things do happen and though it is a very god idea to check at the small lengths where more statisitics can be made. You should always run tests on the final lenght your going to use or you might get surprised.

The whole point of testing on small hashes and extrapolating is that is computationally impossible to do a birthday attack on a large hash. For a 256 bit hash you will need to create more than 2^128 hashes before the odds of a collision reach 50%.

Do You know how long that will take on my 486-66. Or even a planet full of computers for that matter. The indirect evidence is the ONLY indication of collision resistance.

Fred Van Andel


Subject: Re: Birthday Attack calculations. Date: Thu, 07 Jan 1999 12:28:17 GMT From: dscott@networkusa.net Message-ID: 772990$kl5$1@nnrp1.dejanews.com References: 36942dfe.1363360@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 59

In article 36942dfe.1363360@news.intergate.bc.ca, fred_vanandel@bigfoot.com wrote:

fred_vanandel@bigfoot.com wrote:

The reason for the needed precision is that I am designing a variable length hash function and I want to test it for resistance to collision. Since I don't have the resources to do a full test for a 256 bit hash I am going to test 8, 16, 24, 32 and maybe 40 bit hashes and search for trends. If the algorithm is resistant to collision in the smaller sizes and there is no trend away from the "proper" value then due to the nature of the algorithm I can be quite confidant that the larger hashes are also resistant to collisions.

dscott@networkusa.net replied:

This is a very good idea. Since only if hash is any good will it have the distribution predicticed by the bithday collision method. However funny things do happen and though it is a very god idea to check at the small lengths where more statisitics can be made. You should always run tests on the final lenght your going to use or you might get surprised.

The whole point of testing on small hashes and extrapolating is that is computationally impossible to do a birthday attack on a large hash. For a 256 bit hash you will need to create more than 2^128 hashes before the odds of a collision reach 50%.

Do You know how long that will take on my 486-66. Or even a planet full of computers for that matter. The indirect evidence is the ONLY indication of collision resistance.

Fred Van Andel

Again you should run tests on the final length. I did not say to run the number necessiary to get a 50% collision. It is more than obvious you can't run that number of tests at your full length. However it would be stupid not to run some tests in hopes that no collision occur at long length. Since if some appear at all then something is wrong. So if it passes the short lengths test still are needed for finail length. The indeirect evidence is NOT the ONLY indication you should unless to irigant try the long case in hopes of no errors.

David Scott

P.S. Example I test scott16u very much. But I still had to do a few tests on scott19u. And the upscaling produced a few bugs that I would not have found and fixed if I smuggly ignored testing all together on the longer version.

-- http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip http://members.xoom.com/ecil/index.htm

-----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own


Subject: Re: Birthday Attack calculations. Date: Sat, 09 Jan 1999 00:14:38 GMT From: dscott@networkusa.net Message-ID: 77671c$3d2$1@nnrp1.dejanews.com References: 775qop$cbb@qualcomm.com 36942dfe.1363360@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 60

In article 775qop$cbb@qualcomm.com, ggr@qualcomm.com (Gregory G Rose) wrote:

In article 36942dfe.1363360@news.intergate.bc.ca, Fred Van Andel fred_vanandel@bigfoot.com wrote: <dscott@networkusa.net replied: <> This is a very good idea. Since only if hash is any good will it <>have the distribution predicticed by the bithday collision method. <>However funny things do happen and though it is a very god idea <>to check at the small lengths where more statisitics can be made. <>You should always run tests on the final lenght your going to use <>or you might get surprised. < <The whole point of testing on small hashes and extrapolating is that <is computationally impossible to do a birthday attack on a large hash. <For a 256 bit hash you will need to create more than 2^128 hashes <before the odds of a collision reach 50%. < <Do You know how long that will take on my 486-66. Or even a planet <full of computers for that matter. The indirect evidence is the ONLY <indication of collision resistance.

Now, let's not get testy. When David has a good idea, we should encourage it. The way I interpreted his comment was that, instead of testing the cut-down algorithms, cut down the output of the real algorithm. Generate your 256-bit hashes, split them into 32- or 64- bit chunks, and test those as if they had come from a smaller generator. Any collision problems in the larger hash, or any implementation glitch in scaling it up, should show up.

regards, Greg.

NO it is not even close to my thoughts

I thought it was a very good reply on my part sorry you where not capable of understanding it. So here it is in different words. It was previously stated that a cut down version of method tested. Apparently short enough to be able get meaning ful data based on probability of collisions. But the thing was worded that he should never test the FULL LENGTH case since the collision probabilty was virtually ZERO for any real number of tries one could make with the FULL LENGTH caae. But if the FULL LENGTH case is what one is programming the stupid hash function for. It would be FUCKING stupid not to run a few cases just to be DAM SURE non occurred. People can fuck up code you know.

David Scott

http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip http://members.xoom.com/ecil/index.htm

-----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own


Subject: Re: Birthday Attack calculations. Date: Sat, 09 Jan 1999 05:44:15 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: 3698df07.97942674@news.intergate.bc.ca References: 77671c$3d2$1@nnrp1.dejanews.com Newsgroups: sci.crypt Lines: 61

dscott@networkusa.net wrote:

ggr@qualcomm.com (Gregory G Rose) wrote:

Now, let's not get testy. When David has a good idea, we should encourage it. The way I interpreted his comment was that, instead of testing the cut-down algorithms, cut down the output of the real algorithm. Generate your 256-bit hashes, split them into 32- or 64- bit chunks, and test those as if they had come from a smaller generator. Any collision problems in the larger hash, or any implementation glitch in scaling it up, should show up.

regards, Greg.

NO it is not even close to my thoughts

I thought it was a very good reply on my part sorry you where not capable of understanding it. So here it is in different words. It was previously stated that a cut down version of method tested. Apparently short enough to be able get meaning ful data based on probability of collisions. But the thing was worded that he should never test the FULL LENGTH case since the collision probabilty was virtually ZERO for any real number of tries one could make with the FULL LENGTH caae. But if the FULL LENGTH case is what one is programming the stupid hash function for. It would be FUCKING stupid not to run a few cases just to be DAM SURE non occurred. People can fuck up code you know.

David Scott

Unfortunately running one test of a 256 bit hash for even a tiny part of its range will require far more disk space than I have access to and will have virtually zero chance of finding a collision. Even testing a mere 2^32 hashes compared with the > 2^64 required for a 50% collision of a 256 bit hash would require 128 GigaBytes of disk space to store all of the calculated hashes. I only wish that I had that much.

I will carry the testing into as far as I can go and still generate statistically significant numbers. There is no point expending all my computing power trying to force a collision of one relatively large hash because the result would be a matter of dumb luck and completely meaningless. But by testing millions of tiny hashes and tens of thousands of 40 or 48 bit hashes I can generate some meaningful numbers.

If the hash is so fatally flawed that I could routinely find a collision at much lower than the normal values then I would have found this out at the smaller hash sizes. The algorithm is scalable, there is no real difference between a small hash and a large hash except for the obvious. Therefore I expect that the properties of the smaller sizes to carry into the larger sizes.

Fred van Andel


Subject: Re: Birthday Attack calculations. Date: Sat, 09 Jan 1999 13:40:58 GMT From: dscott@networkusa.net Message-ID: 777m9a$985$1@nnrp1.dejanews.com References: 3698df07.97942674@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 81

In article 3698df07.97942674@news.intergate.bc.ca, fred_vanandel@bigfoot.com wrote:

dscott@networkusa.net wrote:

snip...

NO it is not even close to my thoughts

I thought it was a very good reply on my part sorry you where not capable of understanding it. So here it is in different words. It was previously stated that a cut down version of method tested. Apparently short enough to be able get meaning ful data based on probability of collisions. But the thing was worded that he should never test the FULL LENGTH case since the collision probabilty was virtually ZERO for any real number of tries one could make with the FULL LENGTH caae. But if the FULL LENGTH case is what one is programming the stupid hash function for. It would be FUCKING stupid not to run a few cases just to be DAM SURE non occurred. People can fuck up code you know.

David Scott

Unfortunately running one test of a 256 bit hash for even a tiny part of its range will require far more disk space than I have access to and will have virtually zero chance of finding a collision. Even testing a mere 2^32 hashes compared with the > 2^64 required for a 50% collision of a 256 bit hash would require 128 GigaBytes of disk space to store all of the calculated hashes. I only wish that I had that much.

Then run 100 cases after your done with your low level

testing. I you can't run 100 cases your method sucks. if you get any collisions at all in the 100 cases then you FUCKED up the CODE. But you MUST check the code for some cases that you actually suspect the code to run on. It reminds me of work when we bought code that was to run on a DEC machine using VT100 terminals. THE CRAP didn't work we talked to the company on phone several times. They claimed it was developed on DEC machines using VT100 terminals after months when one of there representatives came out to our site after several visits they brougth one of there VT100 termainals the CRAP ran. But they had used a VT100 clone there code would not work on a real VT100 DEC terminal. I think the company died I sure hope so. But you have to run REAL TESTS on the real one you expect the code to be used. Even if your a pompous asshole and irrigantly expect no problem.

I will carry the testing into as far as I can go and still generate statistically significant numbers. There is no point expending all my computing power trying to force a collision of one relatively large hash because the result would be a matter of dumb luck and completely meaningless. But by testing millions of tiny hashes and tens of thousands of 40 or 48 bit hashes I can generate some meaningful numbers.

If the hash is so fatally flawed that I could routinely find a collision at much lower than the normal values then I would have found this out at the smaller hash sizes. The algorithm is scalable, there is no real difference between a small hash and a large hash except for the obvious. Therefore I expect that the properties of the smaller sizes to carry into the larger sizes.

You can expect all you want. Yor should more like a manager than a real programmer with much common sense.

Fred van Andel

David Scott

http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip http://members.xoom.com/ecil/index.htm

-----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own


Subject: Re: Birthday Attack calculations. Date: Sun, 10 Jan 1999 11:49:10 GMT From: dscott@networkusa.net Message-ID: 77a43m$7vu$1@nnrp1.dejanews.com References: 369913dc.21716751@news.intergate.bc.ca 777m9a$985$1@nnrp1.dejanews.com Newsgroups: sci.crypt Lines: 59

In article 369913dc.21716751@news.intergate.bc.ca, fred_vanandel@bigfoot.com wrote:

dscott@networkusa.net wrote:

In article 3698df07.97942674@news.intergate.bc.ca, fred_vanandel@bigfoot.com wrote:

dscott@networkusa.net wrote:

snip...

Then run 100 cases after your done with your low level testing. I you can't run 100 cases your method sucks. if you get any collisions at all in the 100 cases then you FUCKED up the CODE. But you MUST check the code for some cases that you actually suspect the code to run on. It reminds me of work when we bought code that was to run on a DEC machine using VT100 terminals. THE CRAP didn't work we talked to the company on phone several times. They claimed it was developed on DEC machines using VT100 terminals after months when one of there representatives came out to our site after several visits they brougth one of there VT100 termainals the CRAP ran. But they had used a VT100 clone there code would not work on a real VT100 DEC terminal. I think the company died I sure hope so. But you have to run REAL TESTS on the real one you expect the code to be used. Even if your a pompous asshole and irrigantly expect no problem.

There is not enough computing power on the planet to do a full collision test on a 256 bit hash. Do the math yourself.

You can expect all you want. Yor should more like a manager than a real programmer with much common sense.

I think there is an insult in there somewhere, I guess that means I have arrived.

Actual I am disapointed that some one else caught your mathematical erros while I was just wasting my time telling you over and over that a FULL collision test on your 256 bit hash was not needed. But you do need to try a few cases just to verify that ZERO occur but your obviously to pigheaded to understand this point. If this makes you think you have arrived then go ahead your arrived just like mr H.

Fred Van Andel

David Scott P.S. I never did trust the stability of a Van compared to a Car.

http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip http://members.xoom.com/ecil/index.htm

-----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own


Subject: Re: Birthday Attack calculations. Date: Sat, 09 Jan 1999 04:46:16 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: 3696dcdf.97390149@news.intergate.bc.ca References: 775qop$cbb@qualcomm.com 36942dfe.1363360@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 27

ggr@qualcomm.com (Gregory G Rose) wrote:

Now, let's not get testy. When David has a good idea, we should encourage it. The way I interpreted his comment was that, instead of testing the cut-down algorithms, cut down the output of the real algorithm. Generate your 256-bit hashes, split them into 32- or 64- bit chunks, and test those as if they had come from a smaller generator. Any collision problems in the larger hash, or any implementation glitch in scaling it up, should show up.

The smaller hash sizes are not from a 'cut-down' algorithm. The nature of this algorithm is that is can create hashes of any desired size (actually in 8 bit multiples). Any given size is no more 'natural' than any other size. If a smaller hash is collision resistant then I would expect that a larger hash is also collision resistant. The only real difference is that the small hashes are very easy to test, the larger one are more difficult.

I hope to have access to about 12 Pentium machines during evenings and weekends for testing purposes, this will allow me to crunch quite a few samples in search of collisions.

Fred Van Andel


Subject: Re: Birthday Attack calculations. Date: Sat, 09 Jan 1999 15:20:47 -0500 From: "Trevor Jackson, III" fullmoon@aspi.net Message-ID: 3697BA1F.9FE69B16@aspi.net References: 36942dfe.1363360@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 31

The whole point of testing on small hashes and extrapolating is that is computationally impossible to do a birthday attack on a large hash. For a 256 bit hash you will need to create more than 2^128 hashes before the odds of a collision reach 50%.

Fred Van Andel

Something is wrong with this paragraph. Either the attack is not a birthday attack, or the odds of a collision are much higher. An exhaustive search of a 2^256 element state space has a 50% chance of a match agains the target of the search. This is not a birthday attack.

A birthday attack searches for any matching pair rather than a match against a target. Thus a birthday attack searching N states implies generating O(N) states, but comparisons among N states means O(N^2) comparisons (assuming the states are not generated in an order that allows comparison against neighbors only). Thus for N small the generation cost may dominate the overall cost of the search but for even a 40 bit key the comparison cost will dominate due to the 2^79 comparisons required.

The odds of finding a matching pair among N of M states is the product over 0 to N-1 of the expression (M-i)/M.

odds = 1;
for i=0 to N-1
    odds  = odds * (M-i)/M

Thus a birthday attack can find a matching pair of elements much faster than an exhaustive attack can find a match for a chosen target.


Subject: Re: Birthday Attack calculations. Date: Sun, 10 Jan 1999 08:31:51 -0500 From: "Trevor Jackson, III" fullmoon@aspi.net Message-ID: 3698ABC7.B01FDDD9@aspi.net References: 3697c1c9.703028@news.intergate.bc.ca 3697BA1F.9FE69B16@aspi.net Newsgroups: sci.crypt Lines: 80

Fred Van Andel wrote:

"Trevor Jackson, III" fullmoon@aspi.net wrote:

The whole point of testing on small hashes and extrapolating is that is computationally impossible to do a birthday attack on a large hash. For a 256 bit hash you will need to create more than 2^128 hashes before the odds of a collision reach 50%.

Fred Van Andel

Something is wrong with this paragraph. Either the attack is not a birthday attack, or the odds of a collision are much higher. An exhaustive search of a 2^256 element state space has a 50% chance of a match agains the target of the search. This is not a birthday attack.

A birthday attack searches for any matching pair rather than a match against a target. Thus a birthday attack searching N states implies generating O(N) states, but comparisons among N states means O(N^2) comparisons (assuming the states are not generated in an order that allows comparison against neighbors only). Thus for N small the generation cost may dominate the overall cost of the search but for even a 40 bit key the comparison cost will dominate due to the 2^79 comparisons required.

The odds of finding a matching pair among N of M states is the product over 0 to N-1 of the expression (M-i)/M.

odds = 1; for i=0 to N-1 odds = odds * (M-i)/M

Thus a birthday attack can find a matching pair of elements much faster than an exhaustive attack can find a match for a chosen target.

Lets see if I understand you correctly. First you generate a pair of hashes and compare them to each other. If they don't match then you throw them away and generate a new pair. Repeat until you find a match.

Hardly. Throwing away samples wastes the effort invested in creating them. For the purpose of the reply I assumed that each sample would be stored after comparison with the previously stored samples. Thus we need to store N samples, and compare (N^2)/2 times.

But by this method each pair only has one chance in N of being a collision and therefore would require N/2 total comparisons to reach the 50% level. When we are talking large hashes this becomes a hugh number.

Another way is to generate all the hashes up front, sort them and then compare. For example with a 64 bit hash you could generate your 1.2 * 2^32 hashes and save them to a file. Sorting the file will take roughly 32 * 2^32 compares for a total of approx 2^37 operations.

This is much more feasible that the 2^63 operations required when

testing pairs of data. You could run approximately 2^26 of the sorting tests in the same time as it takes to run one of the pairs test.

Bear in mind that this method does not give you the average but rather how often you are successful at the 50% level. This is a slightly different number. If you want the true average you must generate more numbers and then determine afterwards where the match occurred.

You refer to 2^79 compares required for a 40 bit hash. This the absolute worst case in which no collisions are generated even after all possible hashes have been created. The odds of reaching the end of the hash without a collision are extremely remote. Even for a hash as small as 8 bits, the odds of generating all possible hashes without a collision is about 1 in 10^110. For our 40 bit example most of the cases will cluster around 20 bits that corresponds to about 2^39 compares. That is much more manageable.

You are still using the numbers for exhaustive search. The birthday attack numbers are much smaller. As I don't have an arbitrary precision calculator at hand I'll have to do some work to generate them accurately. I'll try to generate them and post here


Subject: Re: Birthday Attack calculations. Date: Mon, 11 Jan 1999 03:43:01 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: 36996c54.7217058@news.intergate.bc.ca References: 3698ABC7.B01FDDD9@aspi.net Newsgroups: sci.crypt Lines: 26

"Trevor Jackson, III" fullmoon@aspi.net wrote:

Hardly. Throwing away samples wastes the effort invested in creating them. For the purpose of the reply I assumed that each sample would be stored after comparison with the previously stored samples.

I obviously misunderstood what you were trying to say and I tailored my comments around that misunderstanding. My apologies for attempting to put words in your mouth.

However I still stand by my original statement. A birthday attack on a 256 bit hash would require in excess of 2^128 hashes to be calculated and stored before the odds of a collision reach 50%.

You are still using the numbers for exhaustive search. The birthday attack numbers are much smaller. As I don't have an arbitrary precision calculator at hand I'll have to do some work to generate them accurately. I'll try to generate them and post here.

A birthday attack would require > 2^128 calculation while an exhaustive search would need 2^255. There is a big difference. When you are dealing with large hashes even a birthday attack becomes impossible to do. Its the birthday attach that dictates the size of hash required, not the exhaustive search.

Fred Van Andel


Subject: Re: Birthday Attack calculations. Date: Mon, 11 Jan 1999 22:45:19 -0500 From: "Trevor Jackson, III" fullmoon@aspi.net Message-ID: 369AC54F.CE51AD60@aspi.net References: 36996c54.7217058@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 12

Fred Van Andel wrote:

A birthday attack would require > 2^128 calculation while an exhaustive search would need 2^255. There is a big difference. When you are dealing with large hashes even a birthday attack becomes impossible to do. Its the birthday attach that dictates the size of hash required, not the exhaustive search.

I agree with your last statement above. But I an confused by your first statement above. Is there a closed-form equation for the figure 2^128 you quoted? I am only familiar with the probability series summation.


Subject: Re: Birthday Attack calculations. Date: Tue, 12 Jan 1999 06:34:34 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: 369ae5c4.103855861@news.intergate.bc.ca References: 369AC54F.CE51AD60@aspi.net Newsgroups: sci.crypt Lines: 97

"Trevor Jackson, III" fullmoon@aspi.net wrote:

Fred Van Andel wrote:

A birthday attack would require > 2^128 calculation while an exhaustive search would need 2^255. There is a big difference. When you are dealing with large hashes even a birthday attack becomes impossible to do. Its the birthday attach that dictates the size of hash required, not the exhaustive search.

I agree with your last statement above. But I an confused by your first statement above. Is there a closed-form equation for the figure 2^128 you quoted? I am only familiar with the probability series summation.

You quoted the formula yourself in a earlier message

odds = 1; for i=0 to N-1 odds = odds * (M-i)/M

For any gives value of M the location of the 50% mark will be roughly the square root of M. The square root of 2^256 is 2^128.

Unless I am misunderstanding you again.

Some closed forms of the equations were given in the earlier posts of this thread.

This method was posted by David Broughton

w = n^g + 0.29 - e

where: w = the number of documents to get 50% probability of a collision n = number of different hash codes, all equiprobable g = 0.5 + 1/(6.13 * ln(n)) ln() is the natural logarithm function e is a small error that can be ignored in practice, usually < +- 1.

This is an empirical formula that I worked out many years ago and filed away in a notebook.

The exact formula is the value of w in this equation:

( product from k = 1 to k = w-1 of (n-k)/n ) = 0.5

but this is not a practical calculation for large n.

As you can see, w is a bit larger than the square root of n. For n = 10^6, for example, w = 1177.68 (e = -0.197).

If your formula is meant to be 1.2 * n^0.5, then w = 1200 for n = 10^6 which is getting there.

Or this one by Peter Pearson

To derive your own approximation formula from the above, exact formula, rewrite it as follows:

      n (n-1) (n-2) ... (n+1-w)      n!

product = -------------------------- = ---------- n^w (n-w)! n^w

Then, use Stirling's approximation to make the factorials more manageable. Stirling's approximation (see, for example, Knuth, Art of Computer Programming, Volume 1) is:

ln( n! ) = (n+1/2) ln(n) - n + ln( 2pi )/2 + 1/(12n) - ...

You'll have to experiment with the number of terms required to get meaningful results. Overimplifying to nln(n)-n gives conspicuously nonsensical results. If memory serves, (n+1/2) ln(n) - n + ln( 2pi )/2 is a good level of approximation, and one needs the approximation ln(1+x) = x (for small x) as well.

Or this one by Terry Ritter

s(N,p) = (1 + SQRT(1 - 8N ln(p))) / 2

where s is the expected number of samples needed, N the size of the population being sampled, and p the given probability.

For N = 10**6 and p = 0.5 (so ln(p) = -0.693) we get 1177.9 instead of the usual handwave SQRT(N) = 1000, or the stated approximation, 1.2 * SQRT(N) = 1200.

For N = 2**20 and p = 0.5, we get 1206.2 instead of 1.2 * 1024 = 1228.8 for the stated approximation.

The formula basically comes out of my article on population estimation:

http://www.io.com/~ritter/ARTS/BIRTHDAY.HTM

Fred Van Andel


Subject: Re: Birthday Attack calculations. Date: Tue, 12 Jan 1999 08:34:48 -0500 From: "Trevor Jackson, III" fullmoon@aspi.net Message-ID: 369B4F77.E076C774@aspi.net References: 369ae5c4.103855861@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 112

Thanks for the background exposition. I think I am slightly confused over the odds of finding the first collision and the odds of finding no or any collisions.

For the purposes of the original issue we want the odds of finding the first collision. I.e., the expected wait. The series formula I described indicates the probability of finding no collisions, which also gives (by 1-p) the probability of finding any number of collisions.

Presumably even odds of finding a single collision should take less work than even odds of finding all collisions. Is this correct?

Fred Van Andel wrote:

"Trevor Jackson, III" fullmoon@aspi.net wrote:

Fred Van Andel wrote:

A birthday attack would require > 2^128 calculation while an exhaustive search would need 2^255. There is a big difference. When you are dealing with large hashes even a birthday attack becomes impossible to do. Its the birthday attach that dictates the size of hash required, not the exhaustive search.

I agree with your last statement above. But I an confused by your first statement above. Is there a closed-form equation for the figure 2^128 you quoted? I am only familiar with the probability series summation.

You quoted the formula yourself in a earlier message

odds = 1; for i=0 to N-1 odds = odds * (M-i)/M

For any gives value of M the location of the 50% mark will be roughly the square root of M. The square root of 2^256 is 2^128.

Unless I am misunderstanding you again.

Some closed forms of the equations were given in the earlier posts of this thread.

This method was posted by David Broughton

w = n^g + 0.29 - e

where: w = the number of documents to get 50% probability of a collision n = number of different hash codes, all equiprobable g = 0.5 + 1/(6.13 * ln(n)) ln() is the natural logarithm function e is a small error that can be ignored in practice, usually < +- 1.

This is an empirical formula that I worked out many years ago and filed away in a notebook.

The exact formula is the value of w in this equation:

( product from k = 1 to k = w-1 of (n-k)/n ) = 0.5

but this is not a practical calculation for large n.

As you can see, w is a bit larger than the square root of n. For n = 10^6, for example, w = 1177.68 (e = -0.197).

If your formula is meant to be 1.2 * n^0.5, then w = 1200 for n = 10^6 which is getting there.

Or this one by Peter Pearson

To derive your own approximation formula from the above, exact formula, rewrite it as follows:

      n (n-1) (n-2) ... (n+1-w)      n!

product = -------------------------- = ---------- n^w (n-w)! n^w

Then, use Stirling's approximation to make the factorials more manageable. Stirling's approximation (see, for example, Knuth, Art of Computer Programming, Volume 1) is:

ln( n! ) = (n+1/2) ln(n) - n + ln( 2pi )/2 + 1/(12n) - ...

You'll have to experiment with the number of terms required to get meaningful results. Overimplifying to nln(n)-n gives conspicuously nonsensical results. If memory serves, (n+1/2) ln(n) - n + ln( 2pi )/2 is a good level of approximation, and one needs the approximation ln(1+x) = x (for small x) as well.

Or this one by Terry Ritter

s(N,p) = (1 + SQRT(1 - 8N ln(p))) / 2

where s is the expected number of samples needed, N the size of the population being sampled, and p the given probability.

For N = 10**6 and p = 0.5 (so ln(p) = -0.693) we get 1177.9 instead of the usual handwave SQRT(N) = 1000, or the stated approximation, 1.2 * SQRT(N) = 1200.

For N = 2**20 and p = 0.5, we get 1206.2 instead of 1.2 * 1024 = 1228.8 for the stated approximation.

The formula basically comes out of my article on population estimation:

http://www.io.com/~ritter/ARTS/BIRTHDAY.HTM

Fred Van Andel


Subject: Re: Birthday Attack calculations. Date: Wed, 13 Jan 1999 06:50:51 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: 369d3d9c.8943462@news.intergate.bc.ca References: 369B4F77.E076C774@aspi.net Newsgroups: sci.crypt Lines: 27

"Trevor Jackson, III" fullmoon@aspi.net wrote:

Thanks for the background exposition. I think I am slightly confused over the odds of finding the first collision and the odds of finding no or any collisions.

For the purposes of the original issue we want the odds of finding the first collision. I.e., the expected wait. The series formula I described indicates the probability of finding no collisions, which also gives (by 1-p) the probability of finding any number of collisions.

Presumably even odds of finding a single collision should take less work than even odds of finding all collisions. Is this correct?

Using the equasion below the AVERAGE number of hashes that need to be tested is calculated by: i = 1' odds = 1; M = Whatever; while( odds > 0.5) { odds = odds * (M-i)/M; i++; }

For a value of 1,000,000, i is 1178.

Fred Van Andel


Subject: Re: Birthday Attack calculations. Date: Wed, 13 Jan 1999 07:52:27 GMT From: ritter@io.com (Terry Ritter) Message-ID: 369c50a1.3513345@news.io.com References: 369d3d9c.8943462@news.intergate.bc.ca Newsgroups: sci.crypt Lines: 35

On Wed, 13 Jan 1999 06:50:51 GMT, in 369d3d9c.8943462@news.intergate.bc.ca, in sci.crypt fred_vanandel@bigfoot.com (Fred Van Andel) wrote:

[...] Using the equasion below the AVERAGE number of hashes that need to be tested is calculated by: i = 1' odds = 1; M = Whatever; while( odds > 0.5) { odds = odds * (M-i)/M; i++; }

For a value of 1,000,000, i is 1178.

Which seems to compare rather well to the value 1177.9 computed from my formula in the earlier posting:

| s(N,p) = (1 + SQRT(1 - 8N ln(p))) / 2 | |where s is the expected number of samples needed, N the size of |the population being sampled, and p the given probability. | |For N = 10**6 and p = 0.5 (so ln(p) = -0.693) we get 1177.9 instead |of the usual handwave SQRT(N) = 1000 [...]


Terry Ritter ritter@io.com http://www.io.com/~ritter/ Crypto Glossary http://www.io.com/~ritter/GLOSSARY.HTM


Subject: Re: Birthday Attack calculations. Date: Thu, 14 Jan 1999 04:37:24 GMT From: fred_vanandel@bigfoot.com (Fred Van Andel) Message-ID: 369d73bf.1141606@news.intergate.bc.ca References: 369c50a1.3513345@news.io.com Newsgroups: sci.crypt Lines: 9

ritter@io.com (Terry Ritter) wrote:

For a value of 1,000,000, i is 1178.

Which seems to compare rather well to the value 1177.9 computed from my formula in the earlier posting:

Why do you think I chose the number 1,000,000. Its a good idea to check your answers before making a fool of yourself in front of the multitude.


Terry Ritter, hiscurrent address, and histop page.

Last updated: 1999-02-20