How many hash functions does my bloom filter need? (original) (raw)

Wikipedia says:

An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps or hashes some set element to one of the m array positions with a uniform random distribution.

I read the article, but what I don't understand is how k is determined. Is it a function of the table size?

Also, in hash tables I've written I used a simple but effective algorithm for automatically growing the hash's size. Basically, if ever more than 50% of the buckets in the table were filled, I would double the size of the table. I suspect you might still want to do this with a bloom filter to reduce false positives. Correct ?

Gurwinder Singh's user avatar

asked Mar 18, 2009 at 14:20

dicroce's user avatar

Given:

we want to calculate:

The formulas:

m = -n*ln(p) / (ln(2)^2) the number of bits
k = m/n * ln(2) the number of hash functions

In our case:

Note: Any code released into public domain. No attribution required.

Community's user avatar

answered Mar 17, 2014 at 23:28

Ian Boyd's user avatar

Ian BoydIan Boyd

255k263 gold badges900 silver badges1.3k bronze badges

3

If you read further down in the Wikipedia article about Bloom filters, then you find a section Probability of false positives. This section explains how the number of hash functions influences the probabilities of false positives and gives you the formula to determine k from the desired expected prob. of false positives.


Quote from the Wikipedia article:

Obviously, the probability of false positives decreases as m (the number of bits in the array) increases, and increases as n (the number of inserted elements) increases. For a given m and n, the value of k (the number of hash functions) that minimizes the probability is

formula

Community's user avatar

answered Mar 18, 2009 at 14:28

f3lix's user avatar

f3lixf3lix

29.8k11 gold badges67 silver badges86 bronze badges

answered Nov 8, 2009 at 2:46

Mischa's user avatar

MischaMischa

2,27820 silver badges18 bronze badges

Given a number of bits per key you want to "invest", the best k is:

max(1, round(bitsPerKey * log(2)))

Where max is the higher of the two, round rounds to the nearest integer, log is the natural logarithm (base e).

answered Nov 14, 2018 at 7:02

Thomas Mueller's user avatar

Thomas MuellerThomas Mueller

49.9k15 gold badges119 silver badges134 bronze badges

There is an excellent online bloomfilter calculator.

This interactive bloom filter calculator lets you estimate and find out coefficients for your bloom filter needs. It also shows you graphs to see results visually and provides all the formulas For example, calculations for 216,553 n items with probability p of 0.01:

enter image description here

n = ceil(m / (-k / log(1 - exp(log(p) / k))))
p = pow(1 - exp(-k / (m / n)), k)
m = ceil((n * log(p)) / log(1 / pow(2, log(2))));
k = round((m / n) * log(2));

answered Jul 29, 2018 at 6:12

Pavel P's user avatar

Pavel PPavel P

16.7k11 gold badges88 silver badges137 bronze badges

1