A Bloom filter has three parameters that control its behaviour: the number of items inserted (n), the size of the bit array (m), and the number of hash functions (k). Getting these right determines how accurate the filter is.
After inserting n items into a filter with m bits and k hash functions, the probability of a false positive is approximately:
p ≈ (1 - e^(-k·n/m))^k
You do not need to memorise this formula. The key insight is what each variable does: a larger bit array m lowers the false positive rate, more inserted items n raise it, and the hash count k pulls in both directions, which is why it has a sweet spot.
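To get a feel for the formula without memorising it, you can evaluate it directly. A minimal sketch (the function name and the example sizes are just illustrations):

```python
import math

def false_positive_rate(n: int, m: int, k: int) -> float:
    """Approximate false positive rate after inserting n items
    into a Bloom filter with m bits and k hash functions."""
    return (1 - math.exp(-k * n / m)) ** k

# 1 million items, 9.6 million bits, 7 hashes: roughly a 1% rate
print(false_positive_rate(1_000_000, 9_600_000, 7))
```

Plugging in different values of m and k is the quickest way to see how the trade-offs below play out.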
For a given m and n, there is a value of k that minimises the false positive rate:
k_optimal = (m / n) * ln(2) ≈ 0.693 * (m / n)
Using too few hashes means each item sets only a few bits, so an unrelated item's bits are easily all set by coincidence. Using too many hashes flips too many bits, filling the array quickly and increasing collisions.
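You can see this U-shape by sweeping k for a fixed m and n. A small sketch (the sizes are example values, not prescriptions): with m/n = 9.6, the theory predicts k_optimal ≈ 9.6 × 0.693 ≈ 6.65, so the best integer k should land at 6 or 7.

```python
import math

n, m = 1_000_000, 9_600_000  # example item count and bit array size

def fp_rate(k: int) -> float:
    """False positive rate for a given hash count k."""
    return (1 - math.exp(-k * n / m)) ** k

# Sweep k from 1 to 15 and pick the minimiser of the rate.
best_k = min(range(1, 16), key=fp_rate)
print(best_k, fp_rate(best_k))
```

The rate falls as k grows toward the optimum, then rises again as the array fills up, exactly the trade-off described above.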
Adjust the sliders to change the bit array size and hash count. Watch how the false positive rate changes. Try to find the sweet spot for a given number of items.
If you know your expected item count and can tolerate a target false positive rate p, you can work backwards to find the right m:
m = -n * ln(p) / (ln(2))^2
For example, to store 1 million items with a 1% false positive rate, you need roughly 9.6 million bits — about 1.2 MB. A plain hash set storing those million items as strings could easily use 50–100 MB or more.
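The two sizing formulas combine into a short helper. This is a sketch (the function name is made up for illustration); it reproduces the 1-million-item, 1% example:

```python
import math

def bloom_size(n: int, p: float) -> tuple[int, int]:
    """Return (m, k): the bit array size and hash count needed to
    hold n items at target false positive rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)  # m = -n·ln(p)/(ln 2)²
    k = round(m / n * math.log(2))                      # k = (m/n)·ln(2)
    return m, k

m, k = bloom_size(1_000_000, 0.01)
print(m, k, f"{m / 8 / 1e6:.1f} MB")  # about 9.6 million bits, ~1.2 MB
```

Note that m scales with -ln(p), so tightening the target from 1% to 0.1% costs only about 50% more bits, not ten times as many.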
That is the core value of a Bloom filter: a well-tuned filter can give you a very low false positive rate at a tiny fraction of the memory cost.