A hash function takes any input — a string, a number, a URL — and converts it to a fixed-size number. The result looks random, but the same input always produces the same output.
For a Bloom filter, you need the hash to be uniformly distributed: every output value should be equally likely. This ensures that the bits you flip are spread evenly across the array, rather than clustering in one spot.
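As a concrete illustration of "fixed-size, deterministic, roughly uniform," here is FNV-1a, a simple and well-known 32-bit string hash. It is one possible choice, not the only one; any hash with these properties works for a Bloom filter.

```javascript
// FNV-1a: a simple 32-bit hash. Same input always gives the same
// output, and outputs are spread roughly uniformly over 32 bits.
function fnv1a(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return h >>> 0;
}

console.log(fnv1a("hello")); // deterministic: same call, same number
```

Note the `>>> 0` coercions: JavaScript has no native 32-bit unsigned integers, so the unsigned shift keeps the value in the 0 to 2^32 - 1 range.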
If you use a single hash, each item maps to a single bit, and many different items can map to the same bit. Those collisions produce false positives long before the array is full.
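To see a collision concretely, here is a deliberately weak toy hash (sum of character codes). Real filters use stronger hashes; the weakness just makes the collision easy to predict.

```javascript
// Toy hash: sum of character codes modulo the array size.
// Deliberately weak so a collision is easy to construct.
function toyHash(str, arraySize) {
  let sum = 0;
  for (let i = 0; i < str.length; i++) sum += str.charCodeAt(i);
  return sum % arraySize;
}

// "ab" and "ba" contain the same characters, so their sums are equal
// and they map to the same bit -- a guaranteed collision.
console.log(toyHash("ab", 16)); // 3
console.log(toyHash("ba", 16)); // 3
```

Once both items map to bit 3, inserting one makes the filter report the other as "maybe present" even though it was never added.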
Using multiple hash functions helps. Each function maps the item to a different position in the bit array. To add an item, you flip all those positions to 1. To test an item, you check all those positions. If even one is 0, the item is definitely not present.
The more positions you check, the harder it is for a false positive to occur by accident.
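The add/test procedure above can be sketched as a small class. The names (`BloomFilter`, `mightContain`) and the seeded-hash trick for producing k positions are illustrative choices, not something the text prescribes.

```javascript
// Seeded FNV-1a variant: changing the seed gives a different hash
// function, which is one simple way to get k hashes from one routine.
function fnv1a(str, seed = 0x811c9dc5) {
  let h = seed >>> 0;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class BloomFilter {
  constructor(size, k) {
    this.size = size;
    this.k = k;
    this.bits = new Uint8Array(size); // one byte per bit, for clarity
  }
  positions(item) {
    const out = [];
    for (let i = 0; i < this.k; i++) {
      out.push(fnv1a(item, 0x811c9dc5 + i) % this.size);
    }
    return out;
  }
  add(item) {
    for (const p of this.positions(item)) this.bits[p] = 1;
  }
  mightContain(item) {
    // If any position is 0, the item was definitely never added.
    return this.positions(item).every((p) => this.bits[p] === 1);
  }
}
```

Note the asymmetry: `mightContain` can wrongly say yes (a false positive), but it can never wrongly say no, because adding an item always sets all of its positions.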
Running k separate hash functions is expensive. A smarter approach is double hashing: compute two base hashes and combine them.
// h1 and h2 are two independent 32-bit hashes of the item
const h1 = hash32Bits(item);
const h2 = hash32Bits(item + "\0");
// Generate k positions from the two base hashes
const positions = [];
for (let i = 0; i < k; i++) {
  positions.push((h1 + i * h2) % arraySize);
}
This gives you k different positions from only two base hash calls. The arithmetic spreads the positions well across the array as long as h2 is not zero modulo the array size.
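Here is the double-hashing scheme as a self-contained, runnable sketch. FNV-1a stands in for the `hash32Bits` routine above, and the nonzero adjustment to `h2` is an assumed safeguard, not part of the original snippet.

```javascript
// Stand-in for hash32Bits: FNV-1a, a simple 32-bit string hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function doubleHashPositions(item, k, arraySize) {
  const h1 = fnv1a(item);
  // Second base hash from a perturbed input, as in the snippet above.
  let h2 = fnv1a(item + "\0");
  // Guard against h2 collapsing all k positions onto one slot.
  if (h2 % arraySize === 0) h2 += 1;
  const positions = [];
  for (let i = 0; i < k; i++) {
    positions.push((h1 + i * h2) % arraySize);
  }
  return positions;
}

console.log(doubleHashPositions("cat", 3, 101)); // 3 positions, 2 hash calls
```

Two base hash calls, k positions: the cost of testing or inserting no longer grows with the number of hash functions, only with the (cheap) loop.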
Think of it like a lock with multiple keys. A false positive requires every single position to already be 1 from earlier inserts. The more positions you check, the less likely that all of them were set by coincidence.
But there is a limit. If you use too many hash functions, you end up flipping most of the bits in the array, and the filter becomes useless. Choosing the right number is something you will explore in the tuning chapter.