Use Cases

Bloom filters are used wherever you need fast, memory-efficient membership testing and can tolerate a small number of false positives.

Databases

When you query a database for a row that does not exist, the system still has to check the disk to be sure. That is slow.

Many databases keep a Bloom filter alongside their on-disk data, often one per data file (for example, per SSTable in an LSM-tree store). Before hitting the disk, the system checks the filter. If the filter says “definitely not present”, the disk read is skipped entirely. If it says “possibly present”, the disk read proceeds normally.

RocksDB and Apache Cassandra both use this technique to reduce unnecessary I/O. The cost of a false positive is one wasted disk read. The benefit is avoiding disk reads for every true negative.
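
The read path described above can be sketched in a few lines of Python. This is an illustrative sketch, not how RocksDB or Cassandra implement it: the names `BloomFilter` and `GuardedTable`, the dict standing in for the disk, and the choice of 10 bits per key with 7 hash positions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash positions per key."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str) -> list:
        # Double hashing: derive k positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # never a zero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely not present"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


class GuardedTable:
    """Hypothetical table: a dict plays the role of the disk, and reads are counted."""

    def __init__(self, rows: dict) -> None:
        self._disk = dict(rows)
        self.disk_reads = 0
        # Assumed sizing: about 10 bits per key and 7 hash positions.
        self.filter = BloomFilter(m_bits=10 * max(len(rows), 1), k_hashes=7)
        for key in rows:
            self.filter.add(key)

    def get(self, key: str):
        if not self.filter.might_contain(key):
            return None  # definitely absent: skip the disk entirely
        self.disk_reads += 1  # possibly present: pay for the read
        return self._disk.get(key)


table = GuardedTable({f"user:{i}": i for i in range(1000)})
table.get("user:42")      # present, so the filter lets the read through
table.get("user:999999")  # absent, and almost always answered without a read
print("disk reads:", table.disk_reads)
```

A lookup for a missing key only reaches the simulated disk when the filter returns a false positive, which is the single wasted read mentioned above.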

Google Chrome Safe Browsing

When you navigate to a URL, Chrome needs to check whether it appears on a list of known malicious sites. That list has millions of entries. Sending every URL to a remote server would be slow and would expose your browsing history.

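Instead, Chrome keeps a compact filter of the malicious URLs locally. Early versions used a Bloom filter for this; later versions moved to a related compact structure built from hash prefixes, but the principle is the same. Most URLs are cleared instantly, with no network request needed. Only URLs that pass the filter (a positive result) trigger a network lookup to confirm. This keeps the local data small and keeps most of your browsing off the network, which preserves privacy.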

Content Delivery Networks

A CDN caches web content across many servers. When a request arrives, the server needs to know whether a given object is in its local cache. Checking a large in-memory index on every request can be expensive.

A Bloom filter can answer “is this object definitely not here?” in microseconds. Only requests that pass the filter proceed to the full cache lookup.

Spell Checkers

A spell checker needs to test whether a typed word is in a dictionary. A Bloom filter of the dictionary needs only about 10 bits per word for a 1% false positive rate, regardless of how long the words are, and answers queries in constant time.

A false positive means a misspelled word passes the filter and is accepted as correct. In practice, false positive rates below 1% make this uncommon enough to be acceptable for most uses.
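
To make the memory cost concrete, the standard sizing formulas for a Bloom filter can be evaluated directly: for n items and a target false positive rate p, the bit count is m = -n ln(p) / (ln 2)^2 and the number of hash functions is k = (m / n) ln 2. The sketch below applies them to a dictionary of 100,000 words, a figure assumed purely for illustration; the function name `bloom_parameters` is also hypothetical.

```python
import math


def bloom_parameters(n_items: int, target_fp_rate: float) -> tuple:
    """Return (bits, hash_count) for n_items at the target false positive rate."""
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits.
    bits = math.ceil(-n_items * math.log(target_fp_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest whole hash function.
    hash_count = max(1, round(bits / n_items * math.log(2)))
    return bits, hash_count


# Assumed example: a 100,000-word dictionary at a 1% false positive rate.
bits, hashes = bloom_parameters(100_000, 0.01)
print(f"{bits} bits, {hashes} hash functions, {bits / 8 / 1024:.0f} KiB")
```

Under these assumptions the filter needs roughly 9.6 bits per word, or about 117 KiB in total, with 7 hash functions. Storing the words themselves in an exact set would take several times that.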

Distributed Systems — Deduplication

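In a message queue or event stream, the same event can sometimes be delivered more than once. A Bloom filter can track seen event IDs and quickly reject duplicates. For very large streams, this is far more memory-efficient than keeping every seen ID in a set. The trade-off is that a false positive causes a genuinely new event to be dropped as a duplicate, so this approach suits pipelines that can tolerate rare losses.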
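
A minimal sketch of this pattern follows. The names `SeenFilter` and `deduplicate`, the event ID strings, and the filter size are assumptions for the example, not part of any particular queueing system.

```python
import hashlib
from typing import Iterable, Iterator


class SeenFilter:
    """Minimal Bloom filter that tests and records an ID in one step."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str) -> list:
        # Double hashing over two halves of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # never a zero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def check_and_add(self, key: str) -> bool:
        """Return True if key was possibly seen before, then record it."""
        positions = self._positions(key)
        seen = all(self.bits[p // 8] & (1 << (p % 8)) for p in positions)
        for p in positions:
            self.bits[p // 8] |= 1 << (p % 8)
        return seen


def deduplicate(event_ids: Iterable[str], seen: SeenFilter) -> Iterator[str]:
    """Yield each event ID the first time it appears in the stream."""
    for event_id in event_ids:
        # A false positive here would silently drop a new event.
        if not seen.check_and_add(event_id):
            yield event_id


stream = ["evt-1", "evt-2", "evt-1", "evt-3", "evt-2"]
unique = list(deduplicate(stream, SeenFilter(m_bits=1 << 16, k_hashes=4)))
print(unique)
```

Memory use is fixed by `m_bits` no matter how many IDs pass through, but the false positive rate climbs as the filter fills, so a long-running stream needs a filter sized for the expected volume or one that is rotated periodically.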

When not to use a Bloom filter

A Bloom filter is a poor fit when:

  • You need exact membership answers (no false positives allowed)
  • You need to delete items (a standard Bloom filter cannot remove one item without risking false negatives for others)
  • Your dataset is small enough that an exact set fits comfortably in memory

For those cases, a plain hash set or a sorted list is the right tool.