Bloom Filter: Introduction

Imagine you are building a spell checker. For every word the user types, you need to answer one question: is this word in the dictionary?

The simplest solution is to store every dictionary word in a hash set and look it up. That works fine for a small dictionary. But a full English dictionary has hundreds of thousands of words, and each one takes memory to store. On a mobile device or embedded system, that cost adds up fast.

Now imagine the same problem at a larger scale. A web browser needs to check every URL you visit against a list of millions of known malicious sites. A database needs to decide whether a row might exist before issuing an expensive disk read. In both cases, the question is the same: is this item in the set?

A Bloom filter answers that question using a fraction of the memory a hash set would need.

The trade-off is accuracy. A Bloom filter can occasionally say an item is in the set when it is not — a false positive. But it will never say an item is absent when it is actually present. For many real-world uses, that trade-off is worth it.

See it in action

Bit Array