A Bloom filter is built on top of a single data structure: a bit array.
A bit array is just a list of on/off values. Each position holds exactly one bit — either 0 (off) or 1 (on). That is the smallest unit of information a computer can store.
If you have an array of 64 bits, it takes 8 bytes of memory. An array of 1024 bits takes 128 bytes. Compared to a hash set that stores full strings, a bit array is extremely compact.
At the start, every bit in the array is 0. Nothing has been added yet.
When you add an item, you flip one or more bits to 1. When you test an item, you check those same bits. If they are all 1, the item might be in the set. If any bit is 0, the item is definitely not in the set.
That last point is important. A 0 bit is a hard no. A 1 bit is a soft yes — it could mean the item is there, or it could mean those bits were already flipped by a different item. You will learn more about that in a later chapter.
A hash set stores the full value of each item. If you add the string "hello", the set keeps a copy of "hello" and uses it for exact lookup.
A bit array stores nothing about the item itself. It only records the fact that something was added at certain positions. This is why a Bloom filter uses far less memory — and also why it cannot tell you exactly what was added.
Bit array with 16 positions, nothing added yet:
Position: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Bit: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The next question is: how do you decide which bits to flip for a given item? That is where hash functions come in.