Back Home

Multiple Counters

The single-counter approach has a problem: we rely on just one “most extreme” observation. If we get unlucky (or lucky), our estimate can be way off. Imagine estimating the average height of students by measuring only one person. You might pick the tallest or shortest by chance.

The fix is simple: run many experiments in parallel, then combine the results. This is the same idea behind taking multiple measurements and averaging them to get a more reliable answer.

Splitting the hash into buckets

Instead of running separate experiments, we cleverly reuse parts of the same hash. We split each hash into two pieces:

  • Index bits (left side): These bits pick which “bucket” (or register) to update. Think of it like sorting items into different bins.
  • Measurement bits (right side): These bits are used to find the position of the first 1-bit, just like before.

How many index bits do we need?

If we want m buckets, we need p index bits, where 2^p = m.

  • For 64 buckets: 2^6 = 64, so we need 6 index bits
  • For 256 buckets: 2^8 = 256, so we need 8 index bits

More buckets means better accuracy, but uses more memory.

Why this helps

Now the estimate doesn’t depend on one lucky hash. If one bucket gets an unusually high or low value, it only affects 1 out of m buckets. When we combine all the buckets together, these random spikes cancel out and the estimate becomes much more stable.

It’s like asking 64 different groups of people to count independently, then averaging their answers. One group might make a mistake, but the overall average is still reliable.

Implementation

Here’s a simplified version of the code. Don’t worry if the bit operations look confusing. The key idea is: extract the bucket index from the left bits, then find the first 1-bit position in the remaining bits.

class HyperLogLog {
  // p = number of index bits
  // m = number of buckets (always 2^p)
  readonly p: number;
  readonly m: number;
  readonly registers: Uint8Array;
 
  constructor(p = 6) {
    this.p = p;
    this.m = 1 << p;
    this.registers = new Uint8Array(this.m);
  }
 
  add(hash: number) {
    // Calculate bucket index
    const index = hash >>> (32 - this.p);
 
    // Shift left to remove index bits, leaving only measurement bits
    const w = (hash << this.p) >>> 0;
 
    // Count leading zeroes and put in the target bucket.
    const k = Math.clz32(w) + 1;
    if (k > this.registers[index]) this.registers[index] = k;
  }
 
  // We will implement the real estimator next
  estimate(): number {
    throw new Error("Estimator not implemented yet");
  }
}