Following from our previous intuition, we can keep things as simple as possible: track the greatest position of the first 1-bit we have ever seen, and use it to estimate the cardinality of the input stream.
class HyperLogLog {
counter = 0;
add(hash: number) {
this.counter = Math.max(this.counter, Math.clz32(hash) + 1);
}
estimate(): number {
return Math.pow(2, this.counter);
}
}
const hll = new HyperLogLog();
hll.add(hash32Bits("Visal"));
hll.add(hash32Bits("Hello World"));Let give it a try!
Type any value, then click “Insert” or press Enter to see the internal state update in real time.
And surprisingly, this estimator does work — at least in theory. But in practice, it works very badly.
The problem is that the estimator is driven entirely by one extreme event. A single unusually lucky hash can dominate the result forever, because we only ever remember the maximum observation.
This also introduces a strong upward bias. Randomness can push the estimate higher, but never lower. Once luck strikes, it sticks.
So while the single-counter estimator perfectly captures the core intuition behind HyperLogLog, it is far too noisy and unstable to be useful on its own.
Something needs to average out the luck.