HyperLogLog works very well when the number of unique items is large. However, when the number of unique items is small, the estimate can be less accurate. This happens because many registers (buckets) are still empty, so the algorithm does not yet have enough information to produce a reliable estimate.
To fix this, most implementations switch to a small-range correction method (often called linear counting). Instead of using the normal HyperLogLog formula, the algorithm counts how many registers are still empty and uses that information to estimate the number of unique items more accurately.
If our estimate seems small (less than 2.5 × m) AND we have empty buckets, use Linear Counting.
Where:
Here’s how the code looks with this fix:
class HyperLogLog {
// ... constructor and add() method from before ...
estimate(): number {
const m = this.m;
// Choose the correction constant based on m
const alpha =
m === 16
? 0.673
: m === 32
? 0.697
: m === 64
? 0.709
: 0.7213 / (1 + 1.079 / m);
// Sum 2^(-M[i]) for all registers
let sum = 0;
for (let i = 0; i < m; i++) {
sum += Math.pow(2, -this.registers[i]);
}
// Apply the harmonic mean formula
let E = (alpha * m * m) / sum;
// Counting empty buckets for small range correction
const V = this.registers.filter((c) => c === 0).length;
// Using Linear Counting for small estimates
if (E <= 2.5 * m && V > 0) {
return m * Math.log(m / V);
}
return E;
}
}