Back Home

Low Cardinality

HyperLogLog works very well when the number of unique items is large. However, when the number of unique items is small, the estimate can be less accurate. This happens because many registers (buckets) are still empty, so the algorithm does not yet have enough information to produce a reliable estimate.

To fix this, most implementations switch to a small-range correction method (often called linear counting). Instead of using the normal HyperLogLog formula, the algorithm counts how many registers are still empty and uses that information to estimate the number of unique items more accurately.

  • Large datasets: use normal HyperLogLog formula
  • Small datasets: use Linear Counting

Linear Counting

If our estimate seems small (less than 2.5 × m) AND we have empty buckets, use Linear Counting.

E=mln(mV)E = m \cdot \ln\left(\frac{m}{V}\right)

Where:

  • m = total number of buckets
  • V = number of empty buckets (still at zero)

Implementation

Here’s how the code looks with this fix:

class HyperLogLog {
  // ... constructor and add() method from before ...
  estimate(): number {
    const m = this.m;
 
    // Choose the correction constant based on m
    const alpha =
      m === 16
        ? 0.673
        : m === 32
          ? 0.697
          : m === 64
            ? 0.709
            : 0.7213 / (1 + 1.079 / m);
 
    // Sum 2^(-M[i]) for all registers
    let sum = 0;
    for (let i = 0; i < m; i++) {
      sum += Math.pow(2, -this.registers[i]);
    }
 
    // Apply the harmonic mean formula
    let E = (alpha * m * m) / sum;
 
    // Counting empty buckets for small range correction
    const V = this.registers.filter((c) => c === 0).length;
 
    // Using Linear Counting for small estimates
    if (E <= 2.5 * m && V > 0) {
      return m * Math.log(m / V);
    }
 
    return E;
  }
}