HyperLogLog: Introduction

Counting unique items is easy until scale becomes a problem.

If you want to know how many unique visitors a website has, the simplest solution is to store every visitor ID in a set and take its size. This gives an exact answer, but memory usage grows linearly with the number of distinct visitors. At large scale, that approach stops being practical.
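A minimal sketch of the exact approach, assuming visitor IDs arrive as strings. The function name `count_unique_exact` is illustrative. Every distinct ID is kept in memory, so the set's footprint keeps growing as new visitors appear.

```python
import sys


def count_unique_exact(visitor_ids):
    """Exact distinct count: keep every ID ever seen in a set."""
    seen = set()
    for vid in visitor_ids:
        seen.add(vid)  # each new distinct ID costs additional memory
    return len(seen)


if __name__ == "__main__":
    print(count_unique_exact(["alice", "bob", "alice", "carol", "bob"]))  # 3

    # sys.getsizeof reports only the set's internal hash table, not the
    # strings it holds, yet even that grows with the number of distinct IDs.
    for n in (1_000, 10_000, 100_000):
        ids = {f"visitor-{i}" for i in range(n)}
        print(n, sys.getsizeof(ids))
```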

HyperLogLog takes a different approach.

Instead of storing every value, it processes a stream of items and keeps a small, fixed-size summary. From that summary, it estimates how many distinct elements have appeared. The estimate is not exact, but it is usually very close: a summary of about 12 KB has a standard error of roughly 0.8%. Because the summary never grows, the memory cost stays tiny even when the input grows to millions or billions of items.
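To make the idea of a fixed-size summary concrete, here is a minimal sketch of the classic HyperLogLog update and estimate steps. It assumes BLAKE2b truncated to 64 bits as the hash and p = 14 (16,384 registers) by default; the class and method names are illustrative, and this is not the internals of any particular library.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative, not production-grade).

    Assumptions: 64-bit hash from BLAKE2b, m = 2**p registers, the bias
    constant alpha and the small-range (linear counting) correction from
    the original HyperLogLog formulation.
    """

    def __init__(self, p: int = 14):
        if not 4 <= p <= 18:  # keep m in a practical range for this sketch
            raise ValueError("p must be between 4 and 18")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m  # the fixed-size summary

    def _hash(self, item: str) -> int:
        # 64-bit hash of the item's UTF-8 bytes.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item: str) -> None:
        h = self._hash(item)
        idx = h >> (64 - self.p)               # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in `rest` (leading zeros + 1).
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        if m == 16:
            alpha = 0.673
        elif m == 32:
            alpha = 0.697
        elif m == 64:
            alpha = 0.709
        else:
            alpha = 0.7213 / (1 + 1.079 / m)
        # Raw estimate: bias-corrected harmonic mean of 2**register.
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * m and zeros > 0:
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog(p=14)
    for i in range(100_000):
        hll.add(f"visitor-{i}")
        hll.add(f"visitor-{i}")  # duplicates leave the registers unchanged
    print(round(hll.estimate()))
```

Adding the same item twice hashes to the same register with the same rank, which is why duplicates do not inflate the count. A compact implementation would pack each register into about 6 bits (16,384 registers × 6 bits ≈ 12 KB); the plain Python list here trades that compactness for readability.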

Hashing