2.4 KiB
title | date | draft | description | tags | categories | series | favorite | disable_feed | mathjax | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Reservoir Sampling | 2024-08-02T18:30:56+01:00 | false | Elegantly sampling a stream |
|
|
|
false | false | true |
Reservoir Sampling is an online, probabilistic
algorithm to uniformly sample k
random elements out of a stream of values.
It's a particularly elegant and small algorithm, only requiring $\Theta(k)$ amount of space and a single pass through the stream.
Sampling one element
As an introduction, we'll first focus on fairly sampling one element from the stream.
def sample_one[T](stream: Iterable[T]) -> T:
stream_iter = iter(stream)
# Sample the first element
res = next(stream_iter)
for i, val in enumerate(stream_iter, start=1):
j = random.randint(0, i)
# Replace the sampled element with probability 1/(i + 1)
if j == 0:
res = val
# Return the randomly sampled element
return res
Proof
Let's now prove that this algorithm leads to a fair sampling of the stream.
We'll be doing proof by induction.
Hypothesis H_N
After iterating through the first N
items in the stream,
each of them has had an equal \frac{1}{N}
probability of being selected as
res
.
Base Case H_1
We can trivially observe that the first element is always assigned to res
,
\frac{1}{1} = 1
, the hypothesis has been verified.
Inductive Case
For a given N
, let us assume that H_N
holds. Let us now look at the events
of loop iteration where i = N
(i.e: observation of the $N + 1$-th item in the
stream).
j = random.randint(0, i)
uniformly selects a value in the range [0, i]
,
a.k.a [0, N]
. We then have two cases:
-
j == 0
, with probability\frac{1}{N + 1}
: we selectval
as the new reservoir elementres
. -
j != 0
, with probability\frac{N}{N + 1}
: we keep the previous value ofres
. ByH_N
, any of the firstN
elements had a\frac{1}{N}
probability of beingres
before at the start of the loop, each element now has a probability\frac{1}{N} \cdot \frac{N}{N + 1} = \frac{1}{N + 1}
of being the element.
And thus, we have proven H_{N + 1}
at the end of the loop.