posts: reservoir-sampling: add one-element sample
parent 3605445bcf
commit 9ff4a07c9b
@@ -25,3 +25,59 @@ amount of space and a single pass through the stream.

[online]: https://en.wikipedia.org/wiki/Online_algorithm

<!--more-->

## Sampling one element

As an introduction, we'll first focus on fairly sampling one element from the
stream.

```python
import random
from collections.abc import Iterable


def sample_one[T](stream: Iterable[T]) -> T:
    stream_iter = iter(stream)
    # Sample the first element (raises StopIteration if the stream is empty)
    res = next(stream_iter)
    for i, val in enumerate(stream_iter, start=1):
        # `randint` bounds are inclusive: `j` is uniform over the `i + 1` values in [0, i]
        j = random.randint(0, i)
        # Replace the sampled element with probability 1/(i + 1)
        if j == 0:
            res = val
    # Return the randomly sampled element
    return res
```
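
Before proving anything, a quick empirical check can build some intuition. The sketch below is only illustrative: `empirical_frequencies` is a hypothetical helper that is not part of the post, and `sample_one` is restated with a `TypeVar` (rather than the `[T]` syntax used above) so that the snippet runs on its own, including on Python versions older than 3.12.

```python
import random
from collections import Counter
from collections.abc import Iterable
from typing import TypeVar

T = TypeVar("T")


def sample_one(stream: Iterable[T]) -> T:
    # Same algorithm as above, restated so this snippet is self-contained.
    stream_iter = iter(stream)
    res = next(stream_iter)
    for i, val in enumerate(stream_iter, start=1):
        if random.randint(0, i) == 0:
            res = val
    return res


def empirical_frequencies(n_items: int, trials: int) -> dict[int, float]:
    # Hypothetical helper: sample one element from range(n_items) `trials`
    # times and report how often each element was returned.
    counts = Counter(sample_one(range(n_items)) for _ in range(trials))
    return {item: counts[item] / trials for item in range(n_items)}


if __name__ == "__main__":
    for item, freq in empirical_frequencies(n_items=5, trials=50_000).items():
        # With a fair sampler, every frequency should be close to 1/5.
        print(f"{item}: {freq:.3f}")
```

Frequencies that all hover around $\frac{1}{5}$ are what a fair sampler should produce, but a simulation like this only suggests fairness; it does not establish it.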

### Proof

Let's now prove that this algorithm leads to a fair sampling of the stream.

We'll be doing a proof by induction on the number of items read from the
stream.

#### Hypothesis $H_N$

After iterating through the first $N$ items in the stream,
each of them has an equal $\frac{1}{N}$ probability of being the current value
of `res`.

#### Base Case $H_1$

The first element is always assigned to `res`, so it is selected with
probability $\frac{1}{1} = 1$: $H_1$ holds.

#### Inductive Case

For a given $N$, let us assume that $H_N$ holds. Let us now look at the loop
iteration where `i = N` (i.e. the observation of the $(N + 1)$-th item in the
stream).

`j = random.randint(0, i)` uniformly selects an integer in the inclusive range
$[0, i]$, i.e. $[0, N]$, which contains $N + 1$ values. We then have two cases:

* `j == 0`, with probability $\frac{1}{N + 1}$: we select `val` as the new
  reservoir element `res`.

* `j != 0`, with probability $\frac{N}{N + 1}$: we keep the previous value of
  `res`. By $H_N$, each of the first $N$ elements had a $\frac{1}{N}$
  probability of being `res` at the start of the iteration, so each of them now
  has a probability $\frac{1}{N} \cdot \frac{N}{N + 1} = \frac{1}{N + 1}$ of
  being `res`.

Each of the $N + 1$ elements seen so far is therefore `res` with probability
$\frac{1}{N + 1}$, and thus we have proven $H_{N + 1}$ at the end of the loop
iteration.
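
The inductive step can also be replayed mechanically with exact arithmetic. The following is a minimal sketch, assuming a stream of at least one item: `exact_selection_probabilities` is a hypothetical helper (not part of the post) that tracks, for each item read so far, the exact probability that it is currently `res`, applying the two cases above at every iteration.

```python
from fractions import Fraction


def exact_selection_probabilities(n_items: int) -> list[Fraction]:
    # Hypothetical helper, assumes n_items >= 1.
    # probs[k] is the exact probability that item k is currently `res`.
    # Base case: after reading the first item, it is `res` with probability 1.
    probs = [Fraction(1)]
    for i in range(1, n_items):
        replace = Fraction(1, i + 1)  # probability that `j == 0`
        keep = 1 - replace  # probability that `j != 0`
        # Earlier items stay selected only if `res` is kept;
        # the new item becomes `res` only if it replaces the old one.
        probs = [p * keep for p in probs] + [replace]
    return probs


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        # H_n: every one of the n items is `res` with probability 1/n.
        assert exact_selection_probabilities(n) == [Fraction(1, n)] * n
    print("H_n verified for n in (1, 2, 5, 10)")
```

Using `Fraction` instead of floats keeps the comparison with $\frac{1}{N}$ exact, so the assertions check the hypothesis itself rather than an approximation of it.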