Reservoir Sampling
Last Updated : 23 Jul, 2025
Reservoir sampling is a family of randomized algorithms for randomly choosing _k_ samples from a list of _n_ items, where _n_ is either very large or unknown. Typically _n_ is large enough that the list doesn't fit into main memory, for example a list of search queries at Google or Facebook.
So we are given a big array (or stream) of numbers (to simplify), and we need to write an efficient function to randomly select _k_ numbers, where _1 <= k <= n_. Let the input array be _stream[]_.
A **simple solution** is to create an array _reservoir[]_ of maximum size _k_. One by one, randomly select an item from _stream[0..n-1]_. If the selected item has not been selected before, put it in _reservoir[]_. To check whether an item was previously selected, we need to search for it in _reservoir[]_. Each search takes up to _O(k)_ time, so the time complexity of this algorithm is _O(k^2)_, which can be costly if _k_ is big. It also needs random access to the whole input, so it is not suitable when the input arrives as a stream.
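As a rough illustration of the simple solution, here is a minimal Python sketch. The function name `select_k_naive` and the `rng` parameter are hypothetical, and the sketch records selected indices rather than values (an assumption made so that duplicate values in the input do not confuse the "previously selected" check).

```python
import random


def select_k_naive(stream, k, rng=random):
    """Pick k distinct positions of `stream` by repeated random draws.

    Hypothetical helper for illustration only. It needs len(stream) and
    random access, so it cannot be used on a true stream.
    """
    n = len(stream)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")

    chosen = []  # indices selected so far (plays the role of reservoir[])
    while len(chosen) < k:
        idx = rng.randrange(n)  # uniform index in [0, n-1]
        # Linear search of the reservoir: up to O(k) work per draw
        if idx not in chosen:
            chosen.append(idx)
    return [stream[idx] for idx in chosen]


if __name__ == "__main__":
    print(select_k_naive([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 5))
```

Note that the sketch calls `len(stream)` up front, which is exactly what is unavailable when the input length is unknown.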
**The problem can be solved in _O(n)_ time.** This solution also suits input in the form of a stream. The steps are as follows.
**1)** Create an array _reservoir[0..k-1]_ and copy the first _k_ items of _stream[]_ to it.
**2)** Now consider, one by one, all items from the _(k+1)_-th item to the _n_-th item.
...**a)** Generate a random number from 0 to _i_ (inclusive), where _i_ is the index of the current item in _stream[]_. Let the generated random number be _j_.
...**b)** If _j_ is in the range 0 to _k-1_, replace _reservoir[j]_ with _stream[i]_.
Following is the implementation of the above algorithm.
C++

```cpp
// An efficient program to randomly select
// k items from a stream of items
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
using namespace std;

// A utility function to print an array
void printArray(const int stream[], int n)
{
    for (int i = 0; i < n; i++)
        cout << stream[i] << " ";
    cout << endl;
}

// A function to randomly select
// k items from stream[0..n-1].
void selectKItems(int stream[], int n, int k)
{
    int i; // index for elements in stream[]

    // reservoir is the output array. Initialize
    // it with first k elements from stream[].
    // A std::vector is used because variable-length
    // arrays are not part of standard C++.
    vector<int> reservoir(k);
    for (i = 0; i < k; i++)
        reservoir[i] = stream[i];

    // Use a different seed value so that we don't get
    // same result each time we run this program
    srand(time(NULL));

    // Iterate from the (k+1)th element to nth element
    for (; i < n; i++)
    {
        // Pick a random index from 0 to i.
        int j = rand() % (i + 1);

        // If the randomly picked index is smaller than k,
        // then replace the element present at the index
        // with new element from stream
        if (j < k)
            reservoir[j] = stream[i];
    }

    cout << "Following are k randomly selected items \n";
    printArray(reservoir.data(), k);
}

// Driver Code
int main()
{
    int stream[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    int n = sizeof(stream) / sizeof(stream[0]);
    int k = 5;
    selectKItems(stream, n, k);
    return 0;
}

// This code is contributed by rathbhupendra
```

C

```c
// An efficient program to randomly select k items from a stream of items
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// A utility function to print an array
void printArray(int stream[], int n)
{
    for (int i = 0; i < n; i++)
        printf("%d ", stream[i]);
    printf("\n");
}

// A function to randomly select k items from stream[0..n-1].
void selectKItems(int stream[], int n, int k)
{
    int i; // index for elements in stream[]

    // reservoir[] is the output array. Initialize it with
    // first k elements from stream[]
    int reservoir[k];
    for (i = 0; i < k; i++)
        reservoir[i] = stream[i];

    // Use a different seed value so that we don't get
    // same result each time we run this program
    srand(time(NULL));

    // Iterate from the (k+1)th element to nth element
    for (; i < n; i++)
    {
        // Pick a random index from 0 to i.
        int j = rand() % (i + 1);

        // If the randomly picked index is smaller than k, then replace
        // the element present at the index with new element from stream
        if (j < k)
            reservoir[j] = stream[i];
    }

    printf("Following are k randomly selected items \n");
    printArray(reservoir, k);
}

// Driver program to test above function.
int main()
{
    int stream[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    int n = sizeof(stream) / sizeof(stream[0]);
    int k = 5;
    selectKItems(stream, n, k);
    return 0;
}
```

Java

```java
// An efficient Java program to randomly
// select k items from a stream of items
import java.util.Arrays;
import java.util.Random;

public class ReservoirSampling {

    // A function to randomly select k items from
    // stream[0..n-1].
    static void selectKItems(int stream[], int n, int k)
    {
        int i; // index for elements in stream[]

        // reservoir[] is the output array. Initialize it
        // with first k elements from stream[]
        int reservoir[] = new int[k];
        for (i = 0; i < k; i++)
            reservoir[i] = stream[i];

        Random r = new Random();

        // Iterate from the (k+1)th element to nth element
        for (; i < n; i++) {
            // Pick a random index from 0 to i.
            int j = r.nextInt(i + 1);

            // If the randomly picked index is smaller than
            // k, then replace the element present at the
            // index with new element from stream
            if (j < k)
                reservoir[j] = stream[i];
        }

        System.out.println(
            "Following are k randomly selected items");
        System.out.println(Arrays.toString(reservoir));
    }

    // Driver Program to test above method
    public static void main(String[] args)
    {
        int stream[]
            = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
        int n = stream.length;
        int k = 5;
        selectKItems(stream, n, k);
    }
}
// This code is contributed by Sumit Ghosh
```

Python3

```python
# An efficient Python3 program
# to randomly select k items
# from a stream of items
import random


# A utility function
# to print an array
def printArray(stream, n):
    for i in range(n):
        print(stream[i], end=" ")
    print()


# A function to randomly select
# k items from stream[0..n-1].
def selectKItems(stream, n, k):
    # reservoir[] is the output
    # array. Initialize it with
    # first k elements from stream[]
    reservoir = [0] * k
    for i in range(k):
        reservoir[i] = stream[i]

    # Iterate from the (k+1)th
    # element to nth element.
    # After the loop above i is k-1,
    # so it must be reset to k here.
    i = k
    while i < n:
        # Pick a random index
        # from 0 to i.
        j = random.randrange(i + 1)

        # If the randomly picked
        # index is smaller than k,
        # then replace the element
        # present at the index
        # with new element from stream
        if j < k:
            reservoir[j] = stream[i]
        i += 1

    print("Following are k randomly selected items")
    printArray(reservoir, k)


# Driver Code
if __name__ == "__main__":
    stream = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    n = len(stream)
    k = 5
    selectKItems(stream, n, k)

# This code is contributed by mits
```

C#

```csharp
// An efficient C# program to randomly
// select k items from a stream of items
using System;
using System.Collections;

public class ReservoirSampling {
    // A function to randomly select k
    // items from stream[0..n-1].
    static void selectKItems(int[] stream, int n, int k)
    {
        // index for elements in stream[]
        int i;

        // reservoir[] is the output array.
        // Initialize it with first k
        // elements from stream[]
        int[] reservoir = new int[k];
        for (i = 0; i < k; i++)
            reservoir[i] = stream[i];

        Random r = new Random();

        // Iterate from the (k+1)th
        // element to nth element
        for (; i < n; i++)
        {
            // Pick a random index from 0 to i.
            int j = r.Next(i + 1);

            // If the randomly picked index
            // is smaller than k, then replace
            // the element present at the index
            // with new element from stream
            if (j < k)
                reservoir[j] = stream[i];
        }

        Console.WriteLine("Following are k " +
                          "randomly selected items");
        for (i = 0; i < k; i++)
            Console.Write(reservoir[i] + " ");
    }

    // Driver code
    static void Main()
    {
        int[] stream = {1, 2, 3, 4, 5, 6, 7,
                        8, 9, 10, 11, 12};
        int n = stream.Length;
        int k = 5;
        selectKItems(stream, n, k);
    }
}
// This code is contributed by mits
```
**Output:**
Following are k randomly selected items
6 2 11 8 12
Note: The output will differ on every run because the elements are selected at random.
**Time Complexity:** _O(n)_
**Auxiliary Space:** _O(k)_
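The implementations above take an array and its length _n_, whereas the motivating case is a stream whose length is not known in advance. The following Python sketch applies the same two steps to any iterable. The name `reservoir_sample` and the `rng` parameter are assumptions introduced for illustration, not part of the original code.

```python
import random
from itertools import islice


def reservoir_sample(iterable, k, rng=random):
    """Return k items chosen uniformly at random from an iterable
    whose length is not known in advance (hypothetical helper)."""
    it = iter(iterable)

    # Step 1: fill the reservoir with the first k items
    reservoir = list(islice(it, k))
    if len(reservoir) < k:
        raise ValueError("stream has fewer than k items")

    # Step 2: for the item at 0-based index i, pick j uniformly in [0, i]
    # and overwrite reservoir[j] only when j < k
    for i, item in enumerate(it, start=k):
        j = rng.randrange(i + 1)
        if j < k:
            reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # A generator stands in for a stream: it is consumed once, item by item
    print(reservoir_sample((x * x for x in range(1000)), 5))
```

Only the _k_ reservoir slots are kept in memory, which matches the _O(k)_ auxiliary space stated above.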
**How does this work?**
To prove that this solution works, we must show that the probability that any item _stream[i]_, where _0 <= i < n_, ends up in the final _reservoir[]_ is _k/n_. Let us divide the proof into two cases, since the first _k_ items are treated differently.
**Case 1:** For the last _n-k_ stream items, i.e., for _stream[i]_ where _k <= i < n_
For every such stream item _stream[i]_, we pick a random index from 0 to _i_, and if the picked index is one of the first _k_ indexes, we replace the element at the picked index with _stream[i]_.
To simplify the proof, let us first consider the _last_ item. The probability that the last item is in the final reservoir = the probability that one of the first _k_ indexes is picked for the last item = _k/n_ (the probability of picking one of _k_ indexes out of _n_).
Let us now consider the _second last_ item. The probability that the second last item is in the final _reservoir[]_ = [probability that one of the first _k_ indexes is picked in the iteration for _stream[n-2]_] x [probability that the index picked in the iteration for _stream[n-1]_ is not the same as the index picked for _stream[n-2]_] = _[k/(n-1)] x [(n-1)/n] = k/n_.
Similarly, we can consider the other items, from _stream[n-1]_ down to _stream[k]_, and generalize the proof.
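To make the generalization explicit, the same argument for an arbitrary item _stream[i]_ with _k <= i < n_ can be sketched as a telescoping product. The index _m_ is introduced here only for illustration: the first factor is the probability that _stream[i]_ enters the reservoir, and each later factor _m/(m+1)_ is the probability that the iteration for _stream[m]_ does not pick the slot it occupies.

```latex
P\bigl(\text{stream}[i] \in \text{reservoir}\bigr)
  = \frac{k}{i+1} \cdot \prod_{m=i+1}^{n-1} \frac{m}{m+1}
  = \frac{k}{i+1} \cdot \frac{i+1}{n}
  = \frac{k}{n}
```

Setting _i = n-1_ leaves an empty product and gives _k/n_ directly, and _i = n-2_ gives _[k/(n-1)] x [(n-1)/n]_, which agrees with the two items worked out above.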
**Case 2:** For the first _k_ stream items, i.e., for _stream[i]_ where _0 <= i < k_
The first _k_ items are initially copied to _reservoir[]_ and may be removed later, in the iterations for _stream[k]_ to _stream[n-1]_.
The probability that an item from _stream[0..k-1]_ is in the final array = the probability that the item's slot is not picked when items _stream[k]_, _stream[k+1]_, ..., _stream[n-1]_ are considered = _[k/(k+1)] x [(k+1)/(k+2)] x [(k+2)/(k+3)] x ... x [(n-1)/n] = k/n_. Each factor _i/(i+1)_ is the probability that the iteration for _stream[i]_, which picks one of _i+1_ indexes, does not pick the slot holding the item.
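As a sanity check on the _k/n_ result, the sketch below evaluates the Case 2 product with exact rational arithmetic and estimates the inclusion frequency of every item by simulation. The function names, the `trials` count, and the `seed` parameter are all hypothetical choices made for this illustration.

```python
import random
from collections import Counter
from fractions import Fraction


def survival_probability(k, n):
    """Exact value of [k/(k+1)] x [(k+1)/(k+2)] x ... x [(n-1)/n]."""
    p = Fraction(1)
    for m in range(k, n):
        p *= Fraction(m, m + 1)
    return p


def inclusion_frequencies(n, k, trials, seed=0):
    """Run the algorithm `trials` times on the items 0..n-1 and return,
    for each item, the fraction of runs in which it was kept."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        reservoir = list(range(k))      # first k items
        for i in range(k, n):           # remaining items
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = i
        counts.update(reservoir)
    return {item: counts[item] / trials for item in range(n)}


if __name__ == "__main__":
    print(survival_probability(5, 12))
    for item, freq in inclusion_frequencies(12, 5, 20000).items():
        print(item, round(freq, 3))
```

For _n = 12_ and _k = 5_ the exact product is 5/12, and every simulated frequency should land close to that value for both the first _k_ items and the later ones.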