Reservoir Sampling

Last Updated : 23 Jul, 2025

Reservoir sampling is a family of randomized algorithms for randomly choosing _k_ samples from a list of _n_ items, where _n_ is either very large or unknown. Typically _n_ is large enough that the list doesn't fit into main memory; a list of search queries at Google or Facebook is one example.
So we are given a big array (or stream) of numbers (to keep things simple), and we need to write an efficient function to randomly select _k_ numbers, where _1 <= k <= n_. Let the input array be _stream[]_.

A **simple solution** is to create an array _reservoir[]_ of maximum size _k_. One by one, randomly select an item from _stream[0..n-1]_. If the selected item was not previously selected, put it in _reservoir[]_. To check whether an item was previously selected, we need to search for it in _reservoir[]_. The time complexity of this algorithm is _O(k^2)_ (up to _k_ searches of up to _k_ entries each), which can be costly if _k_ is big. It is also inefficient when the input arrives as a stream, because it needs random access to all _n_ items.
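The simple solution can be sketched in Python as follows. This is only an illustration: the function name `naive_sample` and the `rng` parameter are hypothetical, and the sketch tracks selected positions rather than values (an assumption made so that duplicate values in the input do not cause trouble).

```python
import random


def naive_sample(stream, k, rng=random):
    """Pick k items by repeated random draws, rejecting repeats.

    `stream` must support len() and indexing, which is exactly
    why this approach does not work on a true stream.
    """
    n = len(stream)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")

    chosen = []  # positions selected so far
    while len(chosen) < k:
        idx = rng.randrange(n)   # random position in 0..n-1
        if idx not in chosen:    # linear search over up to k entries
            chosen.append(idx)
    return [stream[idx] for idx in chosen]


if __name__ == "__main__":
    print(naive_sample([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 5))
```

The `idx not in chosen` test is the linear search that gives the quadratic cost in _k_.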

The problem **can be solved in O(n) time**, and the solution also suits input that arrives as a stream. The steps are as follows.
**1)** Create an array _reservoir[0..k-1]_ and copy the first _k_ items of _stream[]_ into it.
**2)** Now consider, one by one, all items from the (_k_+1)-th item to the _n_-th item.
...**a)** Generate a random number from 0 to _i_, where _i_ is the 0-based index of the current item in _stream[]_. Let the generated random number be _j_.
...**b)** If _j_ is in the range 0 to _k-1_, replace _reservoir[j]_ with _stream[i]_.
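To see the steps in action, here is a small trace in Python where the random numbers are supplied by hand instead of being generated. The helper `reservoir_trace` and its `draws` parameter are hypothetical names used only for this illustration.

```python
def reservoir_trace(stream, k, draws):
    """Apply steps 1 and 2 using pre-chosen values of j.

    draws[m] is the value of j used for the item at index k + m.
    Returns the reservoir contents after each step.
    """
    reservoir = list(stream[:k])      # step 1: copy the first k items
    history = [list(reservoir)]
    for i, j in zip(range(k, len(stream)), draws):
        if not 0 <= j <= i:
            raise ValueError("each draw must lie in 0..i")
        if j < k:                     # step 2b: replace reservoir[j]
            reservoir[j] = stream[i]
        history.append(list(reservoir))
    return history


steps = reservoir_trace([1, 2, 3, 4, 5, 6, 7], k=3, draws=[1, 4, 0, 2])
for state in steps:
    print(state)
# [1, 2, 3]   initial reservoir
# [1, 4, 3]   i=3, j=1: item 4 replaces reservoir[1]
# [1, 4, 3]   i=4, j=4: j >= k, item 5 is skipped
# [6, 4, 3]   i=5, j=0: item 6 replaces reservoir[0]
# [6, 4, 7]   i=6, j=2: item 7 replaces reservoir[2]
```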

Following is the implementation of the above algorithm.

C++ `

// An efficient program to randomly select
// k items from a stream of items
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
using namespace std;

// A utility function to print an array
void printArray(const int stream[], int n)
{
    for (int i = 0; i < n; i++)
        cout << stream[i] << " ";
    cout << endl;
}

// A function to randomly select
// k items from stream[0..n-1].
void selectKItems(int stream[], int n, int k)
{
    int i; // index for elements in stream[]

    // reservoir is the output array. Initialize
    // it with first k elements from stream[].
    // A std::vector is used because int reservoir[k]
    // (a variable-length array) is not standard C++.
    vector<int> reservoir(k);
    for (i = 0; i < k; i++)
        reservoir[i] = stream[i];

    // Use a different seed value so that we don't get
    // same result each time we run this program
    srand(time(NULL));

    // Iterate from the (k+1)th element to nth element
    for (; i < n; i++)
    {
        // Pick a random index from 0 to i.
        // (rand() % m is slightly biased, which is
        // acceptable for a demonstration.)
        int j = rand() % (i + 1);

        // If the randomly picked index is smaller than k,
        // then replace the element present at the index
        // with new element from stream
        if (j < k)
            reservoir[j] = stream[i];
    }

    cout << "Following are k randomly selected items \n";
    printArray(reservoir.data(), k);
}

// Driver Code
int main()
{
    int stream[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    int n = sizeof(stream) / sizeof(stream[0]);
    int k = 5;
    selectKItems(stream, n, k);
    return 0;
}

// This code is contributed by rathbhupendra

C

// An efficient program to randomly select k items from a stream of items

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// A utility function to print an array
void printArray(int stream[], int n)
{
    for (int i = 0; i < n; i++)
        printf("%d ", stream[i]);
    printf("\n");
}

// A function to randomly select k items from stream[0..n-1].
void selectKItems(int stream[], int n, int k)
{
    int i; // index for elements in stream[]

    // reservoir[] is the output array. Initialize it with
    // first k elements from stream[]
    int reservoir[k];
    for (i = 0; i < k; i++)
        reservoir[i] = stream[i];

    // Use a different seed value so that we don't get
    // same result each time we run this program
    srand(time(NULL));

    // Iterate from the (k+1)th element to nth element
    for (; i < n; i++)
    {
        // Pick a random index from 0 to i.
        int j = rand() % (i + 1);

        // If the randomly picked index is smaller than k, then replace
        // the element present at the index with new element from stream
        if (j < k)
            reservoir[j] = stream[i];
    }

    printf("Following are k randomly selected items \n");
    printArray(reservoir, k);
}

// Driver program to test above function.
int main()
{
    int stream[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    int n = sizeof(stream) / sizeof(stream[0]);
    int k = 5;
    selectKItems(stream, n, k);
    return 0;
}

Java

// An efficient Java program to randomly
// select k items from a stream of items
import java.util.Arrays;
import java.util.Random;

public class ReservoirSampling {

    // A function to randomly select k items from
    // stream[0..n-1].
    static void selectKItems(int stream[], int n, int k)
    {
        int i; // index for elements in stream[]

        // reservoir[] is the output array. Initialize it
        // with first k elements from stream[]
        int reservoir[] = new int[k];
        for (i = 0; i < k; i++)
            reservoir[i] = stream[i];

        Random r = new Random();

        // Iterate from the (k+1)th element to nth element
        for (; i < n; i++) {
            // Pick a random index from 0 to i.
            int j = r.nextInt(i + 1);

            // If the randomly picked index is smaller than
            // k, then replace the element present at the
            // index with new element from stream
            if (j < k)
                reservoir[j] = stream[i];
        }

        System.out.println(
            "Following are k randomly selected items");
        System.out.println(Arrays.toString(reservoir));
    }

    // Driver Program to test above method
    public static void main(String[] args)
    {
        int stream[]
            = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
        int n = stream.length;
        int k = 5;
        selectKItems(stream, n, k);
    }
}
// This code is contributed by Sumit Ghosh

Python3

# An efficient Python3 program
# to randomly select k items
# from a stream of items
import random

# A utility function
# to print an array
def printArray(stream, n):
    for i in range(n):
        print(stream[i], end=" ")
    print()

# A function to randomly select
# k items from stream[0..n-1].
def selectKItems(stream, n, k):
    # reservoir[] is the output
    # array. Initialize it with
    # first k elements from stream[]
    reservoir = [0] * k
    for i in range(k):
        reservoir[i] = stream[i]

    # Iterate from the (k+1)th
    # element to nth element.
    # The for loop above leaves i at k-1,
    # so set it to k explicitly; otherwise
    # stream[k-1] would be processed twice.
    i = k
    while i < n:
        # Pick a random index
        # from 0 to i.
        j = random.randrange(i + 1)

        # If the randomly picked
        # index is smaller than k,
        # then replace the element
        # present at the index
        # with new element from stream
        if j < k:
            reservoir[j] = stream[i]
        i += 1

    print("Following are k randomly selected items")
    printArray(reservoir, k)

# Driver Code
if __name__ == "__main__":
    stream = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    n = len(stream)
    k = 5
    selectKItems(stream, n, k)

# This code is contributed by mits

C#

// An efficient C# program to randomly
// select k items from a stream of items
using System;

public class ReservoirSampling
{
    // A function to randomly select k
    // items from stream[0..n-1].
    static void selectKItems(int[] stream, int n, int k)
    {
        // index for elements in stream[]
        int i;

        // reservoir[] is the output array.
        // Initialize it with first k
        // elements from stream[]
        int[] reservoir = new int[k];
        for (i = 0; i < k; i++)
            reservoir[i] = stream[i];

        Random r = new Random();

        // Iterate from the (k+1)th
        // element to nth element
        for (; i < n; i++)
        {
            // Pick a random index from 0 to i.
            int j = r.Next(i + 1);

            // If the randomly picked index
            // is smaller than k, then replace
            // the element present at the index
            // with new element from stream
            if (j < k)
                reservoir[j] = stream[i];
        }

        Console.WriteLine("Following are k " +
                          "randomly selected items");
        for (i = 0; i < k; i++)
            Console.Write(reservoir[i] + " ");
    }

    // Driver code
    static void Main()
    {
        int[] stream = {1, 2, 3, 4, 5, 6, 7,
                        8, 9, 10, 11, 12};
        int n = stream.Length;
        int k = 5;
        selectKItems(stream, n, k);
    }
}

// This code is contributed by mits

PHP

<?php
// A utility function to print an array
function printArray($stream, $n)
{
    for ($i = 0; $i < $n; $i++)
        echo $stream[$i] . " ";
    echo "\n";
}

// A function to randomly select
// k items from stream[0..n-1].
function selectKItems($stream, $n, $k)
{
    // reservoir[] is the output
    // array. Initialize it with
    // first k elements from stream[]
    $reservoir = array_fill(0, $k, 0);
    for ($i = 0; $i < $k; $i++)
        $reservoir[$i] = $stream[$i];

    // Iterate from the (k+1)th
    // element to nth element
    for (; $i < $n; $i++)
    {
        // Pick a random index from 0 to i.
        // rand() includes both bounds, so the
        // upper bound is $i, not $i + 1.
        $j = rand(0, $i);

        // If the randomly picked
        // index is smaller than k,
        // then replace the element
        // present at the index
        // with new element from stream
        if ($j < $k)
            $reservoir[$j] = $stream[$i];
    }

    echo "Following are k randomly " .
         "selected items\n";
    printArray($reservoir, $k);
}

// Driver Code
$stream = array(1, 2, 3, 4, 5, 6,
                7, 8, 9, 10, 11, 12);
$n = count($stream);
$k = 5;
selectKItems($stream, $n, $k);

// This code is contributed by mits
?>

`

**Output:**

Following are k randomly selected items
6 2 11 8 12
Note: The output differs on every run because the items are selected at random.

**Time Complexity:** _O(n)_

**Auxiliary Space:** _O(k)_
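The listings above receive the whole array and its length _n_ up front. When the input really is a stream of unknown length, the same idea can be written against any iterable. The following Python sketch is one possible way to do that; `reservoir_sample_stream`, `numbers`, and the `rng` parameter are hypothetical names, and returning fewer than _k_ items for a short stream is an assumption of this sketch rather than something the article specifies.

```python
import random


def reservoir_sample_stream(iterable, k, rng=random):
    """Return k items chosen uniformly from an iterable of unknown length.

    Only the reservoir (at most k items) is kept in memory.
    If the iterable yields fewer than k items, all of them are returned.
    """
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randrange(i + 1)    # uniform index in 0..i
            if j < k:
                reservoir[j] = item     # replace with probability k/(i+1)
    return reservoir


def numbers():
    """A generator standing in for a stream; its length is never queried."""
    for value in range(1, 13):
        yield value


if __name__ == "__main__":
    print(reservoir_sample_stream(numbers(), 5))
```

Each item is examined once and only _k_ items are stored, which matches the _O(n)_ time and _O(k)_ auxiliary space stated above.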

**How does this work?**
To prove that this solution works, we must show that the probability that any item _stream[i]_, where _0 <= i < n_, ends up in the final _reservoir[]_ is _k/n_. Let us divide the proof into two cases, since the first _k_ items are treated differently.

**Case 1:** For the last _n-k_ stream items, i.e., for _stream[i]_ where _k <= i < n_
For every such item _stream[i]_, we pick a random index from 0 to _i_, and if the picked index is one of the first _k_ indexes, we replace the element at the picked index with _stream[i]_.
To simplify the proof, let us first consider the _last_ item. The probability that the last item is in the final reservoir = the probability that one of the first _k_ indexes is picked for the last item = _k/n_ (the index is drawn from 0 to _n-1_, so _k_ of the _n_ equally likely values qualify).
Let us now consider the _second-last_ item. The probability that the second-last item is in the final _reservoir[]_ = [probability that one of the first _k_ indexes is picked in the iteration for _stream[n-2]_] x [probability that the index picked in the iteration for _stream[n-1]_ is not the slot that now holds _stream[n-2]_] = _[k/(n-1)] x [(n-1)/n]_ = _k/n_.
Similarly, we can consider the other items, working from _stream[n-1]_ down to _stream[k]_, and generalize the proof.
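The generalization can be written out explicitly. For any _stream[i]_ with _k <= i < n_, the item must be selected in its own iteration and then survive every later iteration _t_, where it is replaced only if the index drawn from 0 to _t_ equals its slot (probability _1/(t+1)_). The index variable _t_ is introduced here only for this derivation.

```latex
\begin{align*}
P\bigl(\text{stream}[i] \text{ is in the final reservoir}\bigr)
  &= \underbrace{\frac{k}{i+1}}_{\text{selected at iteration } i}
     \times \prod_{t=i+1}^{n-1}
     \underbrace{\Bigl(1 - \frac{1}{t+1}\Bigr)}_{\text{not replaced at iteration } t} \\
  &= \frac{k}{i+1} \times \prod_{t=i+1}^{n-1} \frac{t}{t+1} \\
  &= \frac{k}{i+1} \times \frac{i+1}{n}
   = \frac{k}{n}.
\end{align*}
```

The product telescopes: each numerator cancels the previous denominator, leaving _(i+1)/n_. For _i = n-1_ the product is empty, which gives the _k/n_ result for the last item.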

**Case 2:** For the first _k_ stream items, i.e., for _stream[i]_ where _0 <= i < k_
The first _k_ items are initially copied to _reservoir[]_ and may be removed later in the iterations for _stream[k]_ to _stream[n-1]_.
The probability that an item from _stream[0..k-1]_ is in the final array = the probability that the item is not replaced when items _stream[k], stream[k+1], ..., stream[n-1]_ are considered = _[k/(k+1)] x [(k+1)/(k+2)] x [(k+2)/(k+3)] x ... x [(n-1)/n]_ = _k/n_
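The _k/n_ result can also be checked empirically. The Python sketch below repeats the sampling many times and measures how often each item ends up in the reservoir. The names `reservoir_sample` and `inclusion_frequencies`, the trial count, and the fixed seed are all choices made for this illustration.

```python
import random
from collections import Counter


def reservoir_sample(stream, k, rng):
    """Array version of the algorithm, using the supplied random generator."""
    reservoir = list(stream[:k])
    for i in range(k, len(stream)):
        j = rng.randrange(i + 1)   # uniform index in 0..i
        if j < k:
            reservoir[j] = stream[i]
    return reservoir


def inclusion_frequencies(n, k, trials, seed=0):
    """Fraction of trials in which each of the items 0..n-1 was selected."""
    rng = random.Random(seed)      # fixed seed keeps the run repeatable
    stream = list(range(n))
    counts = Counter()
    for _ in range(trials):
        counts.update(reservoir_sample(stream, k, rng))
    return [counts[item] / trials for item in stream]


if __name__ == "__main__":
    n, k = 12, 5
    print("expected k/n =", round(k / n, 3))
    for item, freq in enumerate(inclusion_frequencies(n, k, trials=20000)):
        print(item, round(freq, 3))
```

With enough trials, every frequency should land close to _k/n_, for items from both Case 1 and Case 2.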

References:
https://en.wikipedia.org/wiki/Reservoir_sampling