How to Randomly Select Rows from Pandas DataFrame

Last Updated : 15 Apr, 2025

Pandas offers several ways to select rows at random from a DataFrame. Random selection is useful for tasks like sampling, testing, and data exploration.

Creating Sample Pandas DataFrame

First, we create a sample Pandas DataFrame that we will use throughout this article.

```python
import pandas as pd

d = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
     'Age': [27, 24, 22, 32, 15],
     'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
     'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}

df = pd.DataFrame(d)
print(df)
```

Output:

[Image: Sample DataFrame]

A random selection of rows from a DataFrame can be achieved in several ways. Below are the methods covered in this article:

Using the sample() Method
Using the n Parameter
Using the frac Parameter
Using replace=True
Using weights
Using axis
Using random_state
Using NumPy

1. Using sample() Method

In this example, we use the sample() method to randomly select rows from a Pandas DataFrame. sample() returns a random sample of items from an axis of the object, and the result has the same type as the caller. Called with no arguments, it returns a single row.

```python
# Select one random row
dfs = df.sample()
print(dfs)
```

Output:

[Image: df using sample()]
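To illustrate that sample() keeps the caller's type, the sketch below samples from both the DataFrame and its Name column. It rebuilds the sample DataFrame so it runs on its own; the variable names row and name are illustrative only.

```python
import pandas as pd

# Rebuild the sample DataFrame so this block is self-contained
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th'],
})

# Sampling a DataFrame returns a DataFrame (one row by default)
row = df.sample()
print(type(row).__name__, row.shape)  # DataFrame (1, 4)

# Sampling a single column (a Series) returns a Series
name = df['Name'].sample()
print(type(name).__name__, name.shape)  # Series (1,)
```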

2. Using the n Parameter

We can specify the number of rows to select using the n parameter. Each run will generally return a different set of rows unless random_state is fixed.

```python
# Select 3 random rows
df.sample(n=3)
```

Output:

[Image: df using random]
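Sampled rows keep their original index labels, which can be surprising when the result is used later. A minimal sketch, rebuilding the sample DataFrame so it runs on its own, shows the labels and one way to renumber them with reset_index(drop=True). In pandas 1.3 and later, passing ignore_index=True to sample() should achieve the same renumbering in a single call.

```python
import pandas as pd

# Rebuild the sample DataFrame so this block is self-contained
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th'],
})

# Three distinct random rows; they keep their original index labels
picked = df.sample(n=3)
print(picked.index.tolist())

# Renumber the result 0..n-1 if the original labels are not needed
picked = picked.reset_index(drop=True)
print(picked.index.tolist())  # [0, 1, 2]
```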

3. Using frac Parameter

The frac parameter selects a fraction of the axis items instead of a fixed count. For example, frac=0.5 makes sample() return about 50% of the rows.

```python
# Return about 50% of the rows
df.sample(frac=0.5)
```

Output:

[Image: using frac 50% df]
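A common use of frac is frac=1, which samples every row without replacement and therefore shuffles the DataFrame. The sketch below, which rebuilds the sample DataFrame and uses an arbitrary seed of 0, shows that idiom and also that n and frac cannot be passed together (pandas raises a ValueError in that case).

```python
import pandas as pd

# Rebuild the sample DataFrame so this block is self-contained
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th'],
})

# frac=1 returns 100% of the rows in random order, i.e. a shuffle
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
print(shuffled)

# n and frac are mutually exclusive; passing both raises a ValueError
error_message = None
try:
    df.sample(n=2, frac=0.5)
except ValueError as err:
    error_message = str(err)
print(error_message)
```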

4. Selecting Rows with Replacement (`replace=True`)

By default (replace=False), the sample() method does not select the same row more than once. Setting replace=True allows a row to be selected multiple times, which also makes it possible to request more rows than the DataFrame contains.

```python
# Select 5 rows; the same row may appear more than once
df.sample(n=5, replace=True)
```

Output:

[Image: df using replace]
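The difference between the two settings is easiest to see by asking for more rows than exist. In the sketch below (which rebuilds the sample DataFrame; the seed 1 is arbitrary), n=8 fails without replacement but works with replace=True, and because 8 draws come from only 5 rows, at least one row must repeat. This is the basis of bootstrap resampling.

```python
import pandas as pd

# Rebuild the sample DataFrame so this block is self-contained
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th'],
})

# Without replacement, asking for more rows than exist raises a ValueError
error_message = None
try:
    df.sample(n=8)
except ValueError as err:
    error_message = str(err)
print(error_message)

# With replacement, rows can repeat, so n may exceed len(df)
boot = df.sample(n=8, replace=True, random_state=1)
print(boot.index.tolist())
print(boot.index.duplicated().any())  # True: 8 draws from 5 rows must repeat
```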

5. Using Weights to Select Rows

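We can assign weights to rows so that some rows are more likely to be selected than others. The weights parameter takes one weight per row and controls the probability of selecting each row. pandas normalizes the weights to sum to 1, so they do not have to add up to 1 themselves (the weights below sum to 1.4).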

```python
# One weight per row; rows weighted 0.4 are favored over rows weighted 0.2
test_weights = [0.2, 0.4, 0.2, 0.2, 0.4]

df.sample(n=3, weights=test_weights)
```

Output:

[Image: df using weight]
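Two further properties of weights are shown in the sketch below, which rebuilds the sample DataFrame: a row with weight 0 is never selected, and on a DataFrame the weights argument may also be the name of a column (here Age, an illustrative choice) when sampling rows. With only two non-zero weights and n=2, the first sample is fully determined up to order.

```python
import pandas as pd

# Rebuild the sample DataFrame so this block is self-contained
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th'],
})

# Rows with weight 0 are never picked; the rest are drawn in
# proportion to their (automatically normalized) weights
weights = [0, 3, 0, 1, 0]
picked = df.sample(n=2, weights=weights)
print(picked['Name'].tolist())  # Princi and Anuj, in random order

# A column name can serve as the weights: older people are
# proportionally more likely to be drawn
by_age = df.sample(n=2, weights='Age', random_state=0)
print(by_age['Name'].tolist())
```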

6. Using `axis` Parameter for Column Sampling

The axis parameter accepts a number or a name: 0 or 'index' (the default) samples rows, while 1 or 'columns' samples columns instead.

```python
# Sample one column instead of a row
df.sample(axis=1)
```

Output:

[Image: df using column sampling]
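Column sampling combines with the other parameters in the same way as row sampling. The sketch below, which rebuilds the sample DataFrame and uses an arbitrary seed of 0, draws two columns and checks that the numeric and string forms of axis give the same result for the same seed.

```python
import pandas as pd

# Rebuild the sample DataFrame so this block is self-contained
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th'],
})

# axis=1 keeps every row but only n randomly chosen columns
cols = df.sample(n=2, axis=1, random_state=0)
print(cols.columns.tolist())
print(cols.shape)  # (5, 2)

# The string alias 'columns' is equivalent to axis=1
same = df.sample(n=2, axis='columns', random_state=0)
print(cols.equals(same))  # True
```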

7. Using `random_state` for Reproducibility

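Passing an integer seed to random_state makes the result reproducible: for a given DataFrame, the same seed always fetches the same rows. If random_state is None (the default), the sample is drawn from NumPy's global random state, so the selected rows will generally differ between runs.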

```python
# The same seed returns the same 2 rows on every run
df.sample(n=2, random_state=2)
```

Output:

[Image: df using random state]
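Reproducibility can be checked directly by sampling twice with the same seed and comparing the results. The sketch below rebuilds the sample DataFrame; the seeds 2 and 7 are arbitrary.

```python
import pandas as pd

# Rebuild the sample DataFrame so this block is self-contained
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th'],
})

# The same integer seed always produces the same sample
first = df.sample(n=2, random_state=2)
second = df.sample(n=2, random_state=2)
print(first.equals(second))  # True

# A different seed will usually (not always) pick different rows
other = df.sample(n=2, random_state=7)
print(other.index.tolist())
```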

8. Using NumPy for Random Selection

We can also use NumPy's np.random.choice() to pick random index labels and then select those rows with .loc. This approach lets us control the number of rows to select (size) and whether replacement is allowed (replace).

```python
import numpy as np

# Pick 4 distinct index labels at random, then select those rows
indices = np.random.choice(df.index, size=4, replace=False)
df.loc[indices]
```

Output:

[Image: df using numpy]
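NumPy's newer Generator API (np.random.default_rng) can be used in the same way. The sketch below is one possible variant, not the only approach: it draws row positions rather than index labels and selects them with .iloc, which also works when the index is not a simple 0..n-1 range. It rebuilds the sample DataFrame, and the seed 42 is arbitrary.

```python
import numpy as np
import pandas as pd

# Rebuild the sample DataFrame so this block is self-contained
df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
    'Age': [27, 24, 22, 32, 15],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th'],
})

# A seeded Generator makes the selection reproducible
rng = np.random.default_rng(42)

# Draw 4 distinct row positions in 0..len(df)-1 without replacement
positions = rng.choice(len(df), size=4, replace=False)

# iloc selects by position, so this works for any index labels
subset = df.iloc[positions]
print(subset)
```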

