How to drop one or multiple columns in Pandas DataFrame

Last Updated : 15 Nov, 2024

Let’s learn how to drop one or more columns from a Pandas DataFrame.

Drop Columns Using df.drop() Method

Let’s consider an example dataset (data) with three columns: ‘A’, ‘B’, and ‘C’. To drop a single column, call the drop() method with the column’s name and axis=1.

Python `

import pandas as pd

# Sample DataFrame
data = pd.DataFrame({
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B5'],
    'C': ['C1', 'C2', 'C3', 'C4', 'C5']
})

# Syntax: df = df.drop('ColumnName', axis=1)

# Drop column 'B'
df = data.drop('B', axis=1)
print(df)

`

Output:

    A   C
0  A1  C1
1  A2  C2
2  A3  C3
3  A4  C4
4  A5  C5

The df variable stores a new Pandas DataFrame with only the ‘A’ and ‘C’ columns; the original data is unchanged. We can also drop multiple columns by providing a list of column names.

Python `

# Syntax: df.drop(['Column1', 'Column2'], axis=1)

# Drop columns 'B' and 'C'
df = data.drop(['B', 'C'], axis=1)
print(df)

`

Output:

    A
0  A1
1  A2
2  A3
3  A4
4  A5
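
As a small supplementary sketch, drop() also accepts a columns= keyword in place of axis=1, and an errors='ignore' option that skips labels that are not present. The two-row frame and the column name 'Z' below are made up for illustration.

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
data = pd.DataFrame({
    'A': ['A1', 'A2'],
    'B': ['B1', 'B2'],
    'C': ['C1', 'C2'],
})

# columns=[...] is equivalent to passing the labels with axis=1
by_keyword = data.drop(columns=['B', 'C'])

# errors='ignore' skips missing labels instead of raising KeyError
# ('Z' is a hypothetical name that does not exist in data)
tolerant = data.drop(columns=['B', 'Z'], errors='ignore')

print(list(by_keyword.columns))  # ['A']
print(list(tolerant.columns))    # ['A', 'C']
print(list(data.columns))        # ['A', 'B', 'C'], data itself is unchanged
```

Using columns= tends to read more clearly than axis=1, since the call states what is being dropped.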

The df.drop() method is just one way to remove columns in pandas. There are several other techniques available, each suited for different scenarios. Let’s explore these alternative methods and see how they can be applied using the same dataset throughout.

Drop Columns Using Column Index

When you know the index positions of the columns you want to delete, you can drop them by specifying their indices. This is useful for automated processes where column positions are known but names may vary.

Python `

# Drop columns based on index positions
df = data.drop(data.columns[[0, 2]], axis=1)
print(df)

`

Output:

    B
0  B1
1  B2
2  B3
3  B4
4  B5
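
One caveat worth sketching: data.columns[...] only translates positions into labels, and drop() then works on those labels. The example below assumes a shortened copy of the sample data plus a hypothetical frame (dup) with a repeated column name, to show where position-based and label-based removal differ.

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
data = pd.DataFrame({
    'A': ['A1', 'A2'],
    'B': ['B1', 'B2'],
    'C': ['C1', 'C2'],
})

# Indexing data.columns with positions yields the matching labels
labels = data.columns[[0, 2]]
print(list(labels))  # ['A', 'C']

# Negative positions count from the end, so -1 is the last column
without_last = data.drop(columns=data.columns[-1])
print(list(without_last.columns))  # ['A', 'B']

# Hypothetical frame with a repeated column name
dup = pd.DataFrame([[1, 2, 3]], columns=['A', 'B', 'A'])

# Dropping the label found at position 0 removes every column named 'A'
dropped_by_label = dup.drop(columns=dup.columns[0])
print(list(dropped_by_label.columns))  # ['B']

# Selecting the positions to keep removes only position 0
keep = [i for i in range(dup.shape[1]) if i != 0]
dropped_by_position = dup.iloc[:, keep]
print(list(dropped_by_position.columns))  # ['B', 'A']
```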

Drop Columns Using df.iloc[] for Index-Based Ranges

The iloc[] indexer selects columns by positional range, and that selection can then be passed to drop(). Use it when you need to delete a contiguous range of columns by position.

Python `

# Drop the columns at positions 1 and 2 (the slice end, 3, is excluded)
df = data.drop(data.iloc[:, 1:3].columns, axis=1)
print(df)

`

Output:

    A
0  A1
1  A2
2  A3
3  A4
4  A5

This approach is particularly helpful for DataFrames with many columns, since you can select them by index range instead of naming each one.
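
To make the slice boundaries concrete, here is a minimal sketch on a hypothetical five-column frame (wide, not part of the example dataset). Positional slices follow Python's half-open convention, so 1:4 covers positions 1, 2, and 3.

```python
import pandas as pd

# Hypothetical one-row frame with five columns, for illustration only
wide = pd.DataFrame([[1, 2, 3, 4, 5]], columns=list('ABCDE'))

# Positions 1, 2 and 3; the end position 4 is excluded
middle = wide.iloc[:, 1:4].columns
print(list(middle))  # ['B', 'C', 'D']

trimmed = wide.drop(columns=middle)
print(list(trimmed.columns))  # ['A', 'E']

# An open-ended slice drops everything from position 2 onward
head_only = wide.drop(columns=wide.iloc[:, 2:].columns)
print(list(head_only.columns))  # ['A', 'B']
```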

Drop Columns Using df.loc[] for Label-Based Ranges

If you prefer to drop columns between two column names, use loc[] with the drop() method. Unlike positional slices, label slices include both endpoints. This is useful when the columns are ordered and you want to remove everything from one label through another.

Python `

# Drop columns from 'B' to 'C' (both labels included)
df = data.drop(columns=data.loc[:, 'B':'C'].columns)
print(df)

`

Output:

    A
0  A1
1  A2
2  A3
3  A4
4  A5

This method provides flexibility for label-based indexing, allowing you to use descriptive names instead of numerical positions.
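
The sketch below contrasts label and positional slicing on a hypothetical five-column frame (wide, not part of the example dataset): the label slice 'B':'D' includes 'D', while the positional slice 1:3 stops before position 3.

```python
import pandas as pd

# Hypothetical one-row frame with five columns, for illustration only
wide = pd.DataFrame([[1, 2, 3, 4, 5]], columns=list('ABCDE'))

# Label slice: both 'B' and 'D' are included
span = wide.loc[:, 'B':'D'].columns
print(list(span))  # ['B', 'C', 'D']

by_label = wide.drop(columns=span)
print(list(by_label.columns))  # ['A', 'E']

# Positional slice: 'B' is position 1 and 'D' is position 3,
# but 1:3 excludes position 3, so 'D' survives
by_position = wide.drop(columns=wide.columns[1:3])
print(list(by_position.columns))  # ['A', 'D', 'E']
```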

Drop Columns Using pop() Method

The pop() method removes a specified column from the DataFrame in place and returns it as a Series, which is useful when you want to keep working with the column’s data after removing it.

Python `

# Drop column 'B' and store it as a Series (data is modified in place)
popped_column = data.pop('B')
print(data)

print('Popped Column')
print(popped_column)

`

Output:

    A   C
0  A1  C1
1  A2  C2
2  A3  C3
3  A4  C4
4  A5  C5

Popped Column
0    B1
1    B2
2    B3
3    B4
4    B5
Name: B, dtype: object
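
Because pop() changes the DataFrame it is called on, one option is to pop from a copy when the original still needs all of its columns. The sketch below assumes a shortened copy of the sample data; 'Z' is a made-up column name used to show what happens when the column is missing.

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
data = pd.DataFrame({
    'A': ['A1', 'A2'],
    'B': ['B1', 'B2'],
    'C': ['C1', 'C2'],
})

# Pop from a copy so that data keeps all three columns
working = data.copy()
popped = working.pop('B')

print(type(popped).__name__)  # Series
print(list(working.columns))  # ['A', 'C']
print(list(data.columns))     # ['A', 'B', 'C']

# Popping a column that does not exist raises KeyError
# ('Z' is a hypothetical name that is not in the frame)
try:
    working.pop('Z')
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)  # True
```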

Drop Columns Based on a Condition

To drop columns dynamically based on a condition, such as a threshold for missing values, use the dropna() method with its thresh parameter, which sets the minimum number of non-missing values a column must have to be kept.

For example, you can drop columns with more than 50% missing values:

Python `

# Sample DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, None, None, 4],
    'C': [1, 2, 3, 4]
})

# Drop columns with more than 50% missing values:
# a column is kept only if it has at least `threshold` non-missing values
threshold = len(df) * 0.5
df = df.dropna(thresh=threshold, axis=1)
print(df)

`

Output:

     A  C
0  1.0  1
1  2.0  2
2  NaN  3
3  4.0  4
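
The same condition can also be written as an explicit boolean mask, which makes the rule easier to inspect and to swap for another one. This is an alternative sketch rather than the dropna() approach shown above; missing_share, filtered, and dropped are illustrative names.

```python
import pandas as pd

# Same sample data as in the dropna() example
df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, None, None, 4],
    'C': [1, 2, 3, 4],
})

# Fraction of missing values per column: A 0.25, B 0.75, C 0.0
missing_share = df.isna().mean()
print(missing_share)

# Keep only the columns where at most half of the values are missing
filtered = df.loc[:, missing_share <= 0.5]
print(list(filtered.columns))  # ['A', 'C']

# Equivalent formulation with drop(): pass the labels that fail the rule
dropped = df.drop(columns=missing_share[missing_share > 0.5].index)
print(list(dropped.columns))  # ['A', 'C']
```

Any per-column boolean Series can stand in for missing_share <= 0.5, so the same pattern extends to other conditions.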