Filter Pandas Dataframe by Column Value (original) (raw)
Last Updated : 15 Jul, 2025
Filtering a Pandas DataFrame by column values is a common and essential task in data analysis. It allows to **extract specific rows based on conditions applied to one or more columns, making it easier to work with relevant subsets of data. Let's start with a quick example to illustrate the concept:
Python `
import pandas as pd data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 32,45], 'Score': [85, 90, 78]} df = pd.DataFrame(data)
Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30] display(filtered_df)
`
**Output:
Filter Pandas Dataframe by Column Value
In this example, we filtered the DataFrame to show only rows where the "Age" column has values greater than 30. The result is a smaller DataFrame containing only the rows that meet this condition. This method is called is Boolean indexing as it create a boolean mask by applying conditions to the DataFrame and then use this mask to select rows. This method is powerful because it allows you to combine multiple conditions using logical operators and, or, and not.
Considering another example, imagine you have a DataFrame containing information about employees, and you want to filter out only those from the "Marketing" department.
Python `
import pandas as pd data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Department': ['Marketing', 'Finance', 'Marketing']} df = pd.DataFrame(data)
Filter rows where Department is 'Marketing'
filtered_df = df[df['Department'] == 'Marketing'] display(filtered_df)
`
**Output:
Filter Pandas Dataframe by Column Value
In this example, we simply **use df[columnname] == value to filter rows, and wrap it in df[...] to create a new filtered DataFrame.
1. Using the loc[] Accessor
The loc[] accessor is another common method for filtering. The .loc[] method allows for more complex filtering, used to filter both rows and columns at the same time by specifying conditions for both axes. It allows to specify conditions directly within the square brackets.
Python `
import pandas as pd data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 32,45], 'Score': [85, 90, 78]} df = pd.DataFrame(data)
Filter rows where Age > 30 and select only 'Name' and 'Score' columns
filtered_df = df.loc[df['Age'] > 30, ['Name', 'Score']] print(filtered_df)
`
**Output:
Name Score 1 Bob 90
2 Charlie 78
Here, we filter rows where "Age" is greater than 30 and select only the "Name" and "Score" columns from those filtered rows.
2. Using .isin()
The .isin() method is useful when you want to filter rows based on whether a column's value exists in a list of values.
Python `
import pandas as pd data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 32,45], 'Score': [85, 90, 78]} df = pd.DataFrame(data)
Filter rows where Age is either 25 or 45
filtered_df = df[df['Age'].isin([25, 45])] print(filtered_df)
`
**Output:
Name Age Score 0 Alice 25 85
2 Charlie 45 78
This code filters the DataFrame to include only rows where the "Age" column has values of either 25 or 45.
3. Using .query()
The .query() method allows you to filter a DataFrame using SQL-like syntax. This can be particularly useful when dealing with complex conditions.
Python `
import pandas as pd data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 32,45], 'Score': [85, 90, 78]} df = pd.DataFrame(data)
Filter using query method where Age > 30 and Score < 90
filtered_df = df.query('Age > 30 and Score < 90') print(filtered_df)
`
**Output:
Name Age Score 2 Charlie 45 78
In this example, we use .query() to filter rows where "Age" is greater than 30 and "Score" is less than 90.
Chaining Multiple Conditions With Boolean Indexing
You can combine multiple conditions using logical operators:
- **AND: Both conditions must be true for a row to be included.
- **OR: At least one condition must be true for a row to be included.
- **NOT: The condition must be false for a row to be included.
These operators are applied element-wise to DataFrame columns, and the results are then used to index the DataFrame.
Python `
import pandas as pd data = {'Name': ['Bob', 'Charlie', 'David'], 'Age': [23, 45, 28], 'Score': [90, 78, 88]} df = pd.DataFrame(data)
AND operation: Age > 25 AND Score > 80
and_filter = df[(df['Age'] > 25) & (df['Score'] > 80)] print("AND Operation Result:") print(and_filter)
OR operation: Age > 25 OR Score > 80
or_filter = df[(df['Age'] > 25) | (df['Score'] > 80)] print("\nOR Operation Result:") print(or_filter)
NOT operation: NOT (Age > 25)
not_filter = df[~(df['Age'] > 25)] print("\nNOT Operation Result:") print(not_filter)
`
**Output:
**AND Operation Result:
Name Age Score
2 David 28 88
**OR Operation Result:
Name Age Score
0 Bob 23 90
1 Charlie 45 78
2 David 28 88
**NOT Operation Result:
Name Age Score
0 Bob 23 90
This code filters the DataFrame to include only rows where both conditions are met: "Age" greater than 25 and "Score" greater than 80 for "and operation": , include rows where **at least one of the conditions is met: either "Age" greater than 25 or "Score" greater than 80 for "OR operation" and filters the DataFrame to include only rows where "Age" is **not greater than 25 for NOT operation.
When to Use Each Method for Filtering a Pandas DataFrame
Filtering a Pandas DataFrame by column value is a crucial skill in data analysis, and here are the key takeaways along with guidance on when to use each method:
| Method | When to Use |
|---|---|
| **Boolean Indexing | Ideal for simple conditions (e.g., df[column] > value) filtering rows based on conditions applied to individual columns. |
| ****.loc[] Accessor** | When you need to filter both rows and columns simultaneously. Apply conditions to rows and select specific columns. |
| ****.isin()** | Best when checking if a column's value is in a list of specific values. Use when filtering rows based on membership in a list, series, or array. |
| ****.query()** | Ideal for complex conditions written in a SQL-like syntax. |
| **Logical Operators (AND, OR, NOT) | Use when combining multiple conditions to filter data. |