Working with Missing Data in Pandas (original) (raw)

Last Updated : 02 Jun, 2025

In Pandas, missing data occurs when some values are missing or not collected properly and these missing values are represented as:

**None: A Python object used to represent missing values in object-type arrays.
**NaN: A special floating-point value from NumPy which is recognized by all systems that use IEEE floating-point standards.

In this article we see how to detect, handle and fill missing values in a DataFrame to keep the data clean and ready for analysis.

Checking Missing Values in Pandas

Pandas provides two important functions which help in detecting whether a value is NaN helpful in making data cleaning and preprocessing easier in a DataFrame or Series are given below :

**1. Using isnull()

**isnull() returns a DataFrame of Boolean value where True represents missing data (NaN). This is simple if we want to find and fill missing data in a dataset.

**Example 1: Finding Missing Values in a DataFrame

We will be using Numpy and Pandas libraries for this implementation****.**

Python `

import pandas as pd import numpy as np

d = {'First Score': [100, 90, np.nan, 95], 'Second Score': [30, 45, 56, np.nan], 'Third Score': [np.nan, 40, 80, 98]} df = pd.DataFrame(d)

mv = df.isnull()

print(mv)

**Output

**Example 2: Filtering Data Based on Missing Values

Here we used random Employee dataset, you can download the csv file from here. The isnull() function is used over the "Gender" column in order to filter and print out rows containing missing gender data.

Python `

import pandas as pd d = pd.read_csv("/content/employees.csv")

bool_series = pd.isnull(d["Gender"]) missing_gender_data = d[bool_series] print(missing_gender_data)

**Output

**2. Checking for Non-Missing Values Using notnull()

**notnull() function returns a DataFrame with Boolean values where True indicates non-missing (valid) data. This function is useful when we want to focus only on the rows that have valid, non-missing values.

**Example 1: Identifying Non-Missing Values in a DataFrame

Python `

import pandas as pd import numpy as np

d = {'First Score': [100, 90, np.nan, 95], 'Second Score': [30, 45, 56, np.nan], 'Third Score': [np.nan, 40, 80, 98]} df = pd.DataFrame(d)

nmv = df.notnull()

print(nmv)

**Output

**Example 2: Filtering Data with Non-Missing Values

notnull()function is used over the "Gender" column in order to filter and print out rows containing missing gender data.

Python `

import pandas as pd d = pd.read_csv("/content/employees.csv")

nmg = pd.notnull(d["Gender"])

nmgd= d[nmg]

display(nmgd)

**Output

Filling Missing Values in Pandas

Following functions allow us to replace missing values with a specified value or use interpolation methods to find the missing data.

1. Using fillna()

**fillna() used to replace missing values (NaN) with a given value. Lets see various example for this.

**Example 1: Fill Missing Values with Zero

Python `

import pandas as pd import numpy as np

d = {'First Score': [100, 90, np.nan, 95], 'Second Score': [30, 45, 56, np.nan], 'Third Score': [np.nan, 40, 80, 98]} df = pd.DataFrame(d)

df.fillna(0)

**Output

**Example 2: Fill with Previous Value (Forward Fill)

The pad method is used to fill missing values with the previous value.

Python `

df.fillna(method='pad')

**Output

**Example 3: Fill with Next Value (Backward Fill)

The bfill function is usedto fill it with the next value.

Python `

df.fillna(method='bfill')

**Output

**Example 4: Fill NaN Values with 'No Gender'

Python `

import pandas as pd import numpy as np d = pd.read_csv("/content/employees.csv")

d[10:25]

**Output

Now we are going to fill all the null values in Gender column with "No Gender"

Python `

d["Gender"].fillna('No Gender', inplace = True) d[10:25]

**Output

2. Using replace()

Use**replace()function to replace NaN values with a specific value.

**Example

Python `

import pandas as pd import numpy as np

data = pd.read_csv("/content/employees.csv") data[10:25]

**Output

Now, we are going to replace the all NaN value in the data frame with -99 value.

Python `

data.replace(to_replace=np.nan, value=-99)

**Output

3. Using interpolate()

The**interpolate()function fills missing values using interpolation techniques such as the linear method.

**Example

Python `

import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1], "B": [None, 2, 54, 3, None], "C": [20, 16, None, 3, 8], "D": [14, 3, None, None, 6]})
print(df)

**Output

Let’s interpolate the missing values using Linear method. This method ignore the index and consider the values as equally spaced.

Python `

df.interpolate(method ='linear', limit_direction ='forward')

**Output

Dropping Missing Values in Pandas

The **dropna() function used to removes rows or columns with NaN values. It can be used to drop data based on different conditions.

1. Dropping Rows with At Least One Null Value

Remove rows that contain at least one missing value.

**Example

Python `

import pandas as pd import numpy as np

dict = {'First Score': [100, 90, np.nan, 95], 'Second Score': [30, np.nan, 45, 56], 'Third Score': [52, 40, 80, 98], 'Fourth Score': [np.nan, np.nan, np.nan, 65]} df = pd.DataFrame(dict)

df.dropna()

**Output

**2. Dropping Rows with All Null Values

We can drop rows where all values are missing using dropna(how='all').

**Example

Python `

dict = {'First Score': [100, np.nan, np.nan, 95], 'Second Score': [30, np.nan, 45, 56], 'Third Score': [52, np.nan, 80, 98], 'Fourth Score': [np.nan, np.nan, np.nan, 65]} df = pd.DataFrame(dict)

df.dropna(how='all')

**Output

3. Dropping Columns with At Least One Null Value

To remove columns that contain at least one missing value we use dropna(axis=1).

**Example

Python `

dict = {'First Score': [100, np.nan, np.nan, 95], 'Second Score': [30, np.nan, 45, 56], 'Third Score': [52, np.nan, 80, 98], 'Fourth Score': [60, 67, 68, 65]} df = pd.DataFrame(dict)

df.dropna(axis=1)

**Output

4. Dropping Rows with Missing Values in CSV Files

When working with CSV files, we can drop rows with missing values using dropna().

**Example

Python `

import pandas as pd d = pd.read_csv("/content/employees.csv")

nd = d.dropna(axis=0, how='any')

print("Old data frame length:", len(d)) print("New data frame length:", len(nd)) print("Rows with at least one missing value:", (len(d) - len(nd)))

**Output:

MISSING1

Drop Rows with NaN

Since the difference is 236, there were 236 rows which had at least 1 Null value in any column. By using these functions we can easily detect, handle and fill missing values.