Using dictionary to remap values in Pandas DataFrame columns
Last Updated : 22 Mar, 2025
While working with data in Pandas, we often need to modify or transform values in specific columns. One common transformation is remapping values using a dictionary. This technique is useful when we need to replace categorical values with labels, abbreviations or numerical representations. In this article, we’ll explore different ways to remap values in a Pandas DataFrame using dictionary mapping.
Creating a sample pandas dataframe
Let’s start by creating a sample DataFrame that contains event details.
```python
import pandas as pd

# Sample DataFrame with event details
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'], 'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'], 'Cost': [10000, 5000, 15000, 2000]})
print(df)
```
Output
            Date    Event   Cost
    0  10/2/2011    Music  10000
    1  11/2/2011   Poetry   5000
    2  12/2/2011  Theatre  15000
    3  13/2/2011   Comedy   2000
Using replace() function
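The replace() function in Pandas allows us to remap values using a dictionary. Called on a DataFrame column, it returns a new Series in which every value that matches a dictionary key is swapped for the corresponding dictionary value, so the result is assigned back to the column.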
```python
import pandas as pd

df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'], 'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'], 'Cost': [10000, 5000, 15000, 2000]})

# Define a dictionary for remapping values
d = {'Music': 'M', 'Poetry': 'P', 'Theatre': 'T', 'Comedy': 'C'}

# Remap the values using replace()
df['Event'] = df['Event'].replace(d)
print(df)
```
Output
            Date Event   Cost
    0  10/2/2011     M  10000
    1  11/2/2011     P   5000
    2  12/2/2011     T  15000
    3  13/2/2011     C   2000
**Explanation:**
- **replace()** searches for values in the column and replaces them based on the dictionary. Values that have no matching key are left unchanged.
- The method is available on both Series and DataFrame objects, so it can also remap values in several columns in one call.
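The multi-column case mentioned above can be sketched with a nested dictionary passed to DataFrame.replace(), where the outer keys are column names. This is only an illustration: the Venue column and its values are hypothetical and not part of the sample data used in this article.

```python
import pandas as pd

df = pd.DataFrame({
    'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
    'Venue': ['Hall', 'Cafe', 'Hall', 'Club'],  # hypothetical column for illustration
})

# Outer keys are column names; each inner dict maps old value -> new value
nested = {
    'Event': {'Music': 'M', 'Poetry': 'P', 'Theatre': 'T', 'Comedy': 'C'},
    'Venue': {'Hall': 'H'},  # values without a key ('Cafe', 'Club') stay as they are
}

# replace() returns a new DataFrame; df itself is not modified
remapped = df.replace(nested)
print(remapped)
```

Each inner dictionary is applied only to its own column, so the same value can be remapped differently in different columns.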
Using map() function
Another way to remap values in a Pandas column is by using the map() function.
```python
import pandas as pd

df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'], 'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'], 'Cost': [10000, 5000, 15000, 2000]})

# Define a dictionary for remapping values
d = {'Music': 'M', 'Poetry': 'P', 'Theatre': 'T', 'Comedy': 'C'}

# Remap the values using map()
df['Event'] = df['Event'].map(d)
print(df)
```
Output
            Date Event   Cost
    0  10/2/2011     M  10000
    1  11/2/2011     P   5000
    2  12/2/2011     T  15000
    3  13/2/2011     C   2000
**Explanation:**
- **map()** looks up each element of the column in the dictionary and returns the corresponding value.
- Unlike replace(), map() with a dictionary is a Series method, so it is applied to one column at a time.
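As a small sketch of the element-by-element behaviour, the example below uses map() to turn the event names into numerical representations and, for comparison, passes a function instead of a dictionary. The integer codes are arbitrary values chosen for illustration.

```python
import pandas as pd

events = pd.Series(['Music', 'Poetry', 'Theatre', 'Comedy'], name='Event')

# Hypothetical numeric codes for each category
codes = {'Music': 0, 'Poetry': 1, 'Theatre': 2, 'Comedy': 3}

# Each element is looked up in the dictionary
event_codes = events.map(codes)

# map() also accepts a function, which is called once per element
initials = events.map(lambda name: name[0])

print(event_codes.tolist())
print(initials.tolist())
```

Storing the result in a new variable (or a new column) keeps the original labels available alongside the remapped ones.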
Differences between replace() and map()
Feature | replace() | map() |
---|---|---|
Works on entire DataFrame? | Yes | No, only on Series (columns) |
Supports multiple column replacements? | Yes | No, works on a single column |
Handles missing keys? | Yes, values without a matching key are left unchanged | No, returns NaN for values without a matching key |
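
The difference in how missing keys are handled can be checked with a short side-by-side sketch. It applies the same incomplete dictionary with both methods; the variable names are illustrative only.

```python
import pandas as pd

events = pd.Series(['Music', 'Poetry', 'Theatre', 'Comedy'])

# Incomplete dictionary: no entries for 'Theatre' or 'Comedy'
partial = {'Music': 'M', 'Poetry': 'P'}

replaced = events.replace(partial)  # unmatched values are kept as-is
mapped = events.map(partial)        # unmatched values become NaN

print(replaced.tolist())
print(mapped.isna().tolist())
```

With replace() the last two values remain 'Theatre' and 'Comedy', while with map() they are missing and isna() reports True for them.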
Handling missing values during mapping
If the dictionary used in map() does not contain a key for some values in the column, those values will be replaced with NaN. To handle this, we can use the fillna() function.
```python
import pandas as pd

df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'], 'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'], 'Cost': [10000, 5000, 15000, 2000]})

# Define an incomplete dictionary
d = {'Music': 'M', 'Poetry': 'P'}

# Apply map() and handle missing values
df['Event'] = df['Event'].map(d).fillna('Unknown')
print(df)
```
Output
            Date    Event   Cost
    0  10/2/2011        M  10000
    1  11/2/2011        P   5000
    2  12/2/2011  Unknown  15000
    3  13/2/2011  Unknown   2000
**Explanation:**
- **map()** returns a mapped value only for entries that appear as keys in the dictionary.
- Values without a key (here, Theatre and Comedy) become **NaN**.
- **fillna('Unknown')** replaces those NaN entries with "Unknown". Note that it also fills any NaN that was already present in the column before mapping.
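If a placeholder such as "Unknown" is not wanted, one possible alternative is to fall back to the original value for unmapped entries. The sketch below does this by passing the original Series to fillna(), which fills each NaN with the value at the same index. It is one option among several, not the only way to do it.

```python
import pandas as pd

events = pd.Series(['Music', 'Poetry', 'Theatre', 'Comedy'])

# Incomplete dictionary: no entries for 'Theatre' or 'Comedy'
partial = {'Music': 'M', 'Poetry': 'P'}

# Unmapped entries become NaN, then are filled from the original Series by index
with_fallback = events.map(partial).fillna(events)

print(with_fallback.tolist())
```

The result matches what replace() would return for the same dictionary, while still going through map().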