Convert column type from string to datetime format in Pandas dataframe (original) (raw)
Last Updated : 27 Nov, 2024
To perform time-series operations, dates should be in the correct format. Let’s learn how to convert a Pandas DataFrame column of strings to datetime format.
Pandas Convert Column To DateTime using pd.to_datetime()
**pd.to_datetime() function in Pandas is the most effective way to handle this conversion. This is used to handle different formats and to convert string columns to datetime objects.
Python `
Importing pandas
import pandas as pd
Creating a DataFrame
df = pd.DataFrame({ 'Date': ['11/8/2011', '04/23/2008', '10/2/2019'], 'Event': ['Music', 'Poetry', 'Theatre'], 'Cost': [10000, 5000, 15000] })
Display initial data types
print("Before Conversion:") print(df.info())
Converting 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'])
Display data types after conversion
print("\nAfter Conversion:") print(df.info())
`
**Output:
**Before Conversion:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
Column Non-Null Count Dtype
0 Date 3 non-null object
1 Event 3 non-null object
2 Cost 3 non-null int64
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes
None
**After Conversion:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
Column Non-Null Count Dtype
0 Date 3 non-null datetime64[ns]
1 Event 3 non-null object
2 Cost 3 non-null int64
dtypes: datetime64ns, int64(1), object(1)
memory usage: 200.0+ bytes
None
Before conversion, the Date column has the data type **object (string). After using **pd.to_datetime(), it changes to **datetime64[ns]. Now, let’ s explore how can we can convert columns from string-based formats like **yymmdd to **yyyymmdd using Pandas.
Converting from ‘yymmdd’ Format
If a column contains dates in the **yymmdd format, you can use **pd.to_datetime() with the appropriate format to convert it to **datetime64[ns].
Python `
Importing pandas library
import pandas as pd
Initializing the dataset
player_list = [ ['200712', 50000], ['200714', 51000], ['200716', 51500], ['200719', 53000], ['200721', 54000], ['200724', 55000], ['200729', 57000] ]
Creating a pandas DataFrame
df = pd.DataFrame(player_list, columns=['Dates', 'Patients'])
Displaying the DataFrame and its data types before conversion
print("Before Conversion:") print(df) print(df.dtypes)
Converting 'Dates' from 'yymmdd' to datetime format
df['Dates'] = pd.to_datetime(df['Dates'], format='%y%m%d')
Displaying the DataFrame and its data types after conversion
print("\nAfter Conversion:") print(df) print(df.dtypes)
`
**Output:
**Before Conversion:
Dates Patients
0 200712 50000
1 200714 51000
2 200716 51500
3 200719 53000
4 200721 54000
5 200724 55000
6 200729 57000
**Dates object
Patients int64
dtype: object
**After Conversion:
Dates Patients
0 2020-07-12 50000
1 2020-07-14 51000
2 2020-07-16 51500
3 2020-07-19 53000
4 2020-07-21 54000
5 2020-07-24 55000
6 2020-07-29 57000
**Dates datetime64[ns]
Patients int64
dtype: object
Converting from ‘yyyymmdd’ Format
When the column is already in yyyymmdd format, use **pd.to_datetime() with the ****%Y%m%d** format to convert it directly to **datetime64[ns].
Python `
import pandas as pd
Initializing the dataset
player_list = [ ['20200712', 50000, '20200812'], ['20200714', 51000, '20200814'], ['20200716', 51500, '20200816'], ['20200719', 53000, '20200819'], ['20200721', 54000, '20200821'], ['20200724', 55000, '20200824'], ['20200729', 57000, '20200824'] ]
Creating a pandas DataFrame
df = pd.DataFrame(player_list, columns=['Treatment_start', 'No_of_Patients', 'Treatment_end'])
Displaying the DataFrame and its data types before conversion
print("Before Conversion:") print(df) print(df.dtypes)
Converting 'Treatment_start' and 'Treatment_end' to datetime format
df['Treatment_start'] = pd.to_datetime(df['Treatment_start'], format='%Y%m%d') df['Treatment_end'] = pd.to_datetime(df['Treatment_end'], format='%Y%m%d')
Displaying the DataFrame and its data types after conversion
print("\nAfter Conversion:") print(df) print(df.dtypes)
`
**Output:
Convert Column To DateTime using DataFrame.astype() function
Here’s how you can create a DataFrame, inspect its data types, and convert the string-based date column to datetime using **DataFrame.astype().
Python `
Importing pandas
import pandas as pd
Creating the DataFrame
df = pd.DataFrame({ 'Date': ['11/8/2011', '04/23/2008', '10/2/2019'], 'Event': ['Music', 'Poetry', 'Theatre'], 'Cost': [10000, 5000, 15000] })
Checking the initial data type of the 'Date' column
print("Before Conversion:") df.info()
`
**Output:
The Date column initially has the data type **object, indicating it contains strings.
Python `
Converting the 'Date' column to datetime format using astype()
df['Date'] = df['Date'].astype('datetime64[ns]')
Checking the data type after conversion
print("\nAfter Conversion:") df.info()
`
**Output:
Using **astype(‘datetime64[ns]’)** successfully converts the **Date column to the **datetime64[ns]` format.