infer_dtype() function slower in latest version · Issue #28814 · pandas-dev/pandas
Description
Code Sample
from datetime import datetime

import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

print(pd.__version__)

RUN_COUNT = 5
df = pd.DataFrame(np.ones((100000, 1000)))

# Time RUN_COUNT full passes of infer_dtype over every column.
avg_times = []
for _ in range(RUN_COUNT):
    start = datetime.now()
    for col in df.columns:
        infer_dtype(df[col], skipna=True)
    avg_times.append(datetime.now() - start)

print('Average time: ', np.mean(avg_times))
Problem description
When I run the code above on my machine, there is a major difference in performance between pandas versions. I ran it in fresh conda environments, each created with Python 3.7 and the specific version of pandas. The table below shows the execution times.
| Pandas version | Execution time (h:mm:ss.ffffff) |
|---|---|
| 0.23.4 | 0:00:00.013999 |
| 0.25.1 | 0:00:15.623905 |
The code above is only an example. In most cases I need this function to get a more specific type for a column than the one returned by the .dtype attribute (usually when working with mixed data).
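To illustrate that use case, here is a minimal sketch of how infer_dtype can distinguish columns that all report the same .dtype. The column names and values are hypothetical and are not taken from my real data; the point is only that every Series below has dtype object, while infer_dtype reports something more specific.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical example columns, all stored with the generic object dtype.
columns = {
    "ints": pd.Series([1, 2, 3], dtype=object),
    "strings": pd.Series(["a", "b", "c"], dtype=object),
    "ints_and_strings": pd.Series([1, "a", 2], dtype=object),
    "strings_with_nan": pd.Series(["a", np.nan, "b"], dtype=object),
}

# .dtype is "object" for every column; infer_dtype inspects the values.
inferred = {name: infer_dtype(col, skipna=True) for name, col in columns.items()}

for name, col in columns.items():
    print(f"{name}: dtype={col.dtype}, inferred={inferred[name]}")
```

With skipna=True the missing value in the last column is ignored, so that column should be reported as a string column rather than a mixed one.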