infer_dtype() function slower in latest version · Issue #28814 · pandas-dev/pandas

@borisaltanov

Code Sample

from datetime import datetime

import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

print(pd.__version__)

RUN_COUNT = 5

# 100,000 rows x 1,000 float64 columns
df = pd.DataFrame(np.ones((100000, 1000)))

avg_times = []

for _ in range(RUN_COUNT):
    start = datetime.now()

    # Infer the type of every column once per run
    for col in df.columns:
        infer_dtype(df[col], skipna=True)

    avg_times.append(datetime.now() - start)

print('Average time: ', np.mean(avg_times))

Problem description

When I run the above code on my machine, there is a major difference in performance between Pandas versions. I ran the code in fresh conda environments, each created with Python 3.7 and one specific version of Pandas. The table below shows the average execution time per run.

Pandas version    Execution time (h:mm:ss.ffffff)
0.23.4            0:00:00.013999
0.25.1            0:00:15.623905
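To narrow down where the time goes, it may help to time a single column in isolation. The snippet below is only a sketch: `time_call` is a hypothetical helper, not part of pandas, and comparing the Series against its underlying NumPy array (`.values`) is just one guess at where overhead could come from, not a confirmed cause.

```python
import time

import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def time_call(func, repeat=3):
    """Hypothetical helper: best wall-clock time of `func` over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


# One float64 column with the same length as in the benchmark
col = pd.Series(np.ones(100000))

# Time the call on the Series and on its underlying NumPy array
t_series = time_call(lambda: infer_dtype(col, skipna=True))
t_array = time_call(lambda: infer_dtype(col.values, skipna=True))

print(pd.__version__)
print("Series input: ", t_series, "seconds")
print("ndarray input:", t_array, "seconds")
```

Running this under each Pandas version would show whether the slowdown is already visible for one column, and whether it depends on the input being a Series.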

The code above is only an illustration. In most cases I need this function to get a more specific type for a column than the one reported by the .dtype attribute (usually when working with mixed data).
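A small illustration of that use case, with made-up column names and values: all three columns report the same generic `object` dtype, while `infer_dtype` distinguishes their contents.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

# Illustrative columns only; names and values are invented for this example
columns = {
    "strings": pd.Series(["foo", "bar"], dtype=object),
    "strings_with_nan": pd.Series(["a", np.nan, "b"], dtype=object),
    "str_and_int": pd.Series(["a", 1], dtype=object),
}

# .dtype is `object` for every column; infer_dtype looks at the actual values
inferred = {name: infer_dtype(col, skipna=True) for name, col in columns.items()}

for name, col in columns.items():
    print(name, "| dtype:", col.dtype, "| inferred:", inferred[name])
```

Going by the examples in the pandas documentation, the two string columns should be reported as `string` (missing values are skipped with `skipna=True`) and the last one as `mixed-integer`.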