PERF: read_csv should check if column is already a datetime column before initiating the conversion · Issue #52546 · pandas-dev/pandas

@phofl

Pandas version checks

Reproducible Example

import pandas as pd
import pyarrow as pa
dr = pd.Series(pd.date_range("2019-12-31", periods=1_000_000, freq="s").astype(pd.ArrowDtype(pa.timestamp(unit="ns"))), name="a")
dr.to_csv("tmp.csv")
pd.read_csv("tmp.csv", engine="pyarrow", dtype_backend="pyarrow", parse_dates=["a"])

The read call takes 1.6 seconds. Without parse_dates it drops to 0.01 seconds, and pyarrow already returns a timestamp column on its own, as the resulting dtypes show:

            int64[pyarrow]
a    timestamp[s][pyarrow]
dtype: object

I guess this was introduced by the dtype backend, so I would like to fix it soonish.
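
A minimal sketch of the check the title asks for, not the actual read_csv internals: skip the conversion when a column's dtype already reports the datetime kind "M". The helper name parse_dates_if_needed is hypothetical, and the use of pd.to_datetime as the fallback conversion is an assumption made for illustration.

```python
import pandas as pd


def parse_dates_if_needed(df, columns):
    """Hypothetical helper: convert only the columns that are not datetimes yet."""
    out = df.copy()
    for col in columns:
        # NumPy datetime64, tz-aware and pyarrow timestamp dtypes all report kind "M".
        if getattr(out[col].dtype, "kind", None) == "M":
            continue  # already a datetime column, nothing to convert
        out[col] = pd.to_datetime(out[col])
    return out


stamps = ["2019-12-31 00:00:00", "2019-12-31 00:00:01", "2019-12-31 00:00:02"]
frame = pd.DataFrame({"already": pd.to_datetime(stamps), "text": stamps})
result = parse_dates_if_needed(frame, ["already", "text"])
print(result.dtypes)
```

Here the "already" column is left untouched and only "text" goes through the conversion, which is the cost that parse_dates=["a"] currently pays in the example above even though the column arrives as a timestamp.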

Installed Versions

main

Prior Performance

No response