Pandas conversion of Timedelta is very slow · Issue #18092 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

a = pd.Series([pd.Timedelta(days=x) for x in np.random.randint(0, 10, 60000)])

Let's convert Timedelta to days with the pandas .dt.days accessor

%time a.dt.days

CPU times: user 457 ms, sys: 4.08 ms, total: 461 ms

Wall time: 464 ms

Let's convert Timedelta to days by division

%time (a / np.timedelta64(1, 'D')).astype(np.int64)

CPU times: user 3.19 ms, sys: 1.79 ms, total: 4.98 ms

Wall time: 3.18 ms

Make sure results are the same

((a / np.timedelta64(1, 'D')).astype(np.int64)==a.dt.days).value_counts()

True 60000

dtype: int64
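The `%time` measurements above only run inside IPython. As a rough sketch, the same comparison can be reproduced as a standalone script with `timeit`. The function names `via_accessor` and `via_division` and the fixed seed are assumptions added for illustration and are not part of the original report; absolute timings will depend on the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Fixed seed so the sample data is reproducible (an assumption; the report used unseeded data).
rng = np.random.RandomState(0)
a = pd.Series([pd.Timedelta(days=int(x)) for x in rng.randint(0, 10, 60000)])


def via_accessor():
    # Conversion through the .dt accessor, as in the report.
    return a.dt.days


def via_division():
    # Conversion by dividing by one day and casting to integers, as in the report.
    return (a / np.timedelta64(1, "D")).astype(np.int64)


# Best average time per call over three rounds of ten calls each.
t_accessor = min(timeit.repeat(via_accessor, number=10, repeat=3)) / 10
t_division = min(timeit.repeat(via_division, number=10, repeat=3)) / 10

# Both approaches should agree on this non-negative, whole-day data.
same = bool((via_accessor() == via_division()).all())

print("accessor: %.3f ms per call" % (t_accessor * 1e3))
print("division: %.3f ms per call" % (t_division * 1e3))
print("results identical:", same)
```

Taking the minimum over several repeats reduces noise from other processes, which a single `%time` run does not.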

Problem description

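For a large Series, this simple conversion takes a very long time: in the example above, .dt.days took 464 ms of wall time for 60,000 elements versus 3.18 ms for the division, which makes it roughly 145 times slower. This should be optimised.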

Expected Output

.dt.days should be as quick as dividing by np.timedelta64(1, 'D')
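Until the accessor is as fast as the division, the division can be wrapped in a small helper. The sketch below is a possible workaround, not something proposed in the report: the name `days_by_division` is hypothetical, and the use of `np.floor` is an assumption added so that negative, fractional-day values round the same way `.dt.days` does (a plain `astype(np.int64)` truncates toward zero instead). Because the division goes through floating point, results may differ from `.dt.days` for very large timedeltas that sit within nanoseconds of a day boundary.

```python
import numpy as np
import pandas as pd


def days_by_division(s):
    """Hypothetical helper: whole days of a timedelta Series via division."""
    # Dividing by one day gives a float Series of (possibly fractional) days.
    days = np.floor(s / np.timedelta64(1, "D"))
    if days.isna().any():
        # NaT becomes NaN, which cannot be cast to int64; keep floats here.
        return days
    return days.astype(np.int64)


s = pd.Series(pd.to_timedelta(["-1 days +12:00:00", "1 days 06:00:00", "0 days"]))
result = days_by_division(s)
print(result.tolist())
```

For the whole-day, non-negative data in the report, this should give the same values as the plain division followed by `astype(np.int64)`.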

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.4.0
Cython: 0.26
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None