Pandas conversion of Timedelta is very slow · Issue #18092 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import numpy as np, pandas as pd
a = pd.Series([pd.Timedelta(days=x) for x in np.random.randint(0, 10, 60000)])
Convert the Timedeltas to days with the pandas .dt.days accessor:
%time a.dt.days
CPU times: user 457 ms, sys: 4.08 ms, total: 461 ms
Wall time: 464 ms
Convert the Timedeltas to days by dividing by np.timedelta64(1, 'D') instead:
%time (a / np.timedelta64(1, 'D')).astype(np.int64)
CPU times: user 3.19 ms, sys: 1.79 ms, total: 4.98 ms
Wall time: 3.18 ms
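The %time lines above are IPython magics. A minimal sketch for reproducing the same comparison in a plain Python script, using only the standard timeit module, could look like the block below. The helper names via_accessor and via_division are illustrative and not part of pandas. Absolute timings will depend on the machine and on the pandas version, and newer pandas releases may not show the same gap.

```python
import timeit

import numpy as np
import pandas as pd

# Same construction as in the report: 60,000 Timedeltas of 0-9 whole days.
a = pd.Series([pd.Timedelta(days=int(x)) for x in np.random.randint(0, 10, 60000)])


def via_accessor():
    # Days component through the .dt accessor (the slow path in the report).
    return a.dt.days


def via_division():
    # Days through vectorized division by a one-day timedelta64, then truncation.
    return (a / np.timedelta64(1, "D")).astype(np.int64)


# Time each approach several times and keep the best single run.
t_accessor = min(timeit.repeat(via_accessor, number=1, repeat=5))
t_division = min(timeit.repeat(via_division, number=1, repeat=5))

print(f".dt.days: {t_accessor * 1e3:.2f} ms")
print(f"division: {t_division * 1e3:.2f} ms")
```

Taking the minimum of several repeats reduces noise from other processes, which matters when one of the measurements is only a few milliseconds.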
Check that both methods give the same results:
((a / np.timedelta64(1, 'D')).astype(np.int64)==a.dt.days).value_counts()
True 60000
dtype: int64
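The check above passes because every value in a is a non-negative whole number of days. For negative durations that are not whole days, the two methods diverge: .dt.days floors (pd.Timedelta(hours=-1) is "-1 days +23:00:00", so its days component is -1), while astype(np.int64) truncates toward zero. A sketch of a vectorized alternative that floors, and therefore matches .dt.days, is floor division of the underlying timedelta64 array by one day. It assumes NumPy >= 1.16 (which added timedelta64 floor division) and a Series without NaT values, which would need separate handling.

```python
import numpy as np
import pandas as pd

# Durations: 2 days, -1 hour, -30 hours, 3 days 23 hours (no NaT assumed).
s = pd.Series([
    pd.Timedelta(days=2),
    pd.Timedelta(hours=-1),
    pd.Timedelta(hours=-30),
    pd.Timedelta(days=3, hours=23),
])

# .dt.days returns the floored days component.
days_accessor = s.dt.days

# Division followed by astype truncates toward zero, so -1 hour becomes 0.
days_truncated = (s / np.timedelta64(1, "D")).astype(np.int64)

# Floor division of the timedelta64[ns] array by one day floors instead.
days_floored = pd.Series(s.to_numpy() // np.timedelta64(1, "D"), index=s.index)

print(days_accessor.tolist())   # [2, -1, -2, 3]
print(days_truncated.tolist())  # [2, 0, -1, 3]
print(days_floored.tolist())    # [2, -1, -2, 3]
```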
Problem description
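For large Series, this simple conversion takes a very long time: on 60,000 elements, .dt.days needs about 460 ms, while the division approach needs about 3 ms, more than 100 times less. The .dt.days path should be optimised.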
Expected Output
Series.dt.days should be as fast as dividing by np.timedelta64(1, 'D').
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.4.0
Cython: 0.26
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None