Speed up DatetimeConverter for plotting · Issue #6636 · pandas-dev/pandas
I've recently started using pandas (impressed so far!) and found that plotting large datasets (from around 100k samples) is quite slow. I traced the bottleneck to the _dt_to_float_ordinal helper function called by DatetimeConverter (https://github.com/pydata/pandas/blob/master/pandas/tseries/converter.py#L144).
More specifically, this function uses matplotlib's date2num, which converts arrays and iterables using a slow list comprehension. Since pandas seems to natively store datetimes as nanoseconds since the epoch in an int64 array, it would be much faster to use matplotlib's vectorized epoch2num instead. In a test case with 1 million points, using epoch2num is about 100 times faster than date2num:
from pandas import date_range, DataFrame
from numpy import int64, arange
from matplotlib import pyplot, dates
import time

n = int(1e6)  # periods must be an integer, not a float
df = DataFrame(arange(n), index=date_range('20130101', periods=n, freq='S'))

# baseline: time plotting with the stock converter (date2num)
start = time.time()
pyplot.plot(df.index, df)
print('date2num took {0:g}s'.format(time.time() - start))
pyplot.show()

# monkey patch
import pandas.tseries.converter

def _my_dt_to_float_ordinal(dt):
    try:
        # int64 nanoseconds since the epoch -> seconds -> matplotlib date numbers
        base = dates.epoch2num(dt.astype(int64) / 1.0E9)
    except AttributeError:
        # dt has no astype method: fall back to the original conversion
        base = dates.date2num(dt)
    return base

pandas.tseries.converter._dt_to_float_ordinal = _my_dt_to_float_ordinal

# time plotting again with the patched converter (epoch2num)
start = time.time()
pyplot.plot(df.index, df)
print('epoch2num took {0:g}s'.format(time.time() - start))
pyplot.show()
Unfortunately, I am not familiar enough with pandas to know whether date2num is used intentionally, or to write a proper patch that works in all cases.
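For anyone wanting to sanity-check the idea without pandas or matplotlib, here is a minimal NumPy-only sketch of the arithmetic behind the vectorized conversion. The names ns_to_epoch_days and datetimes_to_epoch_days_slow are hypothetical helpers made up for this illustration, not pandas or matplotlib functions. The sketch counts days from the Unix epoch; matplotlib's date numbers may use a different origin, in which case a constant offset would have to be added.

```python
from datetime import datetime

import numpy as np

SECONDS_PER_DAY = 86400.0


def ns_to_epoch_days(index_ns):
    """Vectorized: int64 nanoseconds since the Unix epoch -> float days."""
    # Hypothetical helper; one array division instead of a Python-level loop.
    return np.asarray(index_ns, dtype=np.int64) / 1e9 / SECONDS_PER_DAY


def datetimes_to_epoch_days_slow(datetimes):
    """Per-element reference conversion, in the spirit of a list comprehension."""
    epoch = datetime(1970, 1, 1)
    return np.array(
        [(d - epoch).total_seconds() / SECONDS_PER_DAY for d in datetimes]
    )


# One day of second-resolution timestamps, stored as datetime64[ns]
# (assumed here to mirror how the index holds its values internally).
stamps = np.arange(
    "2013-01-01", "2013-01-02", dtype="datetime64[s]"
).astype("datetime64[ns]")

fast = ns_to_epoch_days(stamps.astype(np.int64))

# datetime64[s] -> object yields datetime.datetime instances for the slow path.
slow = datetimes_to_epoch_days_slow(stamps.astype("datetime64[s]").astype(object))

print(np.allclose(fast, slow))
```

Both paths should agree to floating-point tolerance, which suggests the fast path is a drop-in replacement at least for timezone-naive data. Timezone-aware indexes and non-datetime inputs would still need the fallback branch.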