PERF: pd.Index.is_all_dates doesn't require full-scan inference by immerrr · Pull Request #6341 · pandas-dev/pandas

While working on #6328 I've stumbled upon this piece that deserves optimization.

The current implementation of Index.is_all_dates invokes lib.infer_dtype, which performs a full-scan dtype inference that is not always necessary. If I read the Cython sources correctly, the new code is equivalent to the old one, and the performance gain comes from short-circuiting the check on the first non-datetime element.
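To illustrate the idea only, here is a minimal pure-Python sketch of the difference between a full scan and a short-circuiting check. It is not the pandas or Cython code from this PR: the helper names `all_dates_full_scan` and `all_dates_short_circuit` are hypothetical, and treating an empty input as "not all dates" is an assumption made for the sketch.

```python
import datetime as dt


def all_dates_full_scan(values):
    """Classify every element first, then decide (always touches all items)."""
    flags = [isinstance(v, dt.datetime) for v in values]
    # Assumption for this sketch: an empty input is not "all dates".
    return bool(flags) and all(flags)


def all_dates_short_circuit(values):
    """Stop at the first element that is not a datetime."""
    seen_any = False
    for v in values:
        if not isinstance(v, dt.datetime):
            return False  # early exit: the remaining elements are never inspected
        seen_any = True
    return seen_any


# String labels, similar in spirit to the index used in the benchmark.
labels = ["a%d" % i for i in range(1000)]
stamps = [dt.datetime(2014, 2, 1 + i) for i in range(3)]

print(all_dates_full_scan(labels), all_dates_short_circuit(labels))
print(all_dates_full_scan(stamps), all_dates_short_circuit(stamps))
```

Both helpers return the same answer, but for an index of string labels the short-circuiting version returns after looking at a single element, which is where the saving would come from.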

This change cuts the runtime of the following snippet by roughly 50%:

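import numpy as np
import pandas as pd

# Build a large Series with a non-datetime (string) index.
NUM_ELEMENTS = 1e7
s = pd.Series(np.arange(NUM_ELEMENTS),
              index=['a%s' % i for i in np.arange(NUM_ELEMENTS)])

print("pd.version", pd.__version__)
# Time slicing the first half of the Series (IPython magic, run as an .ipy script).
%timeit s[:int(NUM_ELEMENTS/2)]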

On recent master:

$ ipython test_index_is_all_dates.ipy 
pd.version 0.13.1-108-g87b4308
10 loops, best of 3: 55.5 ms per loop

On this branch:

$ ipython test_index_is_all_dates.ipy 
pd.version 0.13.1-109-g3b43e2b
10 loops, best of 3: 28 ms per loop
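
As a quick sanity check of the "roughly 50%" figure, the two timings reported above can be compared directly. This is just arithmetic on the numbers shown, with hypothetical variable names:

```python
before_ms = 55.5  # best-of-3 timing on recent master, from the output above
after_ms = 28.0   # best-of-3 timing on this branch, from the output above

reduction = (before_ms - after_ms) / before_ms  # fraction of runtime removed
speedup = before_ms / after_ms                  # how many times faster

print("runtime reduction: %.1f%%" % (reduction * 100))
print("speedup: %.2fx" % speedup)
```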