Import statements in period.pyx significantly impact performance

The following imports in pandas/src/period.pyx sit inside the function body, so they are executed on every call. This significantly impacts performance when creating many Period objects. Fixing it should be a quick win.

def __init__(self, value=None, freq=None, ordinal=None,
             year=None, month=1, quarter=None, day=1,
             hour=0, minute=0, second=0):
    from pandas.tseries import frequencies
    from pandas.tseries.frequencies import get_freq_code as _gfc

    # freq points to a tuple (base, mult);  base is one of the defined
    # periods such as A, Q, etc. Every five minutes would be, e.g.,
    # ('T', 5) but may be passed in as a string like '5T'

Just profile the code below and observe the number of times _find_and_load gets called:

import pandas as pd

for _ in range(1000): pd.Period('2015-04-26')
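For anyone who wants to reproduce the measurement, here is a minimal profiling sketch using only the standard library. The function `make_with_local_import` is a hypothetical stand-in for the constructor (it is not pandas code), and `profile_calls` is a helper name made up for this example. Which import-machinery entries show up in the report depends on the Python version, so no specific counts are claimed here.

```python
import cProfile
import io
import pstats


def make_with_local_import(n):
    # Hypothetical stand-in for the constructor: the import statement
    # is executed on every call, even though the module is already loaded.
    from fractions import Fraction
    return Fraction(n, 7)


def profile_calls(func, repeat=1000):
    """Profile `repeat` calls of func and return the pstats report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for i in range(repeat):
        func(i)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep the 15 most expensive entries.
    stats.sort_stats("cumulative").print_stats(15)
    return buffer.getvalue()


report = profile_calls(make_with_local_import)
print(report)
```

To profile the real case, pass `lambda _: pd.Period('2015-04-26')` to `profile_calls` instead of the stand-in (this assumes pandas is installed and imported as `pd`).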

Commit bfa8066 introduced the problem.

I will submit a pull request that rectifies the incorrect fix to the circular dependency.