PERF: improve performance of NDFrame.describe by DataOmbudsman · Pull Request #21274 · pandas-dev/pandas

Sure, thanks for the suggestion. Here are my ASV benchmarks. They also show the improvement: the after/before ratio is about 0.7 for both the Series and the DataFrame case.

Setup

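import numpy as np
from pandas import DataFrame


class Describe(object):

    goal_time = 0.2

    def setup(self):
        # Fixed seed so every benchmark run sees the same data.
        np.random.seed(123)
        # Three integer columns with one million rows each.
        self.df = DataFrame({
            'a': np.random.randint(0, 100, int(1e6)),
            'b': np.random.randint(0, 100, int(1e6)),
            'c': np.random.randint(0, 100, int(1e6)),
        })

    def time_series_describe(self):
        # Series.describe on a single column.
        self.df['a'].describe()

    def time_dataframe_describe(self):
        # DataFrame.describe across all three columns.
        self.df.describe()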
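For a quick sanity check outside ASV, the same two calls can be timed with the standard-library `timeit` module. This is only a rough sketch, not a replacement for the ASV run: the helper names `build_frame` and `best_time` are hypothetical, and the row count is reduced to 100,000 (an assumption made so the script finishes quickly), so the absolute numbers will not match the results reported here.

```python
import timeit

import numpy as np
from pandas import DataFrame


def build_frame(n_rows=100_000, seed=123):
    """Build a three-column integer frame shaped like the benchmark's, but smaller."""
    rng = np.random.RandomState(seed)
    return DataFrame({
        'a': rng.randint(0, 100, n_rows),
        'b': rng.randint(0, 100, n_rows),
        'c': rng.randint(0, 100, n_rows),
    })


def best_time(func, repeat=3, number=1):
    """Return the best wall-clock time in seconds over several runs."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


df = build_frame()
series_seconds = best_time(lambda: df['a'].describe())
frame_seconds = best_time(lambda: df.describe())

print('Series.describe:    {:.1f} ms'.format(series_seconds * 1e3))
print('DataFrame.describe: {:.1f} ms'.format(frame_seconds * 1e3))
```

Running the script once on each branch gives a coarse before/after comparison; ASV remains the better tool because it handles repetition and per-commit environments itself.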

Results

before      after      ratio   benchmark
689±10ms    495±6ms    0.72    frame_methods.Describe.time_dataframe_describe
234±9ms     166±6ms    0.71    frame_methods.Describe.time_series_describe
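The ratio column is the "after" time divided by the "before" time, so values below 1 mean the change is faster. As a small worked check, the sketch below recomputes the ratios from the mean timings reported above, ignoring the ± spread. The function name `asv_ratio` is a hypothetical helper, not part of ASV.

```python
def asv_ratio(before_ms, after_ms):
    """Return after/before, the quantity shown in the ratio column."""
    return after_ms / before_ms


# Mean timings in milliseconds, copied from the results above.
timings = {
    'time_dataframe_describe': (689, 495),
    'time_series_describe': (234, 166),
}

ratios = {}
for name, (before_ms, after_ms) in timings.items():
    ratios[name] = asv_ratio(before_ms, after_ms)
    reduction_percent = (1 - ratios[name]) * 100
    print('{}: ratio {:.2f}, run time reduced by about {:.0f}%'.format(
        name, ratios[name], reduction_percent))
```

Both benchmarks come out at a ratio of roughly 0.7, which corresponds to a run time reduction of a bit under 30%.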