Cythonized GroupBy Quantile by WillAyd · Pull Request #20405 · pandas-dev/pandas
I started this change with the intention of fully Cythonizing the GroupBy describe method, but along the way realized it was worth implementing a Cythonized GroupBy quantile function first. In theory, describe could then be built by concatenating count, mean, std, min, median, max, and two quantile calls (one for 25% and the other for 75%).
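The composition idea above can be sketched with the public pandas API. This is only an illustration of the approach, not the Cython implementation in this PR; the sample frame and the names `df`, `grouped`, `pieces`, and `composed` are made up for the example.

```python
import pandas as pd

# Hypothetical sample data, not taken from the PR.
df = pd.DataFrame({
    "key": ["a", "a", "a", "b", "b", "b"],
    "val": [1.0, 2.0, 3.0, 10.0, 20.0, 40.0],
})
grouped = df.groupby("key")["val"]

# Each aggregation returns a Series indexed by group key.
# The dict keys become the column labels after concatenation.
pieces = {
    "count": grouped.count(),
    "mean": grouped.mean(),
    "std": grouped.std(),
    "min": grouped.min(),
    "25%": grouped.quantile(0.25),
    "50%": grouped.median(),
    "75%": grouped.quantile(0.75),
    "max": grouped.max(),
}

# Align the per-group results side by side, one column per statistic.
composed = pd.concat(pieces, axis=1)
print(composed)
```

On this sample, `composed` has the same columns as `grouped.describe()`; the only difference is that `count` stays an integer column here, whereas describe reports it as a float.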
I decided to push this up as is because it is a decent amount of work on its own, and I think it is worth getting reviewed before going further.
- closes #xxxx
- tests added / passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry
Relevant ASVs provided below:
before       after        ratio
[4a43815d]   [d7aec3fa]
236±6ms      421±40μs     0.00   groupby.GroupByMethods.time_dtype_as_field('float', 'quantile', 'transformation')
239±4ms      380±20μs     0.00   groupby.GroupByMethods.time_dtype_as_field('float', 'quantile', 'direct')
237±1ms      352±9μs      0.00   groupby.GroupByMethods.time_dtype_as_field('int', 'quantile', 'direct')
240±2ms      349±10μs     0.00   groupby.GroupByMethods.time_dtype_as_field('int', 'quantile', 'transformation')
348±6ms      372±9μs      0.00   groupby.GroupByMethods.time_dtype_as_group('int', 'quantile', 'transformation')
270±4ms      276±20μs     0.00   groupby.GroupByMethods.time_dtype_as_field('datetime', 'quantile', 'transformation')
357±6ms      347±0.9μs    0.00   groupby.GroupByMethods.time_dtype_as_group('int', 'quantile', 'direct')
271±4ms      263±0.6μs    0.00   groupby.GroupByMethods.time_dtype_as_field('datetime', 'quantile', 'direct')
515±9ms      351±2μs      0.00   groupby.GroupByMethods.time_dtype_as_group('datetime', 'quantile', 'transformation')
522±6ms      350±5μs      0.00   groupby.GroupByMethods.time_dtype_as_group('datetime', 'quantile', 'direct')
539±6ms      354±6μs      0.00   groupby.GroupByMethods.time_dtype_as_group('float', 'quantile', 'direct')
548±1ms      354±7μs      0.00   groupby.GroupByMethods.time_dtype_as_group('float', 'quantile', 'transformation')
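A note on reading the ratio column: as far as I know, asv reports it as after divided by before, rounded to two decimals, so a change from milliseconds to microseconds displays as 0.00. A quick check using the central timings from the first benchmark row (variable names are just for illustration):

```python
# Central timings from the first benchmark row, converted to seconds.
before_s = 236e-3  # 236 ms
after_s = 421e-6   # 421 μs

# Assumption: the ratio column is after / before.
ratio = after_s / before_s
speedup = before_s / after_s

print(f"ratio (2 decimals) = {ratio:.2f}")   # 0.00
print(f"ratio (4 decimals) = {ratio:.4f}")   # 0.0018
print(f"speedup            = {speedup:.0f}x")  # 561x
```

The other rows work out similarly, with the after timings roughly three orders of magnitude smaller than the before timings.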