BUG: pandas rolling_quantile does not use interpolation · Issue #9413 · pandas-dev/pandas

I recently ran into an unexpected issue with the pandas rolling functions. Take rolling_quantile, for example:

>> import numpy as np
>> import pandas as pd
>> row = 10
>> col = 5
>> idx = pd.date_range('20100101',periods=row,freq='B')
>> a = pd.DataFrame(np.random.rand(row*col).reshape((row,-1)),index=idx)
>> a
                   0           1           2           3           4
2010-01-01  0.341434    0.497274    0.596341    0.259909    0.872207
2010-01-04  0.222653    0.056723    0.064019    0.936307    0.785647
2010-01-05  0.179067    0.647165    0.931266    0.557698    0.713282
2010-01-06  0.049766    0.259756    0.945736    0.380948    0.282667
2010-01-07  0.385036    0.517609    0.575958    0.050758    0.850735
2010-01-08  0.628169    0.510453    0.325973    0.263361    0.444959
2010-01-11  0.099133    0.976571    0.602235    0.181185    0.506316
2010-01-12  0.987344    0.902289    0.080000    0.254695    0.753325
2010-01-13  0.759198    0.014548    0.139858    0.822900    0.251972
2010-01-14  0.404149    0.349788    0.038714    0.280568    0.197865

>> a.quantile([0.25,0.5,0.75],axis=0)
               0           1           2           3           4
0.25    0.189963    0.282264    0.094964    0.255999    0.323240
0.50    0.363235    0.503864    0.450966    0.271964    0.609799
0.75    0.572164    0.614776    0.600761    0.513510    0.777567

>> np.percentile(a,[25,50,75],axis=0)
[array([ 0.18996316,  0.28226404,  0.09496441,  0.25599853,  0.32323997]),
 array([ 0.36323529,  0.50386356,  0.45096554,  0.27196429,  0.60979881]),
 array([ 0.57216415,  0.61477607,  0.6007611 ,  0.51351021,  0.7775667 ])]

>> pd.rolling_quantile(a,row,0.25).tail(1)
                   0           1       2           3           4
2010-01-14  0.179067    0.259756    0.08    0.254695    0.282667

It looks like the pandas.DataFrame.quantile method is consistent with numpy.percentile, but pandas.rolling_quantile returns different results. If I reduce the number of rows to 5, the problem goes away (all three methods return the same results). Any thoughts?
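The numbers above look consistent with rolling_quantile returning the lower neighbouring sample instead of interpolating linearly between the two neighbours. The sketch below is only an illustration of that reading, not the pandas implementation: `quantile_linear` and `quantile_lower` are hypothetical helper names, and `col0` is column 0 copied from the output above. It would also explain why 5 rows show no difference: with n = 5 and q = 0.25 the position (n - 1) * q = 1 is an integer, so there is nothing to interpolate.

```python
import math

def quantile_linear(values, q):
    """Quantile with linear interpolation between the two nearest order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q          # fractional position in the sorted sample
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] + frac * (s[hi] - s[lo])

def quantile_lower(values, q):
    """Quantile that takes the lower neighbour, with no interpolation (hypothetical helper)."""
    s = sorted(values)
    return s[math.floor((len(s) - 1) * q)]

# Column 0 of the DataFrame printed in this report.
col0 = [0.341434, 0.222653, 0.179067, 0.049766, 0.385036,
        0.628169, 0.099133, 0.987344, 0.759198, 0.404149]

linear_10 = quantile_linear(col0, 0.25)   # compare with a.quantile / np.percentile
lower_10 = quantile_lower(col0, 0.25)     # compare with pd.rolling_quantile
linear_5 = quantile_linear(col0[:5], 0.25)
lower_5 = quantile_lower(col0[:5], 0.25)

print(linear_10, lower_10)
print(linear_5, lower_5)
```

With the full 10 rows the two helpers disagree in the same way as the reported results for column 0; with the first 5 rows they coincide.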

PS: I also tested rolling_std. For long (row-wise) DataFrames it seemingly at random produces errors on the order of 10^-7 to 10^-8 compared with the pandas.DataFrame.std method or the numpy/scipy std functions, which keep the error close to the np.spacing(1) level.
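A minimal way to measure this kind of discrepancy is sketched below in plain Python. It compares a sliding-window standard deviation kept up to date with running sums against one recomputed from scratch for every window. That running sums are the source of the error in rolling_std is an assumption on my part and is not confirmed here; both function names are hypothetical stand-ins, not pandas APIs.

```python
import math
import random

def rolling_std_running_sums(xs, window):
    """Sample std per window, updated incrementally from running sums (assumed approach)."""
    out = []
    s = ss = 0.0
    for i, x in enumerate(xs):
        s += x
        ss += x * x
        if i >= window:
            old = xs[i - window]    # value that just left the window
            s -= old
            ss -= old * old
        if i >= window - 1:
            var = (ss - s * s / window) / (window - 1)
            out.append(math.sqrt(max(var, 0.0)))  # clamp tiny negative round-off
    return out

def rolling_std_two_pass(xs, window):
    """Sample std per window, recomputed from scratch (mean first, then deviations)."""
    out = []
    for i in range(window - 1, len(xs)):
        w = xs[i - window + 1:i + 1]
        m = sum(w) / window
        out.append(math.sqrt(sum((v - m) ** 2 for v in w) / (window - 1)))
    return out

random.seed(0)
data = [random.random() for _ in range(10000)]
fast = rolling_std_running_sums(data, 50)
ref = rolling_std_two_pass(data, 50)
max_abs_diff = max(abs(f - r) for f, r in zip(fast, ref))
print(max_abs_diff)
```

The printed value is the largest absolute difference between the two methods. The same comparison can be run against the actual rolling_std output to see how the error behaves as the series gets longer.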

python 3.4.2
cython 0.21.1
numpy 1.8.2
scipy 0.14.0
pandas 0.15.1
statsmodels 0.6.0