Pandas (0.18) Rank: unexpected behavior for method = 'dense' and pct = True · Issue #15630 · pandas-dev/pandas
I find the behavior of the rank function with method = 'dense' and pct = True unexpected: to calculate percentile ranks, the function appears to divide by the total number of observations instead of the number of distinct values.
Code Sample, a copy-pastable example if possible
import pandas as pd
n_rep = 2
ts = pd.Series([1,2,3,4] * n_rep )
output = ts.rank(method = 'dense', pct = True)
Problem description
ts.rank(method = 'dense', pct = True)
Out[116]:
0 0.125
1 0.250
2 0.375
3 0.500
4 0.125
5 0.250
6 0.375
7 0.500
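The values above are consistent with the dense rank being divided by the total number of observations (8 here). The following is a minimal sketch that checks this arithmetic using only `rank(method='dense')` without `pct`; it does not inspect the pandas internals, and the name `by_count` is introduced purely for illustration.

```python
import pandas as pd

n_rep = 2
ts = pd.Series([1, 2, 3, 4] * n_rep)

# Dense ranks without pct: ties share a rank and ranks have no gaps (1.0 to 4.0 here).
dense = ts.rank(method="dense")

# Dividing by the total number of observations (8) gives the values reported above.
by_count = dense / len(ts)
print(by_count.tolist())  # [0.125, 0.25, 0.375, 0.5, 0.125, 0.25, 0.375, 0.5]
```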
Expected Output
Something similar to:
pd.Series([1,2,3,4] * 2).rank(method = 'dense', pct = True) * n_rep
Out[118]:
0 0.25
1 0.50
2 0.75
3 1.00
4 0.25
5 0.50
6 0.75
7 1.00
Also, I would expect the result above to be invariant to n_rep, i.e. I would expect a "mapping" {value -> pct_rank} that does not depend on how many times each value is repeated, which is not the case here.
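A possible workaround, sketched under the assumption that the desired denominator is the number of distinct non-null values: take the dense rank without `pct` and divide by `Series.nunique()`. The helper name `dense_pct_rank` is hypothetical and not part of pandas.

```python
import pandas as pd


def dense_pct_rank(s: pd.Series) -> pd.Series:
    """Hypothetical helper: dense rank scaled by the count of distinct non-null values."""
    # nunique() ignores NaN by default, and rank() leaves NaN entries as NaN.
    return s.rank(method="dense") / s.nunique()


# The {value -> pct_rank} mapping should be the same for every n_rep.
mappings = []
for n_rep in (1, 2, 5):
    ts = pd.Series([1, 2, 3, 4] * n_rep)
    mapping = dict(zip(ts.tolist(), dense_pct_rank(ts).tolist()))
    mappings.append(mapping)
    print(n_rep, mapping)  # {1: 0.25, 2: 0.5, 3: 0.75, 4: 1.0} each time
```

With this scaling the largest value always maps to 1.0, regardless of how many times the values are repeated.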