Pandas (0.18) Rank: unexpected behavior for method = 'dense' and pct = True · Issue #15630 · pandas-dev/pandas

I find the behavior of the rank function with method = 'dense' and pct = True unexpected: to calculate percentile ranks, it appears to divide the dense rank by the total number of observations instead of by the number of distinct observations.

Code Sample, a copy-pastable example if possible

import pandas as pd
n_rep = 2
ts = pd.Series([1,2,3,4] * n_rep )
output = ts.rank(method = 'dense', pct = True)

Problem description

ts.rank(method = 'dense', pct = True)
Out[116]: 
0    0.125
1    0.250
2    0.375
3    0.500
4    0.125
5    0.250
6    0.375
7    0.500
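
These values match what one gets by dividing the dense rank by the total number of observations (8) rather than by the number of distinct values (4). The check below is a minimal sketch that assumes this is what happens internally; I have not confirmed it against the pandas source. It does not call pct = True at all, so it gives the same numbers on any pandas version.

```python
import pandas as pd

n_rep = 2
ts = pd.Series([1, 2, 3, 4] * n_rep)

# Dense ranks without pct: 1.0, 2.0, 3.0, 4.0, repeated n_rep times.
dense = ts.rank(method='dense')

# Dividing by the total number of observations reproduces the reported output.
by_total = dense / len(ts)
print(by_total.tolist())  # [0.125, 0.25, 0.375, 0.5, 0.125, 0.25, 0.375, 0.5]
```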

Expected Output

Something similar to:

pd.Series([1,2,3,4] * 2).rank(method = 'dense', pct = True) * n_rep 
Out[118]: 
0    0.25
1    0.50
2    0.75
3    1.00
4    0.25
5    0.50
6    0.75
7    1.00

Also, I would expect the result above to be invariant to n_rep.
That is, I would expect a "mapping" {value -> pct_rank} that does not depend on how many times each value is repeated, which is not the case here.
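
In the meantime, the percentile can be computed by hand. This is a minimal sketch of a possible workaround, assuming the intended denominator is the number of distinct non-null values; dense_pct_rank is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def dense_pct_rank(s):
    """Dense rank divided by the number of distinct non-null values.

    Hypothetical helper for illustration; not part of pandas.
    """
    return s.rank(method='dense') / s.nunique()

# The value -> pct_rank mapping should be the same for every n_rep.
for n_rep in (1, 2, 5):
    ts = pd.Series([1, 2, 3, 4] * n_rep)
    mapping = dict(zip(ts.tolist(), dense_pct_rank(ts).tolist()))
    print(n_rep, mapping)
```

With this denominator the largest value always maps to 1.0, however many times the data is repeated.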