Series/DataFrame.rank() doesn't handle certain floats properly · Issue #8365 · pandas-dev/pandas

Series.rank() appears to mishandle floats that are close together in pandas 0.14.1: in the example below, every value except the largest is assigned the same rank, 19.5. For reference, this test worked in pandas 0.12.0.
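
To make "close together" concrete, here is a small sketch using only the standard library. It takes two neighbouring values from the series and compares their gap with a rough upper bound on the spacing between adjacent doubles near 1000. The bound `sys.float_info.epsilon * 1024` is my own choice for illustration and is not something pandas uses.

```python
import sys

# Two neighbouring values taken from the series in this report.
a, b = 1000.000040, 1000.000041
gap = b - a

# Rough upper bound on the spacing between adjacent doubles below 1024
# (an illustrative bound chosen for this sketch, not a pandas constant).
spacing = sys.float_info.epsilon * 1024

print(a == b)         # the two values are distinct doubles
print(gap > spacing)  # the gap is far above floating-point resolution
```

The gap of roughly 1e-6 is several orders of magnitude larger than the spacing, so the values are distinguishable as doubles and I would not expect them to be ranked as ties.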

Current Behavior

>>> import pandas as pd
>>> series = pd.Series([
...     1000.000669, 1000.000041, 1000.000059, 1000.000063, 1000.000121,
...     1000.000104, 1000.000040, 1000.000062, 1000.000095, 1000.000091,
...     1000.000050, 1000.000074, 1000.000063, 1000.000076, 1000.000083,
...     1000.000061, 1000.000030, 1000.000069, 1000.000090, 1000.000116,
...     1000.000058, 1000.000074, 1000.000035, 1000.000084, 1000.000067,
...     1000.000072, 1000.000105, 1000.000091, 1000.000077, 1000.000040,
...     1000.000108, 1000.000117, 1000.000114, 1000.000117, 1000.000099,
...     1000.000039, 1000.000046, 1000.000105, 1000.000057])
>>> series.rank()
0     39.0
1     19.5
2     19.5
3     19.5
4     19.5
5     19.5
6     19.5
7     19.5
8     19.5
9     19.5
10    19.5
11    19.5
12    19.5
13    19.5
14    19.5
15    19.5
16    19.5
17    19.5
18    19.5
19    19.5
20    19.5
21    19.5
22    19.5
23    19.5
24    19.5
25    19.5
26    19.5
27    19.5
28    19.5
29    19.5
30    19.5
31    19.5
32    19.5
33    19.5
34    19.5
35    19.5
36    19.5
37    19.5
38    19.5
dtype: float64

Expected Behavior

>>> from scipy import stats
>>> stats.rankdata(series)
array([ 39. ,   6. ,  11. ,  14.5,  38. ,  30. ,   4.5,  13. ,  28. ,
        26.5,   8. ,  19.5,  14.5,  21. ,  23. ,  12. ,   1. ,  17. ,
        25. ,  35. ,  10. ,  19.5,   2. ,  24. ,  16. ,  18. ,  31.5,
        26.5,  22. ,   4.5,  33. ,  36.5,  34. ,  36.5,  29. ,   3. ,
         7. ,  31.5,   9. ])
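
For reference, the tie handling I expect can be sketched in plain Python: values are ranked from 1 to n, and only exactly equal values share the mean of their ranks. The helper name `average_rank` and the short `sample` list are hypothetical and exist only for this illustration; this is not how pandas or scipy implement ranking internally.

```python
def average_rank(values):
    """Rank values 1..n; exactly equal values get the mean of their ranks."""
    # Indices of the values in ascending order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the tie group while the next sorted value is exactly equal.
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        # Sorted positions start..end (0-based) share the mean of
        # ranks start+1 .. end+1.
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


# A few values from the series above, including two exact duplicates.
sample = [1000.000041, 1000.000040, 1000.000063,
          1000.000040, 1000.000063, 1000.000669]
print(average_rank(sample))  # [3.0, 1.5, 4.5, 1.5, 4.5, 6.0]
```

Applied to the full series, this sketch gives the same ranks as the `stats.rankdata` output above. Ties appear only where a value is repeated exactly, for example the two occurrences of 1000.000040 at rank 4.5.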

System Information

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Darwin
OS-release: 13.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.1
nose: 1.3.4
Cython: 0.21
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: None
sphinx: None
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.7
bottleneck: 0.8.0
tables: 3.0.0
numexpr: 2.4
matplotlib: None
openpyxl: 2.1.0
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.8.0
pymysql: None
psycopg2: 2.5.4 (dt dec pq3 ext)