Series/DataFrame.rank() doesn't handle certain floats properly
Description
In pandas 0.14.1, Series.rank() appears to mishandle floats that are close together: every value except the largest is assigned the same tied rank of 19.5. For reference, the same test passed in pandas 0.12.0.
Current Behavior
>>> import pandas as pd
>>> series = pd.Series([
...     1000.000669, 1000.000041, 1000.000059, 1000.000063, 1000.000121,
...     1000.000104, 1000.000040, 1000.000062, 1000.000095, 1000.000091,
...     1000.000050, 1000.000074, 1000.000063, 1000.000076, 1000.000083,
...     1000.000061, 1000.000030, 1000.000069, 1000.000090, 1000.000116,
...     1000.000058, 1000.000074, 1000.000035, 1000.000084, 1000.000067,
...     1000.000072, 1000.000105, 1000.000091, 1000.000077, 1000.000040,
...     1000.000108, 1000.000117, 1000.000114, 1000.000117, 1000.000099,
...     1000.000039, 1000.000046, 1000.000105, 1000.000057])
>>> series.rank()
0 39.0
1 19.5
2 19.5
3 19.5
4 19.5
5 19.5
6 19.5
7 19.5
8 19.5
9 19.5
10 19.5
11 19.5
12 19.5
13 19.5
14 19.5
15 19.5
16 19.5
17 19.5
18 19.5
19 19.5
20 19.5
21 19.5
22 19.5
23 19.5
24 19.5
25 19.5
26 19.5
27 19.5
28 19.5
29 19.5
30 19.5
31 19.5
32 19.5
33 19.5
34 19.5
35 19.5
36 19.5
37 19.5
38 19.5
dtype: float64
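The all-tied result does not seem to come from the inputs being indistinguishable in float64. As a rough check, here is a sketch using only the standard library (note that `math.ulp` needs Python 3.9+, which is newer than the 2.7.6 listed under System Information). It compares the smallest gap between distinct values with the float64 spacing near 1000:

```python
import math

values = [
    1000.000669, 1000.000041, 1000.000059, 1000.000063, 1000.000121,
    1000.000104, 1000.000040, 1000.000062, 1000.000095, 1000.000091,
    1000.000050, 1000.000074, 1000.000063, 1000.000076, 1000.000083,
    1000.000061, 1000.000030, 1000.000069, 1000.000090, 1000.000116,
    1000.000058, 1000.000074, 1000.000035, 1000.000084, 1000.000067,
    1000.000072, 1000.000105, 1000.000091, 1000.000077, 1000.000040,
    1000.000108, 1000.000117, 1000.000114, 1000.000117, 1000.000099,
    1000.000039, 1000.000046, 1000.000105, 1000.000057,
]

# Distinct values in ascending order; true duplicates collapse here.
distinct = sorted(set(values))

# Smallest difference between neighbouring distinct values.
min_gap = min(b - a for a, b in zip(distinct, distinct[1:]))

# Spacing between adjacent float64 values near 1000.
ulp = math.ulp(1000.0)

print(len(values), len(distinct))
print(min_gap, ulp, min_gap / ulp)
```

The smallest gap is about 1e-6, millions of times the float64 spacing at this magnitude, so only the six exact duplicates should share a rank.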
Expected Behavior
>>> from scipy import stats
>>> stats.rankdata(series)
array([ 39. , 6. , 11. , 14.5, 38. , 30. , 4.5, 13. , 28. ,
26.5, 8. , 19.5, 14.5, 21. , 23. , 12. , 1. , 17. ,
25. , 35. , 10. , 19.5, 2. , 24. , 16. , 18. , 31.5,
26.5, 22. , 4.5, 33. , 36.5, 34. , 36.5, 29. , 3. ,
7. , 31.5, 9. ])
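For a cross-check that depends on neither pandas nor SciPy, here is a minimal pure-Python sketch of average-method ranking with exact equality for ties. `average_rank` is a hypothetical helper written for this report, not part of either library, and the sample is just the first few values of the series plus one duplicate and the minimum:

```python
def average_rank(values):
    """Rank values from 1; tied values get the mean of their positions."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run only while the next value is exactly equal.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        # Positions start..end (0-based) share the mean 1-based rank.
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


sample = [1000.000669, 1000.000041, 1000.000059,
          1000.000063, 1000.000063, 1000.000030]
print(average_rank(sample))  # [6.0, 2.0, 3.0, 4.5, 4.5, 1.0]
```

Only the two exactly equal entries share a rank here, which is the behaviour I would expect from series.rank() as well.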
System Information
>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Darwin
OS-release: 13.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.14.1
nose: 1.3.4
Cython: 0.21
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: None
sphinx: None
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.7
bottleneck: 0.8.0
tables: 3.0.0
numexpr: 2.4
matplotlib: None
openpyxl: 2.1.0
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.8.0
pymysql: None
psycopg2: 2.5.4 (dt dec pq3 ext)