ZeroDivisionError when groupby rank with method="dense" and pct=True · Issue #23666 · pandas-dev/pandas
When I call the groupby rank function with the options method="dense" and pct=True, I get a ZeroDivisionError.
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2], "B": [1, 1, 1, 1, 2, 2], "C": [1, 2, 1, 1, 1, 2]})
df.groupby(["A", "B"])["C"].rank(method="dense", pct=True)
error:
Traceback (most recent call last):
File "c:/Users/<user_name>/Documents/test.py", line 6, in <module>
df.groupby(["A", "B"])["C"].rank(method="dense", pct=True)
File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 1906, in rank
na_option=na_option, pct=pct, axis=axis)
File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 1025, in _cython_transform
**kwargs)
File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 2630, in transform
return self._cython_operation('transform', values, how, axis, **kwargs)
File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 2590, in _cython_operation
**kwargs)
File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 2664, in _transform
transform_func(result, values, comp_ids, is_datetimelike, **kwargs)
File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 2479, in wrapper
return f(afunc, *args, **kwargs)
File "C:\Users\<user_name>\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 2431, in <lambda>
kwargs.get('na_option', 'keep')
File "pandas\_libs\groupby_helper.pxi", line 1292, in pandas._libs.groupby.group_rank_int64
ZeroDivisionError: float division
Problem description
I encountered a ZeroDivisionError when I tried to use the groupby rank function.
I can't determine exactly what the problem is, but when I drop either the method="dense" or the pct=True option, the above code works.
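Since the call succeeds without pct=True, one possible workaround is to compute the dense rank first and normalise it by hand. The sketch below assumes that the intended percentage is the dense rank divided by the number of distinct values in each group, which is consistent with the expected output shown in this report. The variable names (grouped, dense, distinct, dense_pct) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2],
                   "B": [1, 1, 1, 1, 2, 2],
                   "C": [1, 2, 1, 1, 1, 2]})

grouped = df.groupby(["A", "B"])["C"]

# Dense rank without pct=True; per the report this call does not raise.
dense = grouped.rank(method="dense")

# Number of distinct (non-NaN) values in each row's group.
# Assumption: this is the denominator that pct=True is meant to use
# for method="dense".
distinct = grouped.transform("nunique")

dense_pct = dense / distinct
print(dense_pct)
```

This avoids the combined method="dense", pct=True code path entirely, at the cost of a second pass over the groups.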
If some elements in the above DataFrame are changed, the error disappears. For example, the following code gives the expected output.
df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2], "B": [1, 1, 1, 1, 2, 2], "C": [1, 2, 1, 0, 1, 2]})  # a little change in column C
df.groupby(["A", "B"])["C"].rank(method="dense", pct=True)
output:
0 0.5
1 1.0
2 0.5
3 1.0
4 0.5
5 1.0
Name: C, dtype: float64
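To narrow down where the two inputs differ, the values of column C can be listed per (A, B) group for both frames. This is only a diagnostic sketch: the names failing, working, group_values and differing are made up for illustration, and the comparison does not by itself establish the cause of the error.

```python
import pandas as pd

keys = {"A": [1, 1, 1, 2, 2, 2], "B": [1, 1, 1, 1, 2, 2]}
failing = pd.DataFrame({**keys, "C": [1, 2, 1, 1, 1, 2]})  # raises in the report
working = pd.DataFrame({**keys, "C": [1, 2, 1, 0, 1, 2]})  # gives the expected output


def group_values(frame):
    """Map each (A, B) group key to the list of C values in that group."""
    return {key: grp.tolist() for key, grp in frame.groupby(["A", "B"])["C"]}


failing_groups = group_values(failing)
working_groups = group_values(working)

# Keep only the groups whose C values differ between the two frames.
differing = {
    key: (failing_groups[key], working_groups[key])
    for key in failing_groups
    if failing_groups[key] != working_groups[key]
}
print(differing)
```

The only group that differs is the single-row group A=2, B=1, which holds C=1 in the failing frame and C=0 in the working one.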
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 3.0.0
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.5
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None