groupby.rank 'na_option="bottom"' Usage Clarification · Issue #22124 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame({'val': [2, np.nan, 2, 8, 2, np.nan, 6]})
   ...: df["key"] = pd.Series(["foo"]*7)

In [2]: df
Out[2]:
   val  key
0  2.0  foo
1  NaN  foo
2  2.0  foo
3  8.0  foo
4  2.0  foo
5  NaN  foo
6  6.0  foo

In [5]: df.groupby("key").rank(na_option="not bottom")
Out[5]:
   val
0  2.0
1  6.5
2  2.0
3  5.0
4  2.0
5  6.5
6  4.0

Problem description

When an invalid value is passed as the na_option argument of groupby.rank, no ValueError is raised; the call silently returns a ranking, as in Out[5] above. Passing the same invalid value to DataFrame.rank or Series.rank raises ValueError("na_option must be one of 'keep', 'top', or 'bottom'").
The expected output is derived from #19499.
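
To make the missing check concrete, here is a minimal sketch of the validation I would expect, copied in spirit from the check that DataFrame.rank performs (see the traceback under Expected Output). The names `VALID_NA_OPTIONS`, `validate_na_option` and `checked_group_rank` are hypothetical and not part of pandas; this only illustrates the behavior, not how a fix should necessarily be implemented.

```python
# Hypothetical helper names; not part of the pandas API.
VALID_NA_OPTIONS = {"keep", "top", "bottom"}


def validate_na_option(na_option):
    """Raise ValueError unless na_option is one of the accepted values.

    Mirrors the check shown in the DataFrame.rank traceback in this issue.
    """
    if na_option not in VALID_NA_OPTIONS:
        msg = "na_option must be one of 'keep', 'top', or 'bottom'"
        raise ValueError(msg)
    return na_option


def checked_group_rank(grouped, na_option="keep", **kwargs):
    """Validate na_option first, then delegate to grouped.rank.

    `grouped` is assumed to be any object with a rank() method that
    accepts na_option, e.g. the result of df.groupby("key").
    """
    validate_na_option(na_option)
    return grouped.rank(na_option=na_option, **kwargs)
```

With this wrapper, `checked_group_rank(df.groupby("key"), na_option="not bottom")` would raise the ValueError before rank is ever called, which is a possible workaround until groupby.rank validates the argument itself.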

Expected Output

In [1]: import pandas as pd
   ...: import numpy as np
   ...: df = pd.DataFrame({'val': [2, np.nan, 2, 8, 2, np.nan, 6]})
   ...: df["key"] = pd.Series(["foo"]*7)
   ...:

In [2]: df.rank(na_option="no bottom")

ValueError                                Traceback (most recent call last)
in ()
----> 1 df.rank(na_option="no bottom")

C:\Users\Public\pandas-peter\pandas\core\generic.py in rank(self, axis, method, numeric_only, na_option, ascending, pct)
   7523         if na_option not in {'keep', 'top', 'bottom'}:
   7524             msg = "na_option must be one of 'keep', 'top', or 'bottom'"
-> 7525             raise ValueError(msg)
   7526
   7527         def ranker(data):

ValueError: na_option must be one of 'keep', 'top', or 'bottom'

Output of pd.show_versions()

INSTALLED VERSIONS

commit: d30c4a0
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: None.None

pandas: 0.24.0.dev0+377.gd30c4a069
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.28.4
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.7
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None