BUG: GroupBy.count arg minlength passed to np.bincount must be None rather than 0 for np<1.13 · Issue #21956 · pandas-dev/pandas

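import numpy as np
import pandas as pd

# Both rows have NaN in key column 'A', so no groups remain after grouping.
df = pd.DataFrame({'A': [np.nan, np.nan], 'B': ['a', 'b'], 'C': [1, 2]})
df.groupby(['A', 'B']).C.count()

Series([], Name: C, dtype: int64)  # ok with numpy>=1.13
ValueError: minlength must be positive  # with numpy<1.13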

Problem description

Currently, SeriesGroupBy.count can pass minlength=0 to np.bincount, as in the example above. This is fine for numpy>=1.13, where 0 is an accepted value. For numpy<1.13, however, 0 was not a valid value for minlength (it raises "ValueError: minlength must be positive"), and None should be passed instead.

This is easy to fix: in SeriesGroupBy.count, insert a guard that passes None instead of 0 as minlength when numpy<1.13.

Expected Output

Series([], Name: C, dtype: int64), irrespective of the numpy version.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 664ddb8
python: 3.6.3.final.0
python-bits: 32
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0.dev0+319.g2f2876b8c.dirty
pytest: 3.3.1
pip: 10.0.1
setuptools: 38.2.5
Cython: 0.28.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: 0.10.0
IPython: 6.4.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None