BUG: Groupby any/all raises TypeError for pd.NA · Issue #37501 · pandas-dev/pandas



Code Sample, a copy-pastable example

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2, 3, 3, 4], "b": [1, pd.NA, 2, 3, 4, 5, 6]})
grouped = df.groupby("a")
print(grouped.all())
print(grouped.any())

Problem description

Both calls, grouped.all() and grouped.any(), raise a TypeError.

Traceback:

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py", line 10, in <module>
    print(grouped.all())
  File "/home/developer/PycharmProjects/pandas/pandas/core/groupby/groupby.py", line 1424, in all
    return self._bool_agg("all", skipna)
  File "/home/developer/PycharmProjects/pandas/pandas/core/groupby/groupby.py", line 1375, in _bool_agg
    return self._get_cythonized_result(
  File "/home/developer/PycharmProjects/pandas/pandas/core/groupby/groupby.py", line 2631, in _get_cythonized_result
    raise TypeError(error_msg)
TypeError: boolean value of NA is ambiguous

Maybe this could be handled with a mask, as is done for print(df.all(axis=1))?
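To illustrate the mask idea at the user level, here is a minimal sketch. It is not the pandas internals and not a proposed patch: the helper `masked_bool` is a hypothetical name, and the choice to fill missing entries with True for all() and False for any() is an assumption that mimics skipna=True.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2, 3, 3, 4], "b": [1, pd.NA, 2, 3, 4, 5, 6]})

# Mask of missing entries; isna() recognises pd.NA in an object column.
mask = df["b"].isna()


def masked_bool(fill):
    """Hypothetical helper: bool() every non-missing value, use `fill` elsewhere.

    bool(pd.NA) is never evaluated, so the "ambiguous" TypeError cannot occur.
    """
    return pd.Series(
        [fill if is_missing else bool(value) for value, is_missing in zip(df["b"], mask)],
        index=df.index,
        name="b",
    )


# Missing values must not turn all() False, so they are filled with True;
# they must not turn any() True, so they are filled with False.
all_result = masked_bool(True).groupby(df["a"]).all()
any_result = masked_bool(False).groupby(df["a"]).any()

print(all_result)
print(any_result)
```

Both results are boolean Series indexed by "a", in which the pd.NA entry in group 1 is effectively skipped.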

Expected Output

Both calls should work as they do with np.nan and return the following (first all(), then any()):

      b
a      
1  True
2  True
3  True
4  True
      b
a      
1  True
2  True
3  True
4  True
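For comparison, the same frame with np.nan in place of pd.NA can be run as below. The variable names `df_nan`, `all_result` and `any_result` are only illustrative; with the default skipna=True the missing value in group 1 is ignored.

```python
import numpy as np
import pandas as pd

# Same data as in the report, but with np.nan as the missing value,
# which makes column "b" a float column.
df_nan = pd.DataFrame({"a": [1, 1, 2, 2, 3, 3, 4], "b": [1, np.nan, 2, 3, 4, 5, 6]})
grouped = df_nan.groupby("a")

# skipna defaults to True, so the NaN in group 1 is skipped.
all_result = grouped.all()
any_result = grouped.any()

print(all_result)
print(any_result)
```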

Output of pd.show_versions()

master