What’s new in 1.5.1 (October 19, 2022) — pandas 2.2.3 documentation

These are the changes in pandas 1.5.1. See Release notes for a full changelog including other versions of pandas.

Behavior of groupby with categorical groupers (GH 48645)

In versions of pandas prior to 1.5, groupby with dropna=False would still drop NA values when the grouper was a categorical dtype. A fix for this was attempted in 1.5; however, it introduced a regression where passing observed=False and dropna=False to groupby would result in only the observed categories. It was found that the patch fixing the dropna=False bug is incompatible with observed=False, and it was decided that the best resolution is to restore the correct observed=False behavior at the cost of reintroducing the dropna=False bug.
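To make clear what observed controls on its own, here is a minimal sketch with no NA values at all. The data is made up for illustration, and observed is passed explicitly in both calls so the result does not depend on its default.

```python
import pandas as pd

# Illustrative data: three categories, but only 1 and 3 occur; no NA values.
df = pd.DataFrame(
    {
        "x": pd.Categorical([1, 3, 1], categories=[1, 2, 3]),
        "y": [10, 20, 30],
    }
)

# observed=True: only categories that actually occur become groups.
observed_only = df.groupby("x", observed=True)["y"].sum()

# observed=False: every category becomes a group; the empty group 2 sums to 0.
all_categories = df.groupby("x", observed=False)["y"].sum()
```

Here observed_only has the groups 1 and 3 (sums 40 and 20), while all_categories additionally contains category 2 with a sum of 0. The regression described above broke exactly this second case once dropna=False was also passed.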

In [1]: df = pd.DataFrame(
   ...:     {
   ...:         "x": pd.Categorical([1, None], categories=[1, 2, 3]),
   ...:         "y": [3, 4],
   ...:     }
   ...: )
   ...:

In [2]: df
Out[2]:
     x  y
0    1  3
1  NaN  4

1.5.0 behavior:

In [3]: # Correct behavior, NA values are not dropped
   ...: df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
     y
x
1    3
NaN  4

In [4]: # Incorrect behavior, only observed categories present
   ...: df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
     y
x
1    3
NaN  4

1.5.1 behavior:

Incorrect behavior, NA values are dropped

In [3]: df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
     y
x
1    3
NaN  4

Correct behavior, unobserved categories present (NA values still dropped)

In [4]: df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
     y
x
1    3
2    0
3    0
NaN  4
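If you need both the NA group and the unobserved categories on 1.5.1, one possible workaround (a sketch under stated assumptions, not something these release notes prescribe) is to group on the underlying values rather than on the categorical itself, then reindex over all categories plus NaN. It assumes the categories are numeric so that casting to float64 is lossless; the names keys, sums, full_index, and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)

# Assumption: numeric categories, so plain float64 values can stand in for
# the categorical key. A non-categorical key goes through the ordinary
# dropna=False handling, which keeps NaN as its own group.
keys = df["x"].astype("float64")
sums = df.groupby(keys, dropna=False)["y"].sum()

# Reindex over every category plus NaN: unobserved categories reappear
# with a sum of 0, and the NA group is retained.
full_index = pd.Index(
    [*df["x"].cat.categories.astype("float64"), np.nan], name="x"
)
result = sums.reindex(full_index, fill_value=0)
```

Here result holds 3, 0, 0, and 4 for the labels 1.0, 2.0, 3.0, and NaN. The trade-off is that the resulting index is a plain float index rather than a categorical one, so code that relies on the categorical dtype would need to convert it back.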

Fixed regressions

Bug fixes

Other

Contributors

A total of 16 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.