What’s new in 1.5.1 (October 19, 2022)
These are the changes in pandas 1.5.1. See Release notes for a full changelog including other versions of pandas.
Behavior of groupby with categorical groupers (GH 48645)
In versions of pandas prior to 1.5, groupby with dropna=False would still drop NA values when the grouper was a categorical dtype. A fix for this was attempted in 1.5; however, it introduced a regression where passing observed=False and dropna=False to groupby would result in only observed categories. It was found that the patch fixing the dropna=False bug is incompatible with observed=False, and it was decided that the best resolution is to restore the correct observed=False behavior at the cost of reintroducing the dropna=False bug.
In [1]: df = pd.DataFrame(
   ...:     {
   ...:         "x": pd.Categorical([1, None], categories=[1, 2, 3]),
   ...:         "y": [3, 4],
   ...:     }
   ...: )
In [2]: df
Out[2]:
     x  y
0    1  3
1  NaN  4
1.5.0 behavior:
In [3]: # Correct behavior, NA values are not dropped
   ...: df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
     y
x
1    3
NaN  4
In [4]: # Incorrect behavior, only observed categories present
   ...: df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
     y
x
1    3
NaN  4
1.5.1 behavior:
In [3]: # Incorrect behavior, NA values are dropped
   ...: df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
   y
x
1  3
In [4]: # Correct behavior, unobserved categories present (NA values still dropped)
   ...: df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
   y
x
1  3
2  0
3  0
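To see which behavior an installed pandas version exhibits, the following minimal sketch rebuilds the frame from this section and inspects the result of observed=False with dropna=False. The variable names (result, non_na, has_all_categories, has_na_group) are illustrative only. Whether an NA group appears depends on the version in use, so the script reports it rather than assuming an outcome.

```python
import pandas as pd

# Same frame as in the example above: one observed category and one missing value.
df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)

# observed=False asks for every declared category, including unobserved ones.
result = df.groupby("x", observed=False, dropna=False).sum()

# Set any NA label aside so this check does not depend on how the installed
# version handles dropna=False for categorical groupers.
non_na = result[result.index.notna()]
has_all_categories = list(non_na.index) == [1, 2, 3]

# Whether an NA group is present varies by pandas version.
has_na_group = bool(result.index.isna().any())

print(f"pandas {pd.__version__}")
print(f"all categories present: {has_all_categories}")
print(f"NA group present: {has_na_group}")
```

Code that relies on NA rows being kept for categorical groupers may want to check has_na_group explicitly instead of assuming a particular release's behavior.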
Fixed regressions
- Fixed regression in Series.__setitem__() casting None to NaN for object dtype (GH 48665)
- Fixed regression in DataFrame.loc when setting values as a DataFrame with all True indexer (GH 48701)
- Regression in read_csv() causing an EmptyDataError when using a UTF-8 file handle that was already read from (GH 48646)
- Regression in to_datetime() when utc=True and arg contained timezone naive and aware arguments raised a ValueError (GH 48678)
- Fixed regression in DataFrame.loc raising FutureWarning when setting an empty DataFrame (GH 48480)
- Fixed regression in DataFrame.describe() raising TypeError when result contains NA (GH 48778)
- Fixed regression in DataFrame.plot() ignoring invalid colormap for kind="scatter" (GH 48726)
- Fixed regression in MultiIndex.values resetting freq attribute of underlying Index object (GH 49054)
- Fixed performance regression in factorize() when na_sentinel is not None and sort=False (GH 48620)
- Fixed regression causing an AttributeError during warning emitted if the provided table name in DataFrame.to_sql() and the table name actually used in the database do not match (GH 48733)
- Fixed regression in to_datetime() when arg was a date string with nanosecond and format contained %f would raise a ValueError (GH 48767)
- Fixed regression in testing.assert_frame_equal() raising for MultiIndex with Categorical and check_like=True (GH 48975)
- Fixed regression in DataFrame.fillna() replacing wrong values for datetime64[ns] dtype and inplace=True (GH 48863)
- Fixed DataFrameGroupBy.size() not returning a Series when axis=1 (GH 48738)
- Fixed regression in DataFrameGroupBy.apply() when user defined function is called on an empty dataframe (GH 47985)
- Fixed regression in DataFrame.apply() when passing non-zero axis via keyword argument (GH 48656)
- Fixed regression in Series.groupby() and DataFrame.groupby() when the grouper is a nullable data type (e.g. Int64) or a PyArrow-backed string array, contains null values, and dropna=False (GH 48794)
- Fixed performance regression in Series.isin() with mismatching dtypes (GH 49162)
- Fixed regression in DataFrame.to_parquet() raising when file name was specified as bytes (GH 48944)
- Fixed regression in ExcelWriter where the book attribute could no longer be set; however setting this attribute is now deprecated and this ability will be removed in a future version of pandas (GH 48780)
- Fixed regression in DataFrame.corrwith() when computing correlation on tied data with method="spearman" (GH 48826)
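A few of the regressions listed above can be exercised with short snippets. The sketch below is illustrative only: the inputs are made up for this example and are not the reproducers from the linked issues, and it assumes pandas 1.5.1 or later. It touches the object-dtype None assignment (GH 48665), to_datetime() with utc=True on mixed naive and aware inputs (GH 48678), and grouping by a nullable Int64 key with dropna=False (GH 48794).

```python
import pandas as pd

# GH 48665: assigning None into an object-dtype Series should keep None
# rather than casting it to NaN.
ser = pd.Series(["a", "b"], dtype=object)
ser[0] = None
kept_none = ser[0] is None

# GH 48678: utc=True with a mix of timezone-naive and timezone-aware inputs.
# The naive value is interpreted as UTC; the aware one is converted to UTC.
mixed = [
    pd.Timestamp("2022-01-01 00:00:00"),
    pd.Timestamp("2022-01-01 00:00:00+01:00"),
]
as_utc = pd.to_datetime(mixed, utc=True)

# GH 48794: nullable (Int64) grouper that contains a null, with dropna=False,
# so the null key forms its own group.
frame = pd.DataFrame(
    {"key": pd.array([1, None, 1], dtype="Int64"), "val": [10, 20, 30]}
)
by_key = frame.groupby("key", dropna=False)["val"].sum()

print(f"None preserved: {kept_none}")
print(as_utc)
print(by_key)
```

Each check is independent, so any one of them can be lifted into a test suite to guard against a recurrence when upgrading pandas.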
Bug fixes
- Bug in Series.__getitem__() not falling back to positional for integer keys and boolean Index (GH 48653)
- Bug in DataFrame.to_hdf() raising AssertionError with boolean index (GH 48667)
- Bug in testing.assert_index_equal() for extension arrays with non-matching NA raising ValueError (GH 48608)
- Bug in DataFrame.pivot_table() raising unexpected FutureWarning when setting datetime column as index (GH 48683)
- Bug in DataFrame.sort_values() emitting unnecessary FutureWarning when called on DataFrame with boolean sparse columns (GH 48784)
- Bug in arrays.ArrowExtensionArray where a comparison operator applied to an invalid object would not raise a NotImplementedError (GH 48833)
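As an illustration of the testing.assert_index_equal() entry (GH 48608), the sketch below compares two Int64 indexes whose NA positions differ and records which exception type is raised. The inputs and the variable name raised are hypothetical and chosen for this example; the expectation, assuming pandas 1.5.1 or later, is an ordinary AssertionError rather than a ValueError.

```python
import pandas as pd
import pandas.testing as tm

# Two nullable-integer indexes that differ only in where NA appears.
left = pd.Index([1, pd.NA], dtype="Int64")
right = pd.Index([1, 2], dtype="Int64")

# Record the exception type instead of letting it propagate.
try:
    tm.assert_index_equal(left, right)
    raised = None
except AssertionError:
    raised = "AssertionError"
except ValueError:
    raised = "ValueError"

print(f"raised: {raised}")
```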
Other
- Avoid showing deprecated signatures when introspecting functions with warnings about arguments becoming keyword-only (GH 48692)
Contributors
A total of 16 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Amay Patel +
- Deepak Sirohiwal +
- Dennis Chukwunta
- Gaurav Sheni
- Himanshu Wagh +
- Lorenzo Vainigli +
- Marc Garcia
- Marco Edward Gorelli
- Matthew Roeschke
- MeeseeksMachine
- Noa Tamir
- Pandas Development Team
- Patrick Hoefler
- Richard Shadrach
- Shantanu
- Torsten Wörtwein