What’s new in 2.2.1 (February 22, 2024)
These are the changes in pandas 2.2.1. See Release notes for a full changelog including other versions of pandas.
Enhancements
- Added a pyarrow pip extra so users can install pandas and pyarrow together with pip install pandas[pyarrow] (GH 54466)
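After installing with the extra, one quick way to confirm that pyarrow is importable is to look it up on the import path. This is a minimal sketch using only the standard library; the variable name pyarrow_available is illustrative and not part of pandas.

```python
import importlib.util

# find_spec returns None when the package cannot be located on the import path.
pyarrow_available = importlib.util.find_spec("pyarrow") is not None
print("pyarrow installed:", pyarrow_available)
```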
Fixed regressions
- Fixed memory leak in read_csv() (GH 57039)
- Fixed performance regression in Series.combine_first() (GH 55845)
- Fixed regression causing overflow for near-minimum timestamps (GH 57150)
- Fixed regression in concat() changing long-standing behavior that always sorted the non-concatenation axis when the axis was a DatetimeIndex (GH 57006)
- Fixed regression in merge_ordered() raising TypeError for fill_method="ffill" and how="left" (GH 57010)
- Fixed regression in pandas.testing.assert_series_equal() defaulting to check_exact=True when checking the Index (GH 57067)
- Fixed regression in read_json() where an Index would be returned instead of a RangeIndex (GH 57429)
- Fixed regression in wide_to_long() raising an AttributeError for string columns (GH 57066)
- Fixed regression in DataFrameGroupBy.idxmin(), DataFrameGroupBy.idxmax(), SeriesGroupBy.idxmin(), SeriesGroupBy.idxmax() ignoring the skipna argument (GH 57040)
- Fixed regression in DataFrameGroupBy.idxmin(), DataFrameGroupBy.idxmax(), SeriesGroupBy.idxmin(), SeriesGroupBy.idxmax() where values containing the minimum or maximum value for the dtype could produce incorrect results (GH 57040)
- Fixed regression in CategoricalIndex.difference() raising KeyError when other contains null values other than NaN (GH 57318)
- Fixed regression in DataFrame.groupby() raising ValueError when grouping by a Series in some cases (GH 57276)
- Fixed regression in DataFrame.loc() raising IndexError for non-unique, masked dtype indexes where result has more than 10,000 rows (GH 57027)
- Fixed regression in DataFrame.loc() which was unnecessarily throwing “incompatible dtype warning” when expanding with partial row indexer and multiple columns (see PDEP6) (GH 56503)
- Fixed regression in DataFrame.map() with na_action="ignore" not being respected for NumPy nullable and ArrowDtypes (GH 57316)
- Fixed regression in DataFrame.merge() raising ValueError for certain types of 3rd-party extension arrays (GH 57316)
- Fixed regression in DataFrame.query() with all NaT column with object dtype (GH 57068)
- Fixed regression in DataFrame.shift() raising AssertionError for axis=1 and empty DataFrame (GH 57301)
- Fixed regression in DataFrame.sort_index() not producing a stable sort for an index with duplicates (GH 57151)
- Fixed regression in DataFrame.to_dict() with orient='list' and datetime or timedelta types returning integers (GH 54824)
- Fixed regression in DataFrame.to_json() converting nullable integers to floats (GH 57224)
- Fixed regression in DataFrame.to_sql() when method="multi" is passed and the dialect type is not Oracle (GH 57310)
- Fixed regression in DataFrame.transpose() with nullable extension dtypes not having F-contiguous data potentially causing exceptions when used (GH 57315)
- Fixed regression in DataFrame.update() emitting incorrect warnings about downcasting (GH 57124)
- Fixed regression in ExtensionArray.to_numpy() raising for non-numeric masked dtypes (GH 56991)
- Fixed regression in Index.join() raising TypeError when joining an empty index to a non-empty index containing mixed dtype values (GH 57048)
- Fixed regression in Series.astype() introducing decimals when converting from integer with missing values to string dtype (GH 57418)
- Fixed regression in Series.pct_change() raising a ValueError for an empty Series (GH 57056)
- Fixed regression in Series.to_numpy() when dtype is given as float and the data contains NaNs (GH 57121)
- Fixed regression in addition or subtraction of DateOffset objects with millisecond components to datetime64 Index, Series, or DataFrame (GH 57529)
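A few of the regressions above are easy to exercise directly. The following is a rough sketch, assuming pandas 2.2.1 or later is installed; the sample frames and variable names are made up for illustration, and kind="stable" is passed explicitly here to make the ordering guarantee unambiguous.

```python
import pandas as pd

# merge_ordered() with fill_method="ffill" and how="left" (GH 57010).
left = pd.DataFrame({"key": ["a", "c", "e"], "lvalue": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rvalue": [10, 20, 30]})
merged = pd.merge_ordered(left, right, on="key", how="left", fill_method="ffill")
# Only the left keys remain; rvalue for "e" is forward-filled from "c".
print(merged)

# sort_index() on an index with duplicates (GH 57151).
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=[1, 0, 1, 0])
stable = df.sort_index(kind="stable")
# Rows sharing a label keep their original relative order.
print(stable)

# pct_change() on an empty Series (GH 57056) should return an empty result.
empty_change = pd.Series([], dtype="float64").pct_change()
print(len(empty_change))
```

On pandas 2.2.0 the first and last calls are the ones the release notes describe as raising, so running this snippet is one way to check which side of the fix an environment is on.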
Bug fixes
- Fixed bug in pandas.api.interchange.from_dataframe() which was raising for Nullable integers (GH 55069)
- Fixed bug in pandas.api.interchange.from_dataframe() which was raising for empty inputs (GH 56700)
- Fixed bug in pandas.api.interchange.from_dataframe() which wasn’t converting column names to strings (GH 55069)
- Fixed bug in DataFrame.__getitem__() for empty DataFrame with Copy-on-Write enabled (GH 57130)
- Fixed bug in PeriodIndex.asfreq() which was silently converting frequencies which are not supported as period frequencies instead of raising an error (GH 56945)
Other
Note
The DeprecationWarning that was raised when pandas was imported without PyArrow being installed has been removed. The warning was too noisy for too many users, and a lot of feedback was collected about the decision to make PyArrow a required dependency. The pandas team is still considering whether PyArrow should be added as a hard dependency in 3.0. Interested users can follow the discussion here.
- Added the argument skipna to DataFrameGroupBy.first(), DataFrameGroupBy.last(), SeriesGroupBy.first(), and SeriesGroupBy.last(); achieving skipna=False used to be available via DataFrameGroupBy.nth(), but the behavior was changed in pandas 2.0.0 (GH 57019)
- Added the argument skipna to Resampler.first(), Resampler.last() (GH 57019)
Contributors
A total of 14 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Albert Villanova del Moral
- Luke Manley
- Lumberbot (aka Jack)
- Marco Edward Gorelli
- Matthew Roeschke
- Natalia Mokeeva
- Pandas Development Team
- Patrick Hoefler
- Richard Shadrach
- Robert Schmidtke +
- Samuel Chai +
- Thomas Li
- William Ayd
- dependabot[bot]