What’s new in 2.3.0 (Month XX, 2024)
These are the changes in pandas 2.3.0. See Release notes for a full changelog including other versions of pandas.
Upcoming changes in pandas 3.0
Enhancements
enhancement1
Other enhancements
- The semantics for the `copy` keyword in `__array__` methods (i.e. called when using `np.array()` or `np.asarray()` on pandas objects) have been updated to work correctly with NumPy >= 2 (GH 57739)
- Series.str.decode() result now has `StringDtype` when `future.infer_string` is True (GH 60709)
- Series.to_hdf() and DataFrame.to_hdf() now round-trip with `StringDtype` (GH 60663)
- Improved `repr` of NumpyExtensionArray to account for NEP51 (GH 61085)
- Series.str.decode() has gained the argument `dtype` to control the dtype of the result (GH 60940)
- The cumsum(), cummin(), and cummax() reductions are now implemented for `StringDtype` columns (GH 60633)
- The sum() reduction is now implemented for `StringDtype` columns (GH 59853)
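As a rough illustration of the first two items, the sketch below converts a Series to NumPy with and without an explicit copy, then decodes a Series of bytes. The variable names are made up for this example, and the dtype printed for the decoded result is deliberately not asserted because it depends on the installed pandas version and on the `future.infer_string` option.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3])

# np.asarray() asks for an array without forcing a copy;
# it is safest to treat the result as read-only.
maybe_view = np.asarray(ser)

# np.array(..., copy=True) always hands back an independent array.
independent = np.array(ser, copy=True)
independent[0] = 99  # modifies only the copy, not the Series

print(maybe_view.tolist())
print(ser.tolist())

# Decoding bytes to text; the resulting dtype varies by version and options.
decoded = pd.Series([b"caf\xc3\xa9", b"tea"]).str.decode("utf-8")
print(decoded.tolist(), decoded.dtype)
```

Where the `dtype` argument of `Series.str.decode()` is available (per the note above, from 2.3.0), it could be passed explicitly to pin the result dtype instead of relying on the option.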
Notable bug fixes
These are bug fixes that might have notable behavior changes.
notable_bug_fix1
API changes
- When enabling the `future.infer_string` option: Index set operations (like union or intersection) will now ignore the dtype of an empty `RangeIndex` or empty `Index` with object dtype when determining the dtype of the resulting Index (GH 60797)
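A minimal sketch of the kind of set operation this change affects is shown below. It does not enable `future.infer_string` itself (on some versions that option needs pyarrow), so the printed dtypes reflect whatever the installed pandas does by default; only the resulting labels are expected to be stable. The variable names are illustrative.

```python
import pandas as pd

left = pd.Index(["a", "b"])
empty_object = pd.Index([], dtype=object)  # empty Index with object dtype
empty_range = pd.RangeIndex(0)             # empty RangeIndex

union_with_object = left.union(empty_object)
union_with_range = left.union(empty_range)

# The labels are unchanged by the empty operand; the dtype is what the
# API change is about, and it depends on version and options.
print(union_with_object.dtype, list(union_with_object))
print(union_with_range.dtype, list(union_with_range))
```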
Deprecations
- Deprecated allowing non-`bool` values for `na` in str.contains(), str.startswith(), and str.endswith() for dtypes that do not already disallow these (GH 59615)
- Deprecated the `"pyarrow_numpy"` storage option for StringDtype (GH 60152)
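To stay clear of the first deprecation, one option is to pass an explicit boolean for `na`. The snippet below is a small sketch with made-up data; it shows missing entries being mapped to `False` rather than to some non-boolean placeholder.

```python
import pandas as pd

ser = pd.Series(["apple", None, "berry"])

# An explicit bool for `na` decides how missing values appear in the mask.
has_a = ser.str.contains("a", na=False)
starts_with_b = ser.str.startswith("b", na=False)

print(has_a.tolist())
print(starts_with_b.tolist())
```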
Performance improvements
Bug fixes
Categorical
Datetimelike
Timedelta
Timezones
Numeric
- Enabled Series.mode and DataFrame.mode with `dropna=False` to sort the result for all dtypes in the presence of NA values; previously only certain dtypes would sort (GH 60702)
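The following sketch shows a `mode` call where a missing value ties with real values, which is the situation the note describes. The data is invented, and the order of the printed result is intentionally not claimed here, since it is exactly what may differ between versions and dtypes.

```python
import pandas as pd

# 1.0, 3.0 and the missing value each occur twice, so all three are modes.
ser = pd.Series([3.0, 1.0, 1.0, 3.0, None, None])

modes = ser.mode(dropna=False)
print(modes.tolist())
```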
Conversion
Strings
- Bug in DataFrameGroupBy.min(), DataFrameGroupBy.max(), Resampler.min(), and Resampler.max() where string input of all NA values would return float dtype; now returns string dtype (GH 60810)
- Bug in DataFrame.sum() with `axis=1`, DataFrameGroupBy.sum() or SeriesGroupBy.sum() with `skipna=True`, and Resampler.sum() on StringDtype where all NA values resulted in `0`; the result is now the empty string `""` (GH 60229)
- Bug in `Series.__pos__()` and `DataFrame.__pos__()` where they did not raise for StringDtype with `storage="pyarrow"` (GH 60710)
- Bug in Series.rank() for StringDtype with `storage="pyarrow"` incorrectly returning integer results in case of `method="average"` and raising an error if it would truncate results (GH 59768)
- Bug in Series.replace() with StringDtype where replacing with a non-string value was not upcasting to `object` dtype (GH 60282)
- Bug in Series.str.replace() when `n < 0` for StringDtype with `storage="pyarrow"` (GH 59628)
- Bug in `ser.str.slice` with negative `step` with ArrowDtype and StringDtype with `storage="pyarrow"` giving incorrect results (GH 59710)
- Bug in the `center` method on Series and Index object `str` accessors with pyarrow-backed dtype not matching the Python behavior in corner cases with an odd number of fill characters (GH 54792)
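The last two items refer to "the Python behavior" as the reference point. As a sketch of what that means, the snippet below compares the `str` accessor against Python's built-in string methods for a negative slice step and for centering with an odd amount of padding. It uses the default string dtype of the installed pandas rather than forcing pyarrow storage, and the sample data is made up.

```python
import pandas as pd

words = ["ab", "abc"]
ser = pd.Series(words)

# Negative step: should agree with ordinary Python slicing.
reversed_pd = ser.str.slice(step=-1).tolist()
reversed_py = [w[::-1] for w in words]

# Width 5 leaves "ab" with an odd number of fill characters (3).
centered_pd = ser.str.center(5, "*").tolist()
centered_py = [w.center(5, "*") for w in words]

print(reversed_pd, reversed_py)
print(centered_pd, centered_py)
```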
Interval
Indexing
- Fixed bug in Index.get_indexer() round-tripping through string dtype when `infer_string` is enabled (GH 55834)
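For readers unfamiliar with the method involved, here is a small sketch of what `Index.get_indexer()` returns. It does not reproduce the fixed bug and does not enable `infer_string`; the labels are invented for illustration.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c"])

# Positions of the requested labels in idx; -1 marks a label that is absent.
positions = idx.get_indexer(["b", "z"])
print(positions.tolist())
```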
Missing
MultiIndex
I/O
- DataFrame.to_excel() was storing decimals as strings instead of numbers (GH 49598)
Period
Plotting
Groupby/resample/rolling
Reshaping
Sparse
ExtensionArray
Styler
Other
- Fixed usage of `inspect` when the optional dependencies `pyarrow` or `jinja2` are not installed (GH 60196)