What’s new in 3.0.0 (Month XX, 2024) - pandas 3.0.0.dev0+2104.ge637b4290d documentation
These are the changes in pandas 3.0.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
Enhancement1#
Enhancement2#
Other enhancements#
- pandas.api.typing.FrozenList is available for typing the outputs of MultiIndex.names, MultiIndex.codes and MultiIndex.levels (GH 58237)
- pandas.api.typing.SASReader is available for typing the output of read_sas() (GH 55689)
- pandas.api.interchange.from_dataframe() now uses the PyCapsule Interface if available, only falling back to the Dataframe Interchange Protocol if that fails (GH 60739)
- Added Styler.to_typst() to write Styler objects to file, buffer or string in Typst format (GH 57617)
- Added missing pandas.Series.info() to API reference (GH 60926)
- pandas.api.typing.NoDefault is available for typing no_default
- DataFrame.to_excel() now raises a UserWarning when the character count in a cell exceeds Excel’s limitation of 32767 characters (GH 56954)
- pandas.merge() now validates the how parameter input (merge type) (GH 59435)
- pandas.merge(), DataFrame.merge() and DataFrame.join() now support anti joins (left_anti and right_anti) in the how parameter (GH 42916)
- read_spss() now supports kwargs to be passed to pyreadstat (GH 56356)
- read_stata() now returns datetime64 resolutions better matching those natively stored in the Stata format (GH 55642)
- DataFrame.agg() called with axis=1 and a func which relabels the result index now raises a NotImplementedError (GH 58807).
- Index.get_loc() now also accepts subclasses of tuple as keys (GH 57922)
- Styler.set_tooltips() provides an alternative method of storing tooltips by using the title attribute of td elements. (GH 56981)
- Added missing parameter weights in DataFrame.plot.kde() for the estimation of the PDF (GH 59337)
- Allow dictionaries to be passed to pandas.Series.str.replace() via the pat parameter (GH 51748)
- Support passing a Series input to json_normalize() that retains the Series Index (GH 51452)
- Support reading value labels from Stata 108-format (Stata 6) and earlier files (GH 58154)
- Users can globally disable any PerformanceWarning by setting the option mode.performance_warnings to False (GH 56920)
- Styler.format_index_names() can now be used to format the index and column names (GH 48936 and GH 47489)
- errors.DtypeWarning improved to include column names when mixed data types are detected (GH 58174)
- Rolling and Expanding now support the pipe method (GH 57076)
- Series now supports the Arrow PyCapsule Interface for export (GH 59518)
- DataFrame.to_excel() argument merge_cells now accepts a value of "columns" to only merge MultiIndex column header cells (GH 35384)
- DataFrame.corrwith() now accepts min_periods as an optional argument, as in DataFrame.corr() and Series.corr() (GH 9490)
- DataFrame.cummin(), DataFrame.cummax(), DataFrame.cumprod() and DataFrame.cumsum() methods now have a numeric_only parameter (GH 53072)
- DataFrame.ewm() now allows adjust=False when times is provided (GH 54328)
- DataFrame.fillna() and Series.fillna() can now accept value=None; for non-object dtype the corresponding NA value will be used (GH 57723)
- DataFrame.pivot_table() and pivot_table() now allow the passing of keyword arguments to aggfunc through **kwargs (GH 57884)
- DataFrame.to_json() now encodes Decimal as strings instead of floats (GH 60698)
- Series.cummin() and Series.cummax() now support CategoricalDtype (GH 52335)
- Series.plot() now correctly handles the ylabel parameter for pie charts, allowing for explicit control over the y-axis label (GH 58239)
- DataFrame.plot.scatter() argument c now accepts a column of strings, where rows with the same string are colored identically (GH 16827 and GH 16485)
- Series.nlargest() uses a ‘stable’ sort internally and will preserve original ordering.
- ArrowDtype now supports pyarrow.JsonType (GH 60958)
- DataFrameGroupBy and SeriesGroupBy methods sum, mean, median, prod, min, max, std, var and sem now accept the skipna parameter (GH 15675)
- Rolling and Expanding now support nunique (GH 26958)
- Rolling and Expanding now support aggregations first and last (GH 33155)
- read_parquet() accepts to_pandas_kwargs, which are forwarded to pyarrow.Table.to_pandas(). This enables passing additional keywords to customize the conversion to pandas, such as maps_as_pydicts to read the Parquet map data type as Python dictionaries (GH 56842)
- DataFrameGroupBy.transform(), SeriesGroupBy.transform(), DataFrameGroupBy.agg(), SeriesGroupBy.agg(), SeriesGroupBy.apply(), DataFrameGroupBy.apply() now support kurt (GH 40139)
- DataFrame.apply() supports using third-party execution engines like the Bodo.ai JIT compiler (GH 60668)
- DataFrame.iloc() and Series.iloc() now support boolean masks in __getitem__ for more consistent indexing behavior (GH 60994)
- DataFrameGroupBy.transform(), SeriesGroupBy.transform(), DataFrameGroupBy.agg(), SeriesGroupBy.agg(), RollingGroupby.apply(), ExpandingGroupby.apply(), Rolling.apply(), Expanding.apply() and DataFrame.apply() with engine="numba" now support positional arguments passed as kwargs (GH 58995)
- Rolling.agg(), Expanding.agg() and ExponentialMovingWindow.agg() now accept NamedAgg aggregations through **kwargs (GH 28333)
- Series.map() can now accept kwargs to pass on to func (GH 59814)
- Series.str.get_dummies() now accepts a dtype parameter to specify the dtype of the resulting DataFrame (GH 47872)
- pandas.concat() will raise a ValueError when ignore_index=True and keys is not None (GH 59274)
- frozenset elements in pandas objects are now natively printed (GH 60690)
- Added a "delete_rows" option to the if_exists argument in DataFrame.to_sql(), which deletes all records of the table before inserting data (GH 37210).
- Added half-year offset classes HalfYearBegin, HalfYearEnd, BHalfYearBegin and BHalfYearEnd (GH 60928)
- Added support to read from Apache Iceberg tables with the new read_iceberg() function (GH 61383)
- Errors occurring during SQL I/O will now throw a generic DatabaseError instead of the raw Exception type from the underlying driver manager library (GH 60748)
- Implemented Series.str.isascii() (GH 59091)
- Improved deprecation message for offset aliases (GH 60820)
- Multiplying two DateOffset objects will now raise a TypeError instead of a RecursionError (GH 59442)
- Restore support for reading Stata 104-format and enable reading 103-format dta files (GH 58554)
- Support passing an Iterable[Hashable] input to DataFrame.drop_duplicates() (GH 59237)
- Support reading Stata 102-format (Stata 1) dta files (GH 58978)
- Support reading Stata 110-format (Stata 7) dta files (GH 47176)
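The anti joins listed above can be compared with the indicator-based idiom commonly used before they existed. The following is a minimal sketch, not an official recipe: the frames, the column names, and the version check are illustrative assumptions, and the check simply treats a major version of 3 or higher as having the new how values.

```python
import pandas as pd

# Hypothetical example data; the names are illustrative only.
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "d"], "rval": [10, 20]})

# Assumption: a major version of 3 or higher means how="left_anti" is available.
has_anti_join = int(pd.__version__.split(".")[0]) >= 3

if has_anti_join:
    # Keep only the rows of `left` whose key has no match in `right`.
    only_left = pd.merge(left, right, on="key", how="left_anti")
else:
    # Older idiom: tag each row with its origin, then keep the unmatched ones.
    tagged = pd.merge(left, right, on="key", how="left", indicator=True)
    only_left = tagged[tagged["_merge"] == "left_only"].drop(columns="_merge")

print(only_left["key"].tolist())
```

Either branch should leave only the keys that are absent from right; the exact set of columns in the result may differ between the two approaches.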
Notable bug fixes#
These are bug fixes that might have notable behavior changes.
Improved behavior in groupby for observed=False#
A number of bugs have been fixed due to improved handling of unobserved groups (GH 55738). All remarks in this section equally impact SeriesGroupBy.
In previous versions of pandas, a single grouping with DataFrameGroupBy.apply() or DataFrameGroupBy.agg() would pass the unobserved groups to the provided function, resulting in 0 below.
In [1]: df = pd.DataFrame(
   ...:     {
   ...:         "key1": pd.Categorical(list("aabb"), categories=list("abc")),
   ...:         "key2": [1, 1, 1, 2],
   ...:         "values": [1, 2, 3, 4],
   ...:     }
   ...: )
   ...:
In [2]: df
Out[2]:
  key1  key2  values
0    a     1       1
1    a     1       2
2    b     1       3
3    b     2       4
In [3]: gb = df.groupby("key1", observed=False)
In [4]: gb[["values"]].apply(lambda x: x.sum())
Out[4]:
      values
key1
a          3
b          7
c          0
However, this was not the case when using multiple groupings, resulting in NaN below.
In [1]: gb = df.groupby(["key1", "key2"], observed=False)
In [2]: gb[["values"]].apply(lambda x: x.sum())
Out[2]:
           values
key1 key2
a    1        3.0
     2        NaN
b    1        3.0
     2        4.0
c    1        NaN
     2        NaN
Now using multiple groupings will also pass the unobserved groups to the provided function.
In [5]: gb = df.groupby(["key1", "key2"], observed=False)
In [6]: gb[["values"]].apply(lambda x: x.sum())
Out[6]:
           values
key1 key2
a    1          3
     2          0
b    1          3
     2          4
c    1          0
     2          0
Similarly:
- In previous versions of pandas the method DataFrameGroupBy.sum() would result in 0 for unobserved groups, but DataFrameGroupBy.prod(), DataFrameGroupBy.all(), and DataFrameGroupBy.any() would all result in NA values. Now these methods result in 1, True, and False respectively.
- DataFrameGroupBy.groups() did not include unobserved groups and now does.
These improvements also fixed certain bugs in groupby:
- DataFrameGroupBy.agg() would fail when there are multiple groupings, unobserved groups, and as_index=False (GH 36698)
- DataFrameGroupBy.groups() with sort=False would sort groups; they now occur in the order they are observed (GH 56966)
- DataFrameGroupBy.nunique() would fail when there are multiple groupings, unobserved groups, and as_index=False (GH 52848)
- DataFrameGroupBy.sum() would have incorrect values when there are multiple groupings, unobserved groups, and non-numeric data (GH 43891)
- DataFrameGroupBy.value_counts() would produce incorrect results when used with some categorical and some non-categorical groupings and observed=False (GH 56016)
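Because these notes also change the default of observed to True, code that relies on unobserved categories appearing in the result may want to pass observed explicitly. The sketch below uses a single grouping and hypothetical data to contrast the two settings; it is an illustration, not a statement about every aggregation.

```python
import pandas as pd

# Hypothetical data: category "c" is declared but never occurs.
df = pd.DataFrame(
    {
        "key1": pd.Categorical(list("aabb"), categories=list("abc")),
        "values": [1, 2, 3, 4],
    }
)

# observed=False keeps the unobserved category "c" in the result.
with_unobserved = df.groupby("key1", observed=False)["values"].sum()

# observed=True reports only the categories present in the data.
only_observed = df.groupby("key1", observed=True)["values"].sum()

print(list(with_unobserved.index))
print(list(only_observed.index))
```

Passing the keyword explicitly keeps the result shape stable across versions with different defaults.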
notable_bug_fix2#
Backwards incompatible API changes#
Datetime resolution inference#
Converting a sequence of strings, datetime objects, or np.datetime64 objects to a datetime64 dtype now performs inference on the appropriate resolution (AKA unit) for the output dtype. This affects Series, DataFrame, Index, DatetimeIndex, and to_datetime().
Previously, these would always give nanosecond resolution:
In [1]: dt = pd.Timestamp("2024-03-22 11:36").to_pydatetime()
In [2]: pd.to_datetime([dt]).dtype
Out[2]: dtype('<M8[ns]')
In [3]: pd.Index([dt]).dtype
Out[3]: dtype('<M8[ns]')
In [4]: pd.DatetimeIndex([dt]).dtype
Out[4]: dtype('<M8[ns]')
In [5]: pd.Series([dt]).dtype
Out[5]: dtype('<M8[ns]')
This now infers the microsecond unit “us” from the pydatetime object, matching the scalar Timestamp behavior.
In [7]: dt = pd.Timestamp("2024-03-22 11:36").to_pydatetime()
In [8]: pd.to_datetime([dt]).dtype
Out[8]: dtype('<M8[us]')
In [9]: pd.Index([dt]).dtype
Out[9]: dtype('<M8[us]')
In [10]: pd.DatetimeIndex([dt]).dtype
Out[10]: dtype('<M8[us]')
In [11]: pd.Series([dt]).dtype
Out[11]: dtype('<M8[us]')
Similarly, when passed a sequence of np.datetime64 objects, the resolution of the passed objects will be retained (or for lower-than-second resolution, second resolution will be used).
When passing strings, the resolution will depend on the precision of the string, again matching the Timestamp behavior. Previously:
In [2]: pd.to_datetime(["2024-03-22 11:43:01"]).dtype
Out[2]: dtype('<M8[ns]')
In [3]: pd.to_datetime(["2024-03-22 11:43:01.002"]).dtype
Out[3]: dtype('<M8[ns]')
In [4]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
Out[4]: dtype('<M8[ns]')
In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
Out[5]: dtype('<M8[ns]')
The inferred resolution now matches that of the input strings:
In [12]: pd.to_datetime(["2024-03-22 11:43:01"]).dtype
Out[12]: dtype('<M8[s]')
In [13]: pd.to_datetime(["2024-03-22 11:43:01.002"]).dtype
Out[13]: dtype('<M8[ms]')
In [14]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
Out[14]: dtype('<M8[us]')
In [15]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
Out[15]: dtype('<M8[ns]')
In cases with mixed-resolution inputs, the highest resolution is used:
In [2]: pd.to_datetime([pd.Timestamp("2024-03-22 11:43:01"), "2024-03-22 11:43:01.002"]).dtype
Out[2]: dtype('<M8[ns]')
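Code that depends on a specific resolution can request it explicitly instead of relying on inference. The following is a small sketch with made-up input strings, assuming a pandas version where DatetimeIndex.as_unit is available; the inferred dtype is printed rather than assumed because it varies by version.

```python
import pandas as pd

# Hypothetical input with second precision.
idx = pd.to_datetime(["2024-03-22 11:43:01", "2024-03-22 11:43:02"])

# The inferred unit depends on the pandas version and on the input precision,
# so it is printed here rather than assumed.
print(idx.dtype)

# as_unit converts to a fixed resolution regardless of what was inferred.
idx_ns = idx.as_unit("ns")
print(idx_ns.dtype)
```

The converted index holds the same instants as the original; only the storage resolution is pinned.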
Changed behavior in DataFrame.value_counts() and DataFrameGroupBy.value_counts() when sort=False#
In previous versions of pandas, DataFrame.value_counts() with sort=False would sort the result by row labels (as was documented). This was nonintuitive and inconsistent with Series.value_counts() which would maintain the order of the input. Now DataFrame.value_counts() will maintain the order of the input.
In [16]: df = pd.DataFrame(
   ....:     {
   ....:         "a": [2, 2, 2, 2, 1, 1, 1, 1],
   ....:         "b": [2, 1, 3, 1, 2, 3, 1, 1],
   ....:     }
   ....: )
   ....:
In [17]: df
Out[17]:
   a  b
0  2  2
1  2  1
2  2  3
3  2  1
4  1  2
5  1  3
6  1  1
7  1  1
Old behavior
In [3]: df.value_counts(sort=False)
Out[3]:
a  b
1  1    2
   2    1
   3    1
2  1    2
   2    1
   3    1
Name: count, dtype: int64
New behavior
In [18]: df.value_counts(sort=False)
Out[18]:
a  b
2  2    1
   1    2
   3    1
1  2    1
   3    1
   1    2
Name: count, dtype: int64
This change also applies to DataFrameGroupBy.value_counts(). Here, there are two options for sorting: one sort passed to DataFrame.groupby() and one passed directly to DataFrameGroupBy.value_counts(). The former will determine whether to sort the groups, the latter whether to sort the counts. All non-grouping columns will maintain the order of the input within groups.
Old behavior
In [5]: df.groupby("a", sort=True).value_counts(sort=False)
Out[5]:
a  b
1  1    2
   2    1
   3    1
2  1    2
   2    1
   3    1
dtype: int64
New behavior
In [19]: df.groupby("a", sort=True).value_counts(sort=False)
Out[19]:
a  b
1  2    1
   3    1
   1    2
2  2    1
   1    2
   3    1
Name: count, dtype: int64
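Code that relied on the previous label-sorted result can ask for that order explicitly. This is a minimal sketch that reuses the example frame from this section and assumes sorting the result's index is an acceptable substitute for the old implicit ordering.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [2, 2, 2, 2, 1, 1, 1, 1],
        "b": [2, 1, 3, 1, 2, 3, 1, 1],
    }
)

counts = df.value_counts(sort=False)

# Sorting the MultiIndex explicitly gives label order on any pandas version.
by_label = counts.sort_index()
print(by_label)
```

The explicit sort_index call makes the intended ordering visible in the code instead of depending on version-specific behavior.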
Increased minimum version for Python#
pandas 3.0.0 supports Python 3.10 and higher.
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
See Dependencies and Optional dependencies for more.
pytz now an optional dependency#
pandas now uses zoneinfo from the standard library as the default timezone implementation when passing a timezone string to various methods. (GH 34916)
Old behavior:
In [1]: ts = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")
In [2]: ts.tz
<DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>
New behavior:
In [20]: ts = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")
In [21]: ts.tz
Out[21]: zoneinfo.ZoneInfo(key='US/Pacific')
pytz timezone objects are still supported when passed directly, but they will no longer be returned by default from string inputs. Moreover, pytz is no longer a required dependency of pandas, but can be installed with the pip extra pip install pandas[timezone].
Additionally, pandas no longer throws pytz exceptions for timezone operations leading to ambiguous or nonexistent times. These cases will now raise a ValueError.
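Since the tzinfo class returned for string inputs now differs, code that inspects the timezone may prefer checks that do not depend on the concrete type. The sketch below is one way to do that, under the assumption that comparing the zone name and UTC offset is sufficient for the caller; the variable names are illustrative.

```python
from datetime import timedelta
from zoneinfo import ZoneInfo

import pandas as pd

# From a string: the concrete tzinfo class depends on the pandas version.
ts_from_string = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")

# From an explicit zoneinfo object: accepted directly.
ts_from_object = pd.Timestamp(2024, 1, 1).tz_localize(ZoneInfo("US/Pacific"))

# Version-independent checks: compare the zone name and the UTC offset
# instead of the tzinfo type.
print(str(ts_from_string.tz))
print(ts_from_string.utcoffset() == timedelta(hours=-8))
print(ts_from_string == ts_from_object)
```

Avoiding isinstance checks against a specific timezone library keeps such code working whichever implementation backs the timestamp.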
Other API changes#
- 3rd party py.path objects are no longer explicitly supported in IO methods. Use pathlib.Path objects instead (GH 57091)
- read_table()’s parse_dates argument defaults to None to improve consistency with read_csv() (GH 57476)
- All classes inheriting from builtin tuple (including types created with collections.namedtuple()) are now hashed and compared as builtin tuple during indexing operations (GH 57922)
- Made dtype a required argument in ExtensionArray._from_sequence_of_strings() (GH 56519)
- Passing a Series input to json_normalize() will now retain the Series Index; previously the output had a new RangeIndex (GH 51452)
- Removed Index.sort() which always raised a TypeError. This attribute is not defined and will raise an AttributeError (GH 59283)
- Unused dtype argument has been removed from the MultiIndex constructor (GH 60962)
- Updated DataFrame.to_excel() so that the output spreadsheet has no styling. Custom styling can still be done using Styler.to_excel() (GH 54154)
- Pickle and HDF (.h5) files created with Python 2 are no longer explicitly supported (GH 57387)
- Pickled objects from pandas versions older than 1.0.0 are no longer supported (GH 57155)
- When comparing the indexes in testing.assert_series_equal(), check_exact defaults to True if an Index is of integer dtype. (GH 57386)
- Index set operations (like union or intersection) will now ignore the dtype of an empty RangeIndex or empty Index with object dtype when determining the dtype of the resulting Index (GH 60797)
Deprecations#
Copy keyword#
The copy keyword argument in the following methods is deprecated and will be removed in a future version:
- DataFrame.truncate() / Series.truncate()
- DataFrame.tz_convert() / Series.tz_convert()
- DataFrame.tz_localize() / Series.tz_localize()
- DataFrame.infer_objects() / Series.infer_objects()
- DataFrame.align() / Series.align()
- DataFrame.astype() / Series.astype()
- DataFrame.reindex() / Series.reindex()
- DataFrame.reindex_like() / Series.reindex_like()
- DataFrame.set_axis() / Series.set_axis()
- DataFrame.to_period() / Series.to_period()
- DataFrame.to_timestamp() / Series.to_timestamp()
- DataFrame.rename() / Series.rename()
- DataFrame.transpose()
- DataFrame.swaplevel()
- DataFrame.merge() / pd.merge()
Copy-on-Write utilizes a lazy copy mechanism that defers copying the data until necessary. Use .copy() to trigger an eager copy. The copy keyword has no effect starting with 3.0, so it can be safely removed from your code.
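As an illustration of dropping the keyword, the sketch below uses rename from the list above on a hypothetical frame. It only demonstrates that the result can be modified without affecting the original; it makes no claim about when data is physically copied.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Before: df.rename(columns={"a": "x"}, copy=False)
# After: drop the keyword; the result behaves as an independent object.
renamed = df.rename(columns={"a": "x"})
renamed.loc[0, "x"] = 100

# The original is unchanged after the write to `renamed`.
print(df.loc[0, "a"])

# An eager copy can still be requested explicitly.
eager = df.copy()
```

The same pattern applies to the other methods in the list: remove the copy argument and call .copy() only where an immediate copy is really wanted.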
Other Deprecations#
- Deprecated core.internals.api.make_block(), use public APIs instead (GH 56815)
- Deprecated DataFrameGroupBy.corrwith() (GH 57158)
- Deprecated Timestamp.utcfromtimestamp(), use Timestamp.fromtimestamp(ts, "UTC") instead (GH 56680)
- Deprecated Timestamp.utcnow(), use Timestamp.now("UTC") instead (GH 56680)
- Deprecated allowing non-keyword arguments in DataFrame.all(), DataFrame.min(), DataFrame.max(), DataFrame.sum(), DataFrame.prod(), DataFrame.mean(), DataFrame.median(), DataFrame.sem(), DataFrame.var(), DataFrame.std(), DataFrame.skew(), DataFrame.kurt(), Series.all(), Series.min(), Series.max(), Series.sum(), Series.prod(), Series.mean(), Series.median(), Series.sem(), Series.var(), Series.std(), Series.skew(), and Series.kurt(). (GH 57087)
- Deprecated allowing non-keyword arguments in Series.to_markdown() except buf. (GH 57280)
- Deprecated allowing non-keyword arguments in Series.to_string() except buf. (GH 57280)
- Deprecated behavior of DataFrameGroupBy.groups() and SeriesGroupBy.groups(); in a future version, groups when grouping by a one-element list will return a tuple instead of a scalar. (GH 58858)
- Deprecated behavior of Series.dt.to_pytimedelta(); in a future version this will return a Series containing Python datetime.timedelta objects instead of an ndarray of timedelta. This matches the behavior of other Series.dt properties. (GH 57463)
- Deprecated lowercase strings d, b and c denoting frequencies in Day, BusinessDay and CustomBusinessDay in favour of D, B and C (GH 58998)
- Deprecated lowercase strings w, w-mon, w-tue, etc. denoting frequencies in Week in favour of W, W-MON, W-TUE, etc. (GH 58998)
- Deprecated parameter method in DataFrame.reindex_like() / Series.reindex_like() (GH 58667)
- Deprecated strings w, d, MIN, MS, US and NS denoting units in Timedelta in favour of W, D, min, ms, us and ns (GH 59051)
- Deprecated the arg parameter of Series.map; pass the added func argument instead. (GH 61260)
- Deprecated using epoch date format in DataFrame.to_json() and Series.to_json(), use iso instead. (GH 57063)
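Several of the deprecations above come with a direct replacement. The following sketch gathers a few of them in one place; the values and variable names are illustrative, and it is not an exhaustive migration guide.

```python
import pandas as pd

# Timezone-aware replacements for utcnow() and utcfromtimestamp().
now_utc = pd.Timestamp.now("UTC")
epoch = pd.Timestamp.fromtimestamp(0, "UTC")

# Preferred spellings for Timedelta units and frequency aliases.
five_minutes = pd.Timedelta(5, unit="min")
daily = pd.date_range("2024-01-01", periods=3, freq="D")

# Reductions called with keyword arguments instead of positional ones.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
row_sums = df.sum(axis=1)

print(epoch, five_minutes, len(daily), row_sums.tolist())
```

Note that the replacements for utcnow() and utcfromtimestamp() return timezone-aware timestamps, which may matter when they are compared with naive values.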
Removal of prior version deprecations/changes#
Enforced deprecation of aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets#
Renamed the following offset aliases (GH 57986):
Other Removals#
- DataFrameGroupBy.idxmin, DataFrameGroupBy.idxmax, SeriesGroupBy.idxmin, and SeriesGroupBy.idxmax will now raise a ValueError when used with skipna=False and an NA value is encountered (GH 10694)
- concat() no longer ignores empty objects when determining output dtypes (GH 39122)
- concat() with all-NA entries no longer ignores the dtype of those entries when determining the result dtype (GH 40893)
- read_excel(), read_json(), read_html(), and read_xml() no longer accept raw string or byte representation of the data. That type of data must be wrapped in a StringIO or BytesIO (GH 53767)
- to_datetime() with a unit specified no longer parses strings into floats, instead parses them the same way as without unit (GH 50735)
- DataFrame.groupby() with as_index=False and aggregation methods will no longer exclude from the result the groupings that do not arise from the input (GH 49519)
- ExtensionArray._reduce() now requires a keepdims: bool = False parameter in the signature (GH 52788)
- Series.dt.to_pydatetime() now returns a Series of datetime.datetime objects (GH 52459)
- SeriesGroupBy.agg() no longer pins the name of the group to the input passed to the provided func (GH 51703)
- All arguments except name in Index.rename() are now keyword only (GH 56493)
- All arguments except the first path-like argument in IO writers are now keyword only (GH 54229)
- Changed behavior of Series.__getitem__() and Series.__setitem__() to always treat integer keys as labels, never as positional, consistent with DataFrame behavior (GH 50617)
- Changed behavior of Series.__getitem__(), Series.__setitem__(), DataFrame.__getitem__(), DataFrame.__setitem__() with an integer slice on objects with a floating-dtype index. This is now treated as positional indexing (GH 49612)
- Disallow a callable argument to Series.iloc() to return a tuple (GH 53769)
- Disallow logical operations (|, &, ^) between pandas objects and dtype-less sequences (e.g. list, tuple); wrap the objects in Series, Index, or np.array first instead (GH 52264)
- Disallow automatic casting to object in Series logical operations (&, ^, |) between series with mismatched indexes and dtypes other than object or bool (GH 52538)
- Disallow calling Series.replace() or DataFrame.replace() without a value and with non-dict-like to_replace (GH 33302)
- Disallow constructing an arrays.SparseArray with scalar data (GH 53039)
- Disallow indexing an Index with a boolean indexer of length zero; it now raises ValueError (GH 55820)
- Disallow non-standard arguments (anything other than np.ndarray, Index, ExtensionArray, or Series) to isin(), unique(), and factorize() (GH 52986)
- Disallow passing a pandas type to Index.view() (GH 55709)
- Disallow units other than “s”, “ms”, “us”, “ns” for datetime64 and timedelta64 dtypes in array() (GH 53817)
- Removed “freq” keyword from PeriodArray constructor, use “dtype” instead (GH 52462)
- Removed ‘fastpath’ keyword in Categorical constructor (GH 20110)
- Removed ‘kind’ keyword in Series.resample() and DataFrame.resample() (GH 58125)
- Removed Block, DatetimeTZBlock, ExtensionBlock, create_block_manager_from_blocks from pandas.core.internals and pandas.core.internals.api (GH 55139)
- Removed alias arrays.PandasArray for arrays.NumpyExtensionArray (GH 53694)
- Removed deprecated “method” and “limit” keywords from Series.replace() and DataFrame.replace() (GH 53492)
- Removed extension test classes BaseNoReduceTests, BaseNumericReduceTests, BaseBooleanReduceTests (GH 54663)
- Removed the “closed” and “normalize” keywords in DatetimeIndex.__new__() (GH 52628)
- Removed the deprecated delim_whitespace keyword in read_csv() and read_table(), use sep=r"\s+" instead (GH 55569)
- Require SparseDtype.fill_value() to be a valid value for the SparseDtype.subtype() (GH 53043)
- Stopped automatically casting non-datetimelike values (mainly strings) in Series.isin() and Index.isin() with datetime64, timedelta64, and PeriodDtype dtypes (GH 53111)
- Stopped performing dtype inference in Index, Series and DataFrame constructors when given a pandas object (Series, Index, ExtensionArray), call .infer_objects on the input to keep the current behavior (GH 56012)
- Stopped performing dtype inference when setting an Index into a DataFrame (GH 56102)
- Stopped performing dtype inference in Index.insert() with object-dtype index; this often affects the index/columns that result when setting new entries into an empty Series or DataFrame (GH 51363)
- Removed the “closed” and “unit” keywords in TimedeltaIndex.__new__() (GH 52628, GH 55499)
- All arguments in Index.sort_values() are now keyword only (GH 56493)
- All arguments in Series.to_dict() are now keyword only (GH 56493)
- Changed the default value of na_action in Categorical.map() to None (GH 51645)
- Changed the default value of observed in DataFrame.groupby() and Series.groupby() to True (GH 51811)
- Enforce deprecation in testing.assert_series_equal() and testing.assert_frame_equal() with object dtype and mismatched null-like values, which are now considered not-equal (GH 18463)
- Enforce banning of upcasting in in-place setitem-like operations (GH 59007) (see PDEP6)
- Enforced deprecation of all and any reductions with datetime64, DatetimeTZDtype, and PeriodDtype dtypes (GH 58029)
- Enforced deprecation disallowing float “periods” in date_range(), period_range(), timedelta_range(), interval_range() (GH 56036)
- Enforced deprecation disallowing parsing datetimes with mixed time zones unless user passes utc=True to to_datetime() (GH 57275)
- Enforced deprecation in Series.value_counts() and Index.value_counts() with object dtype performing dtype inference on the .index of the result (GH 56161)
- Enforced deprecation of DataFrameGroupBy.get_group() and SeriesGroupBy.get_group() allowing the name argument to be a non-tuple when grouping by a list of length 1 (GH 54155)
- Enforced deprecation of Series.interpolate() and DataFrame.interpolate() for object-dtype (GH 57820)
- Enforced deprecation of offsets.Tick.delta(), use pd.Timedelta(obj) instead (GH 55498)
- Enforced deprecation of axis=None acting the same as axis=0 in the DataFrame reductions sum, prod, std, var, and sem; passing axis=None will now reduce over both axes. This is particularly the case when doing e.g. numpy.sum(df) (GH 21597)
- Enforced deprecation of core.internals member DatetimeTZBlock (GH 58467)
- Enforced deprecation of date_parser in read_csv(), read_table(), read_fwf(), and read_excel() in favour of date_format (GH 50601)
- Enforced deprecation of keep_date_col keyword in read_csv() (GH 55569)
- Enforced deprecation of quantile keyword in Rolling.quantile() and Expanding.quantile(), renamed to q instead. (GH 52550)
- Enforced deprecation of argument infer_datetime_format in read_csv(), as a strict version of it is now the default (GH 48621)
- Enforced deprecation of combining parsed datetime columns in read_csv() in parse_dates (GH 55569)
- Enforced deprecation of non-standard (anything other than np.ndarray, ExtensionArray, Index, or Series) argument to api.extensions.take() (GH 52981)
- Enforced deprecation of parsing system timezone strings to tzlocal, which depended on system timezone, pass the ‘tz’ keyword instead (GH 50791)
- Enforced deprecation of passing a dictionary to SeriesGroupBy.agg() (GH 52268)
- Enforced deprecation of string AS denoting frequency in YearBegin and strings AS-DEC, AS-JAN, etc. denoting annual frequencies with various fiscal year starts (GH 57793)
- Enforced deprecation of string A denoting frequency in YearEnd and strings A-DEC, A-JAN, etc. denoting annual frequencies with various fiscal year ends (GH 57699)
- Enforced deprecation of string BAS denoting frequency in BYearBegin and strings BAS-DEC, BAS-JAN, etc. denoting annual frequencies with various fiscal year starts (GH 57793)
- Enforced deprecation of string BA denoting frequency in BYearEnd and strings BA-DEC, BA-JAN, etc. denoting annual frequencies with various fiscal year ends (GH 57793)
- Enforced deprecation of strings H, BH, and CBH denoting frequencies in Hour, BusinessHour, CustomBusinessHour (GH 59143)
- Enforced deprecation of strings H, BH, and CBH denoting units in Timedelta (GH 59143)
- Enforced deprecation of strings T, L, U, and N denoting frequencies in Minute, Milli, Micro, Nano (GH 57627)
- Enforced deprecation of strings T, L, U, and N denoting units in Timedelta (GH 57627)
- Enforced deprecation of the behavior of concat() when len(keys) != len(objs) would truncate to the shorter of the two. Now this raises a ValueError (GH 43485)
- Enforced deprecation of the behavior of DataFrame.replace() and Series.replace() with CategoricalDtype that would introduce new categories. (GH 58270)
- Enforced deprecation of the behavior of Series.argsort() in the presence of NA values (GH 58232)
- Enforced deprecation of values “pad”, “ffill”, “bfill”, and “backfill” for Series.interpolate() and DataFrame.interpolate() (GH 57869)
- Enforced deprecation removing Categorical.to_list(), use obj.tolist() instead (GH 51254)
- Enforced silent-downcasting deprecation for all relevant methods (GH 54710)
- In DataFrame.stack(), the default value of future_stack is now True; specifying False will raise a FutureWarning (GH 55448)
- Iterating over a DataFrameGroupBy or SeriesGroupBy will return tuples of length 1 for the groups when grouping by level with a list of length 1 (GH 50064)
- Methods apply, agg, and transform will no longer replace NumPy functions (e.g. np.sum) and built-in functions (e.g. min) with the equivalent pandas implementation; use string aliases (e.g. "sum" and "min") if you desire to use the pandas implementation (GH 53974)
- Passing both freq and fill_value in DataFrame.shift() and Series.shift() and DataFrameGroupBy.shift() now raises a ValueError (GH 54818)
- Removed DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() supporting bool dtype (GH 53975)
- Removed DateOffset.is_anchored() and offsets.Tick.is_anchored() (GH 56594)
- Removed DataFrame.applymap, Styler.applymap and Styler.applymap_index (GH 52364)
- Removed DataFrame.bool and Series.bool (GH 51756)
- Removed DataFrame.first and DataFrame.last (GH 53710)
- Removed DataFrame.swapaxes and Series.swapaxes (GH 51946)
- Removed DataFrameGroupBy.grouper and SeriesGroupBy.grouper (GH 56521)
- Removed DataFrameGroupBy.fillna and SeriesGroupBy.fillna (GH 55719)
- Removed Index.format, use Index.astype() with str or Index.map() with a formatter function instead (GH 55439)
- Removed Resampler.fillna (GH 55719)
- Removed Series.__int__ and Series.__float__. Call int(Series.iloc[0]) or float(Series.iloc[0]) instead. (GH 51131)
- Removed Series.ravel (GH 56053)
- Removed Series.view (GH 56054)
- Removed StataReader.close (GH 49228)
- Removed _data from DataFrame, Series, arrays.ArrowExtensionArray (GH 52003)
- Removed axis argument from DataFrame.groupby(), Series.groupby(), DataFrame.rolling(), Series.rolling(), DataFrame.resample(), and Series.resample() (GH 51203)
- Removed axis argument from all groupby operations (GH 50405)
- Removed convert_dtype from Series.apply() (GH 52257)
- Removed method, limit, fill_axis and broadcast_axis keywords from DataFrame.align() (GH 51968)
- Removed pandas.api.types.is_interval and pandas.api.types.is_period, use isinstance(obj, pd.Interval) and isinstance(obj, pd.Period) instead (GH 55264)
- Removed pandas.io.sql.execute (GH 50185)
- Removed pandas.value_counts, use Series.value_counts() instead (GH 53493)
- Removed read_gbq and DataFrame.to_gbq. Use pandas_gbq.read_gbq and pandas_gbq.to_gbq instead https://pandas-gbq.readthedocs.io/en/latest/api.html (GH 55525)
- Removed use_nullable_dtypes from read_parquet() (GH 51853)
- Removed year, month, quarter, day, hour, minute, and second keywords in the PeriodIndex constructor, use PeriodIndex.from_fields() instead (GH 55960)
- Removed argument limit from DataFrame.pct_change(), Series.pct_change(), DataFrameGroupBy.pct_change(), and SeriesGroupBy.pct_change(); the argument method must be set to None and will be removed in a future version of pandas (GH 53520)
- Removed deprecated argument obj in DataFrameGroupBy.get_group() and SeriesGroupBy.get_group() (GH 53545)
- Removed deprecated behavior of Series.agg() using Series.apply() (GH 53325)
- Removed deprecated keyword
method
on Series.fillna(), DataFrame.fillna() (GH 57760) - Removed option
mode.use_inf_as_na
, convert inf entries toNaN
before instead (GH 51684) - Removed support for DataFrame in
DataFrame.from_records`(:issue:`51697()
) - Removed support for
errors="ignore"
in to_datetime(), to_timedelta() and to_numeric() (GH 55734) - Removed support for
slice
in DataFrame.take() (GH 51539) - Removed the
ArrayManager
(GH 55043) - Removed the
fastpath
argument from the Series constructor (GH 55466) - Removed the
is_boolean
, is_integer
, is_floating
, holds_integer
, is_numeric
, is_categorical
, is_object
, and is_interval
attributes of Index (GH 50042) - Removed the
ordinal
keyword in PeriodIndex, use PeriodIndex.from_ordinals() instead (GH 55960) - Removed unused arguments
*args
and **kwargs
in Resampler
methods (GH 50977) - Unrecognized timezones when parsing strings to datetimes now raise a
ValueError
(GH 51477) - Removed the Grouper attributes
ax
, groups
, indexer
, and obj
(GH 51206, GH 51182) - Removed deprecated keyword
verbose
on read_csv() and read_table() (GH 56556) - Removed the
method
keyword in ExtensionArray.fillna
, implement ExtensionArray._pad_or_backfill
instead (GH 53621) - Removed the attribute
dtypes
from DataFrameGroupBy
(GH 51997) - Enforced deprecation of
argmin
, argmax
, idxmin
, and idxmax
returning a result when skipna=False
and an NA value is encountered or all values are NA values; these operations will now raise in such cases (GH 33941, GH 51276) - Removed specifying
include_groups=True
in DataFrameGroupBy.apply
and Resampler.apply
(GH 7155)
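Several of the removals above name a replacement. The following is a minimal sketch of what such migrations might look like, not an official migration guide. The helper to_numeric_or_original is a hypothetical name introduced here for illustration, and the try/except pattern is one assumed way to reproduce the old errors="ignore" behaviour.

```python
import numpy as np
import pandas as pd

# pandas.api.types.is_interval / is_period -> plain isinstance checks
iv = pd.Interval(0, 1)
per = pd.Period("2024-01", freq="M")
is_interval = isinstance(iv, pd.Interval)
is_period = isinstance(per, pd.Period)

# pandas.value_counts(values) -> Series.value_counts()
counts = pd.Series(["a", "b", "a"]).value_counts()

# mode.use_inf_as_na -> convert inf entries to NaN explicitly beforehand
s = pd.Series([1.0, np.inf, -np.inf, 2.0])
cleaned = s.replace([np.inf, -np.inf], np.nan)
n_missing = int(cleaned.isna().sum())


def to_numeric_or_original(values):
    """Hypothetical helper: mimic the removed errors="ignore" behaviour
    by returning the input unchanged when conversion fails."""
    try:
        return pd.to_numeric(values)
    except (ValueError, TypeError):
        return values
```

Catching the exception at the call site makes the fallback explicit, which is the main practical difference from the removed keyword.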
Performance improvements#
- Eliminated circular reference to the original pandas object in accessor attributes (e.g. Series.str). However, accessor instantiation is no longer cached (GH 47667, GH 41357)
- Categorical.categories returns a RangeIndex instead of an Index if the constructed
values
was a range
. (GH 57787) - DataFrame returns RangeIndex columns when possible when
data
is a dict
(GH 57943) - Series returns a RangeIndex index when possible when
data
is a dict
(GH 58118) - concat() returns a RangeIndex column when possible when
objs
contains Series and DataFrame and axis=0
(GH 58119) - concat() returns a RangeIndex level in the MultiIndex result when
keys
is a range
or RangeIndex (GH 57542) - RangeIndex.append()
returns a RangeIndex instead of an Index when appending values that could continue the RangeIndex (GH 57467)
- Series.nlargest() has improved performance when there are duplicate values in the index (GH 55767)
- Series.str.extract() returns RangeIndex columns instead of Index columns when possible (GH 57542)
- Series.str.partition() with ArrowDtype returns RangeIndex columns instead of Index columns when possible (GH 57768)
- Performance improvement in DataFrame when
data
is a dict
and columns
is specified (GH 24368) - Performance improvement in MultiIndex when setting MultiIndex.names doesn’t invalidate all cached operations (GH 59578)
- Performance improvement in DataFrame.join() for sorted but non-unique indexes (GH 56941)
- Performance improvement in DataFrame.join() when left and/or right are non-unique and
how
is "left"
, "right"
, or "inner"
(GH 56817) - Performance improvement in DataFrame.join() with
how="left"
or how="right"
and sort=True
(GH 56919) - Performance improvement in DataFrame.to_csv() when
index=False
(GH 59312) - Performance improvement in
DataFrameGroupBy.ffill()
, DataFrameGroupBy.bfill()
, SeriesGroupBy.ffill()
, and SeriesGroupBy.bfill()
(GH 56902) - Performance improvement in Index.join() by propagating cached attributes in cases where the result matches one of the inputs (GH 57023)
- Performance improvement in Index.take() when
indices
is a full range indexer from zero to length of index (GH 56806) - Performance improvement in Index.to_frame() returning RangeIndex columns instead of an Index when possible. (GH 58018)
- Performance improvement in
MultiIndex._engine()
to use smaller dtypes if possible (GH 58411) - Performance improvement in
MultiIndex.equals()
for equal length indexes (GH 56990) - Performance improvement in
MultiIndex.memory_usage()
to ignore the index engine when it isn’t already cached. (GH 58385) - Performance improvement in
RangeIndex.__getitem__()
with a boolean mask or integers returning a RangeIndex instead of an Index when possible. (GH 57588) - Performance improvement in
RangeIndex.append()
when appending the same index (GH 57252) - Performance improvement in
RangeIndex.argmin()
and RangeIndex.argmax()
(GH 57823) - Performance improvement in
RangeIndex.insert()
returning a RangeIndex instead of an Index when the RangeIndex is empty. (GH 57833) - Performance improvement in
RangeIndex.round()
returning a RangeIndex instead of an Index when possible. (GH 57824) - Performance improvement in
RangeIndex.searchsorted()
(GH 58376) - Performance improvement in
RangeIndex.to_numpy()
when specifying an na_value
(GH 58376) - Performance improvement in
RangeIndex.value_counts()
(GH 58376) - Performance improvement in
RangeIndex.join()
returning a RangeIndex instead of an Index when possible. (GH 57651, GH 57752) - Performance improvement in
RangeIndex.reindex()
returning a RangeIndex instead of an Index when possible. (GH 57647, GH 57752) - Performance improvement in
RangeIndex.take()
returning a RangeIndex instead of an Index when possible. (GH 57445, GH 57752) - Performance improvement in merge() if hash-join can be used (GH 57970)
- Performance improvement in
CategoricalDtype.update_dtype()
when dtype
is a CategoricalDtype with non-None
categories and ordered (GH 59647) - Performance improvement in
DataFrame.__getitem__()
when key
is a DataFrame with many columns (GH 61010) - Performance improvement in DataFrame.astype() when converting to extension floating dtypes, e.g. “Float64” (GH 60066)
- Performance improvement in DataFrame.stack() when using
future_stack=True
and the DataFrame does not have a MultiIndex (GH 58391) - Performance improvement in DataFrame.where() when
cond
is a DataFrame with many columns (GH 61010) - Performance improvement in
to_hdf()
by avoiding unnecessary reopenings of the HDF5 file, which speeds up adding data to files with a very large number of groups. (GH 58248) - Performance improvement in
DataFrameGroupBy.__len__
and SeriesGroupBy.__len__
(GH 57595) - Performance improvement in indexing operations for string dtypes (GH 56997)
- Performance improvement in unary methods on a RangeIndex returning a RangeIndex instead of an Index when possible. (GH 57825)
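Many entries above concern results staying a RangeIndex rather than being materialized into an Index. As a rough illustration of why that matters, the sketch below compares the memory footprint of the two representations for the same values. The size n is an arbitrary choice for this example, and exact byte counts depend on the platform and pandas version.

```python
import numpy as np
import pandas as pd

n = 1_000_000

# A RangeIndex stores only start, stop and step.
lazy = pd.RangeIndex(n)

# An integer Index built from an array stores every value.
materialized = pd.Index(np.arange(n, dtype="int64"))

lazy_bytes = lazy.memory_usage()
materialized_bytes = materialized.memory_usage()

# Both represent the same labels.
same_values = lazy.equals(materialized)

print(f"RangeIndex: {lazy_bytes} bytes, Index: {materialized_bytes} bytes")
```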
Bug fixes#
Categorical#
- Bug in Series.apply() where
nan
was ignored for CategoricalDtype (GH 59938) - Bug in DataFrame.pivot() and DataFrame.set_index() raising an
ArrowNotImplementedError
for columns with pyarrow dictionary dtype (GH 53051) - Bug in Series.convert_dtypes() with
dtype_backend="pyarrow"
where empty CategoricalDtype Series raised an error or got converted to null[pyarrow]
(GH 59934)
Datetimelike#
- Bug in
is_year_start
where a DatetimeIndex constructed via date_range() with frequency ‘MS’ wouldn’t have the correct year or quarter start attributes (GH 57377) - Bug in DataFrame raising
ValueError
when dtype
is timedelta64
and data
is a list containing None
(GH 60064) - Bug in Timestamp constructor failing to raise when
tz=None
is explicitly specified in conjunction with timezone-aware tzinfo
or data (GH 48688) - Bug in date_range() where the last valid timestamp would sometimes not be produced (GH 56134)
- Bug in date_range() where using a negative frequency value would not include all points between the start and end values (GH 56147)
- Bug in tseries.api.guess_datetime_format() would fail to infer time format when “%Y” == “%H%M” (GH 57452)
- Bug in tseries.frequencies.to_offset() would fail to parse frequency strings starting with “LWOM” (GH 59218)
- Bug in DataFrame.fillna() raising an
AssertionError
instead of OutOfBoundsDatetime
when filling a datetime64[ns]
column with an out-of-bounds timestamp. Now correctly raises OutOfBoundsDatetime
. (GH 61208) - Bug in DataFrame.min() and DataFrame.max() casting
datetime64
and timedelta64
columns to float64
and losing precision (GH 60850) - Bug in
DataFrame.agg()
on a DataFrame with missing values raising IndexError (GH 58810) - Bug in DatetimeIndex.is_year_start() and DatetimeIndex.is_quarter_start() not raising on custom business day frequencies bigger than “1C” (GH 58664)
- Bug in DatetimeIndex.is_year_start() and DatetimeIndex.is_quarter_start() returning
False
on double-digit frequencies (GH 58523) - Bug in
DatetimeIndex.union()
and DatetimeIndex.intersection()
when unit
was non-nanosecond (GH 59036) - Bug in Series.dt.microsecond() producing incorrect results for pyarrow backed Series. (GH 59154)
- Bug in to_datetime() not respecting dayfirst if an uncommon date string was passed. (GH 58859)
- Bug in to_datetime() on float array with missing values throwing
FloatingPointError
(GH 58419) - Bug in to_datetime() on float32 df with year, month, day etc. columns leads to precision issues and incorrect result. (GH 60506)
- Bug in to_datetime() reports incorrect index in case of any failure scenario. (GH 58298)
- Bug in to_datetime() with
format="ISO8601"
and utc=True
where naive timestamps incorrectly inherited timezone offset from previous timestamps in a series. (GH 61389) - Bug in to_datetime() wrongly converts when
arg
is a np.datetime64
object with unit of ps
. (GH 60341) - Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond
datetime64
, timedelta64
or DatetimeTZDtype incorrectly truncating those scalars (GH 56410)
Timedelta#
- Accuracy improvement in Timedelta.to_pytimedelta() to round microseconds consistently for large nanosecond based Timedelta (GH 57841)
- Bug in DataFrame.cumsum() which was raising
IndexError
if dtype is timedelta64[ns]
(GH 57956)
Timezones#
Numeric#
- Bug in DataFrame.corr() where numerical precision errors resulted in correlations above
1.0
(GH 61120) - Bug in DataFrame.quantile() where the column type was not preserved when
numeric_only=True
with a list-like q
produced an empty result (GH 59035) - Bug in Series.dot() returning
object
dtype for ArrowDtype and nullable-dtype data (GH 61375) - Bug in
np.matmul
with Index inputs raising a TypeError
(GH 57079)
Conversion#
- Bug in DataFrame.astype() not casting
values
for Arrow-based dictionary dtype correctly (GH 58479) - Bug in DataFrame.update() bool dtype being converted to object (GH 55509)
- Bug in Series.astype() might modify read-only array inplace when casting to a string dtype (GH 57212)
- Bug in Series.convert_dtypes() and DataFrame.convert_dtypes() removing timezone information for objects with ArrowDtype (GH 60237)
- Bug in Series.reindex() not maintaining
float32
type when a reindex
introduces a missing value (GH 45857)
Strings#
- Bug in Series.value_counts() would not respect
sort=False
for series having string
dtype (GH 55224)
Interval#
- Index.is_monotonic_decreasing(), Index.is_monotonic_increasing(), and Index.is_unique() could incorrectly be
False
for an Index
created from a slice of another Index
. (GH 57911) - Bug in interval_range() where start and end numeric types were always cast to 64 bit (GH 57268)
Indexing#
- Bug in
DataFrame.__getitem__()
returning modified columns when called with slice
in Python 3.12 (GH 57500) - Bug in
DataFrame.__getitem__()
when slicing a DataFrame with many rows raised an OverflowError
(GH 59531) - Bug in DataFrame.from_records() throwing a
ValueError
when passed an empty list in index
(GH 58594) - Bug in DataFrame.loc() with inconsistent behavior of loc-set with 2 given indexes to Series (GH 59933)
- Bug in Index.get_indexer() and similar methods when
NaN
is located at or after position 128 (GH 58924) - Bug in
MultiIndex.insert()
when a new value inserted to a datetime-like level gets cast to NaT
and fails indexing (GH 60388) - Bug in printing Index.names and MultiIndex.levels would not escape single quotes (GH 60190)
- Bug in reindexing of DataFrame with PeriodDtype columns in case of consolidated block (GH 60980, GH 60273)
Missing#
- Bug in DataFrame.fillna() and Series.fillna() that would ignore the
limit
argument on ExtensionArray dtypes (GH 58001)
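For reference, this is how the limit argument of Series.fillna() behaves on a plain float64 Series, shown as a small sketch. According to the entry above, ExtensionArray dtypes are now expected to honour limit in the same way; that case is not exercised here because the result depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 1.0, np.nan])

# With limit=1, at most one missing entry is filled.
filled = s.fillna(0.0, limit=1)

remaining = int(filled.isna().sum())
print(filled.tolist(), remaining)
```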
MultiIndex#
- DataFrame.loc() with
axis=0
and MultiIndex when setting a value adds extra columns (GH 58116) - DataFrame.melt() would not accept multiple names in
var_name
when the columns were a MultiIndex (GH 58033) - MultiIndex.insert()
would not insert NA value correctly at unified location of index -1 (GH 59003)
- MultiIndex.get_level_values() accessing a DatetimeIndex does not carry the frequency attribute along (GH 58327, GH 57949)
- Bug in DataFrame arithmetic operations in case of unaligned MultiIndex columns (GH 60498)
- Bug in DataFrame arithmetic operations with Series in case of unaligned MultiIndex (GH 61009)
- Bug in MultiIndex.from_tuples() causing wrong output with input of type tuples having NaN values (GH 60695, GH 60988)
I/O#
- Bug in DataFrame and Series
repr
of collections.abc.Mapping
elements. (GH 57915) - Bug in DataFrame.to_json() when
"index"
was a value in the DataFrame.columns
and Index.name was None
. Now, this will fail with a ValueError
(GH 58925) - Bug in
io.common.is_fsspec_url()
not recognizing chained fsspec URLs (GH 48978) - Bug in
DataFrame._repr_html_()
which ignored the "display.float_format"
option (GH 59876) - Bug in DataFrame.from_records() where
columns
parameter with numpy structured array was not reordering and filtering out the columns (GH 59717) - Bug in DataFrame.to_dict() raises unnecessary
UserWarning
when columns are not unique and orient='tight'
. (GH 58281) - Bug in DataFrame.to_excel() when writing empty DataFrame with MultiIndex on both axes (GH 57696)
- Bug in DataFrame.to_excel() where the MultiIndex index with a period level was not a date (GH 60099)
- Bug in DataFrame.to_stata() when exporting a column containing both long strings (Stata strL) and
pd.NA
values (GH 23633) - Bug in DataFrame.to_stata() when writing DataFrame and
byteorder="big"
. (GH 58969) - Bug in DataFrame.to_stata() when writing more than 32,000 value labels. (GH 60107)
- Bug in DataFrame.to_string() that raised
StopIteration
with nested DataFrames. (GH 16098) - Bug in HDFStore.get() was failing to save data of dtype datetime64[s] correctly (GH 59004)
- Bug in read_csv() causing segmentation fault when
encoding_errors
is not a string. (GH 59059) - Bug in read_csv() raising
TypeError
when index_col
is specified and na_values
is a dict containing the key None
. (GH 57547) - Bug in read_csv() raising
TypeError
when nrows
and iterator
are specified without specifying a chunksize
. (GH 59079) - Bug in read_csv() where the order of the
na_values
causes inconsistent results when na_values
is a list of non-string values. (GH 59303) - Bug in read_excel() raising
ValueError
when passing an array of boolean values when dtype="boolean"
. (GH 58159) - Bug in read_html() where
rowspan
in header row causes incorrect conversion to DataFrame
. (GH 60210) - Bug in read_json() ignoring the given
dtype
when engine="pyarrow"
(GH 59516) - Bug in read_json() not validating the
typ
argument to be exactly "frame"
or "series"
(GH 59124) - Bug in read_json() where extreme value integers in string format were incorrectly parsed as a different integer number (GH 20608)
- Bug in read_stata() raising
KeyError
when input file is stored in big-endian format and contains strL data. (GH 58638) - Bug in read_stata() where extreme value integers were incorrectly interpreted as missing for format versions 111 and prior (GH 58130)
- Bug in read_stata() where the missing code for double was not recognised for format versions 105 and prior (GH 58149)
- Bug in set_option() where setting the pandas option
display.html.use_mathjax
to False
has no effect (GH 59884) - Bug in
to_excel()
where MultiIndex columns would be merged to a single row when merge_cells=False
is passed (GH 60274)
Period#
- Fixed error message when passing invalid period alias to PeriodIndex.to_timestamp() (GH 58974)
Plotting#
- Bug in DataFrameGroupBy.boxplot() failing when there were multiple groupings (GH 14701)
- Bug in DataFrame.plot.bar() when
subplots
and stacked=True
are used in conjunction which causes incorrect stacking. (GH 61018) - Bug in DataFrame.plot.bar() with
stacked=True
where labels on stacked bars with zero-height segments were incorrectly positioned at the base instead of the label position of the previous segment (GH 59429) - Bug in DataFrame.plot.line() raising
ValueError
when setting both color and a dict
style (GH 59461) - Bug in DataFrame.plot() that causes a shift to the right when the frequency multiplier is greater than one. (GH 57587)
- Bug in DataFrame.plot() where
title
would require extra titles when plotting more than one column per subplot. (GH 61019) - Bug in Series.plot() preventing a line and bar from being aligned on the same plot (GH 61161)
- Bug in Series.plot() preventing a line and scatter plot from being aligned (GH 61005)
- Bug in Series.plot() with
kind="pie"
with ArrowDtype (GH 59192)
Groupby/resample/rolling#
- Bug in
DataFrameGroupBy.__len__()
andSeriesGroupBy.__len__()
would raise when the grouping contained NA values and dropna=False
(GH 58644) - Bug in DataFrameGroupBy.any() that returned True for groups where all Timedelta values are NaT. (GH 59712)
- Bug in DataFrameGroupBy.groups() and SeriesGroupBy.groups() would fail when the groups were Categorical with an NA value (GH 61356)
- Bug in DataFrameGroupBy.groups() and
SeriesGroupBy.groups()
that would not respect groupby argument dropna
(GH 55919) - Bug in DataFrameGroupBy.median() where NaT values gave an incorrect result. (GH 57926)
- Bug in DataFrameGroupBy.quantile() when
interpolation="nearest"
is inconsistent with DataFrame.quantile() (GH 47942) - Bug in Resampler.interpolate() on a DataFrame with non-uniform sampling and/or indices not aligning with the resulting resampled index would result in wrong interpolation (GH 21351)
- Bug in Series.rolling() when used with a BaseIndexer subclass and computing min/max (GH 46726)
- Bug in DataFrame.ewm() and Series.ewm() when passed
times
and aggregation functions other than mean (GH 51695) - Bug in DataFrame.resample() and Series.resample() were not keeping the index name when the index had ArrowDtype timestamp dtype (GH 61222)
- Bug in DataFrame.resample() changing index type to MultiIndex when the dataframe is empty and using an upsample method (GH 55572)
- Bug in
DataFrameGroupBy.agg()
that raises AttributeError
when there is dictionary input and duplicated columns, instead of returning a DataFrame with the aggregation of all duplicate columns. (GH 55041) - Bug in
DataFrameGroupBy.apply()
and SeriesGroupBy.apply()
for an empty DataFrame with group_keys=False
still creating output index using group keys. (GH 60471) - Bug in
DataFrameGroupBy.apply()
that was returning a completely empty DataFrame when all return values of func
were None
instead of returning an empty DataFrame with the original columns and dtypes. (GH 57775) - Bug in
DataFrameGroupBy.apply()
with as_index=False
that was returning MultiIndex instead of returning Index. (GH 58291) - Bug in
DataFrameGroupBy.cumsum()
and DataFrameGroupBy.cumprod()
where numeric_only
parameter was passed indirectly through kwargs instead of passing directly. (GH 58811) - Bug in
DataFrameGroupBy.cumsum()
where it did not return the correct dtype when the label contained None
. (GH 58811) - Bug in
DataFrameGroupBy.transform()
and SeriesGroupBy.transform()
with a reducer and observed=False
that coerces dtype to float when there are unobserved categories. (GH 55326) - Bug in
Rolling.apply()
for method="table"
where column order was not being respected due to the columns getting sorted by default. (GH 59666) - Bug in
Rolling.apply()
where the applied function could be called on fewer than min_periods
periods if method="table"
. (GH 58868) - Bug in Series.resample() could raise when the date range ended shortly before a non-existent time. (GH 58380)
Reshaping#
- Bug in qcut() where values at the quantile boundaries could be incorrectly assigned (GH 59355)
- Bug in DataFrame.combine_first() not preserving the column order (GH 60427)
- Bug in DataFrame.explode() producing incorrect result for
pyarrow.large_list
type (GH 61091) - Bug in DataFrame.join() inconsistently setting result index name (GH 55815)
- Bug in DataFrame.join() when a DataFrame with a MultiIndex would raise an
AssertionError
when MultiIndex.names contained None
. (GH 58721) - Bug in DataFrame.merge() where merging on a column containing only
NaN
values resulted in an out-of-bounds array access (GH 59421) - Bug in DataFrame.unstack() producing incorrect results when
sort=False
(GH 54987, GH 55516) - Bug in DataFrame.merge() when merging two DataFrame on
intc
or uintc
types on Windows (GH 60091, GH 58713) - Bug in DataFrame.pivot_table() incorrectly subaggregating results when called without an
index
argument (GH 58722) - Bug in DataFrame.pivot_table() incorrectly ignoring the
values
argument when also supplied to the index
or columns
parameters (GH 57876, GH 61292) - Bug in DataFrame.stack() with the new implementation where
ValueError
is raised when level=[]
(GH 60740) - Bug in DataFrame.unstack() producing incorrect results when manipulating empty DataFrame with an
ExtensionDtype
(GH 59123) - Bug in concat() where concatenating DataFrame and Series with
ignore_index = True
drops the series name (GH 60723, GH 56257)
Sparse#
- Bug in SparseDtype for equal comparison with na fill value. (GH 54770)
- Bug in DataFrame.sparse.from_spmatrix() which hard coded an invalid
fill_value
for certain subtypes. (GH 59063) - Bug in DataFrame.sparse.to_dense() which ignored subclassing and always returned an instance of DataFrame (GH 59913)
ExtensionArray#
- Bug in Categorical when constructing with an Index with ArrowDtype (GH 60563)
- Bug in
arrays.ArrowExtensionArray.__setitem__()
which caused wrong behavior when using an integer array with repeated values as a key (GH 58530) - Bug in
ArrowExtensionArray.factorize()
where NA values were dropped when input was dictionary-encoded even when dropna was set to False (GH 60567) - Bug in api.types.is_datetime64_any_dtype() where a custom
ExtensionDtype
would return False
for array-likes (GH 57055) - Bug in comparison between object with ArrowDtype and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-
False
(for ==
) or all-True
(for !=
) (GH 59505) - Bug in constructing pandas data structures when passing into
dtype
a string of the type followed by [pyarrow]
while PyArrow is not installed would raise NameError
rather than ImportError
(GH 57928) - Bug in various DataFrame reductions for pyarrow temporal dtypes returning incorrect dtype when result was null (GH 59234)
Styler#
- Bug in
Styler.to_latex()
when styling column headers in combination with a hidden index or hidden index levels.
Other#
- Bug in DataFrame when passing a
dict
with a NA scalar and columns
that would always return np.nan
(GH 57205) - Bug in Series ignoring errors when trying to convert Series input data to the given
dtype
(GH 60728) - Bug in eval() on
ExtensionArray
where expressions including division
/
failed with a TypeError
. (GH 58748) - Bug in eval() where method calls on binary operations like
(x + y).dropna()
would raise AttributeError: 'BinOp' object has no attribute 'value'
(GH 61175) - Bug in eval() where the names of the Series were not preserved when using
engine="numexpr"
. (GH 10239) - Bug in eval() with
engine="numexpr"
returning unexpected result for float division. (GH 59736) - Bug in to_numeric() raising
TypeError
when arg
is a Timedelta or Timestamp scalar. (GH 59944) - Bug in unique() on Index not always returning Index (GH 57043)
- Bug in DataFrame.apply() where passing
engine="numba"
ignored args
passed to the applied function (GH 58712) - Bug in DataFrame.eval() and DataFrame.query() which caused an exception when using NumPy attributes via
@
notation, e.g.,df.eval("@np.floor(a)")
. (GH 58041) - Bug in DataFrame.eval() and DataFrame.query() which did not allow to use
tan
function. (GH 55091) - Bug in DataFrame.query() where using duplicate column names led to a
TypeError
. (GH 59950) - Bug in DataFrame.query() which raised an exception or produced incorrect results when expressions contained backtick-quoted column names containing the hash character
#
, backticks, or characters that fall outside the ASCII range (U+0001..U+007F). (GH 59285, GH 49633) - Bug in DataFrame.query() which raised an exception when querying integer column names using backticks. (GH 60494)
- Bug in DataFrame.shift() where passing a
freq
on a DataFrame with no columns did not shift the index correctly. (GH 60102) - Bug in DataFrame.sort_index() when passing
axis="columns"
and ignore_index=True
and ascending=False
not returning a RangeIndex columns (GH 57293) - Bug in DataFrame.transform() that was returning the wrong order unless the index was monotonically increasing. (GH 57069)
- Bug in DataFrame.where() where using a non-bool type array in the function would raise a
ValueError
instead of a TypeError
(GH 56330) - Bug in Index.sort_values() when passing a key function that turns values into tuples, e.g.
key=natsort.natsort_key
, would raise TypeError
(GH 56081) - Bug in
MultiIndex.fillna()
error message was referring to isna
instead of fillna
(GH 60974) - Bug in Series.describe() where median percentile was always included when the
percentiles
argument was passed (GH 60550). - Bug in Series.diff() allowing non-integer values for the
periods
argument. (GH 56607) - Bug in Series.dt() methods in ArrowDtype that were returning incorrect values. (GH 57355)
- Bug in Series.isin() raising
TypeError
when series is large (>10**6) and values
contains NA (GH 60678) - Bug in Series.mode() where an exception was raised when taking the mode with nullable types with no null values in the series. (GH 58926)
- Bug in Series.rank() that doesn’t preserve missing values for nullable integers when
na_option='keep'
. (GH 56976) - Bug in Series.replace() and DataFrame.replace() inconsistently replacing matching instances when
regex=True
and missing values are present. (GH 56599) - Bug in Series.replace() and DataFrame.replace() throwing
ValueError
whenregex=True
and all NA values. (GH 60688) - Bug in Series.to_string() when series contains complex floats with exponents (GH 60405)
- Bug in read_csv() where chained fsspec TAR file and
compression="infer"
fails with tarfile.ReadError
(GH 60028) - Bug in Dataframe Interchange Protocol implementation was returning incorrect results for data buffers’ associated dtype, for string and datetime columns (GH 54781)
- Bug in
Series.list
methods not preserving the original Index. (GH 58425) - Bug in
Series.list
methods not preserving the original name. (GH 60522) - Bug in printing a DataFrame with a DataFrame stored in DataFrame.attrs raised a
ValueError
(GH 60455) - Bug in printing a Series with a DataFrame stored in Series.attrs raised a
ValueError
(GH 60568) - Fixed regression in DataFrame.from_records() not initializing subclasses properly (GH 57008)