Version 0.22.0 (December 29, 2017)
This is a major release from 0.21.1 and includes a single, API-breaking change. We recommend that all users upgrade to this version after carefully reading the release note (singular!).
Backwards incompatible API changes#
pandas 0.22.0 changes the handling of empty and all-NA sums and products. The summary is that
- The sum of an empty or all-NA Series is now 0
- The product of an empty or all-NA Series is now 1
- We’ve added a min_count parameter to .sum() and .prod() controlling the minimum number of valid values for the result to be valid. If fewer than min_count non-NA values are present, the result is NA. The default is 0. To return NaN, the 0.21 behavior, use min_count=1.
Some background: In pandas 0.21, we fixed a long-standing inconsistency in the return value of all-NA series depending on whether or not bottleneck was installed. See "Sum/prod of all-NaN or empty Series/DataFrames is now consistently NaN" in the 0.21.0 release notes. At the same time, we changed the sum and prod of an empty Series to also be NaN.
Based on feedback, we’ve partially reverted those changes.
Arithmetic operations#
The default sum for empty or all-NA Series is now 0.
pandas 0.21.x
In [1]: pd.Series([]).sum()
Out[1]: nan
In [2]: pd.Series([np.nan]).sum()
Out[2]: nan
pandas 0.22.0
In [1]: pd.Series([]).sum()
Out[1]: 0
In [2]: pd.Series([np.nan]).sum()
Out[2]: 0.0
The default behavior is the same as pandas 0.20.3 with bottleneck installed. It also matches the behavior of NumPy’s np.nansum on empty and all-NA arrays.
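The parallel with NumPy can be checked directly. The following is a minimal sketch, assuming pandas 0.22.0 or later and a reasonably recent NumPy; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

empty = np.array([], dtype=float)
all_nan = np.array([np.nan, np.nan])

# np.nansum treats NaN as zero, so both reductions give 0.0
numpy_empty = np.nansum(empty)
numpy_all_nan = np.nansum(all_nan)

# With the default min_count=0, pandas returns the same values
pandas_empty = pd.Series(empty).sum()
pandas_all_nan = pd.Series(all_nan).sum()
```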
To have the sum of an empty series return NaN (the default behavior of pandas 0.20.3 without bottleneck, or pandas 0.21.x), use the min_count keyword.
In [3]: pd.Series([]).sum(min_count=1)
Out[3]: nan
Thanks to the skipna parameter, the .sum on an all-NA series is conceptually the same as the .sum of an empty one with skipna=True (the default).
In [4]: pd.Series([np.nan]).sum(min_count=1)  # skipna=True by default
Out[4]: nan
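For contrast, here is a small sketch of the same all-NA series with skipna turned off. It assumes pandas 0.22.0 or later. With skipna=False the NaN is no longer dropped before summing, so the series is not treated as empty.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan])

# NaN is dropped first, leaving an empty sum
skipping = s.sum()

# NaN takes part in the addition, so the result is NaN
not_skipping = s.sum(skipna=False)
```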
The min_count parameter refers to the minimum number of non-null values required for a non-NA sum or product.
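As an illustration of the threshold, the sketch below uses a series with exactly one non-NA value and varies min_count. It assumes pandas 0.22.0 or later; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan])  # exactly one non-NA value

default_sum = s.sum()                # min_count=0: one valid value is enough
one_required = s.sum(min_count=1)    # one valid value still satisfies the threshold
two_required = s.sum(min_count=2)    # only one valid value is present, so the result is NaN
prod_two_required = s.prod(min_count=2)  # the same rule applies to products
```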
Series.prod() has been updated to behave the same as Series.sum(), returning 1 instead.
In [5]: pd.Series([]).prod()
Out[5]: 1
In [6]: pd.Series([np.nan]).prod()
Out[6]: 1.0
In [7]: pd.Series([]).prod(min_count=1)
Out[7]: nan
These changes affect DataFrame.sum() and DataFrame.prod() as well. Finally, a few less obvious places in pandas are affected by this change.
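A brief sketch of the DataFrame case, assuming pandas 0.22.0 or later. The column names all_na and partial are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"all_na": [np.nan, np.nan], "partial": [1.0, np.nan]})

# Column-wise reductions follow the same rules as Series
sums = df.sum()                    # the all-NA column sums to 0.0
prods = df.prod()                  # the all-NA column multiplies to 1.0
sums_strict = df.sum(min_count=1)  # the all-NA column becomes NaN
```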
Grouping by a Categorical#
Grouping by a Categorical and summing now returns 0 instead of NaN for categories with no observations. The product now returns 1 instead of NaN.
pandas 0.21.x
In [8]: grouper = pd.Categorical(['a', 'a'], categories=['a', 'b'])
In [9]: pd.Series([1, 2]).groupby(grouper, observed=False).sum()
Out[9]:
a    3.0
b    NaN
dtype: float64
pandas 0.22
In [8]: grouper = pd.Categorical(["a", "a"], categories=["a", "b"])
In [9]: pd.Series([1, 2]).groupby(grouper, observed=False).sum()
Out[9]:
a    3
b    0
Length: 2, dtype: int64
To restore the 0.21 behavior of returning NaN for unobserved groups, use min_count>=1.
In [10]: pd.Series([1, 2]).groupby(grouper, observed=False).sum(min_count=1)
Out[10]:
a    3.0
b    NaN
Length: 2, dtype: float64
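The product case follows the same pattern. This is a sketch only, assuming a pandas version recent enough to accept the observed keyword used in the examples above; the input values 2 and 3 are chosen so the observed product differs from the fill value of 1.

```python
import numpy as np
import pandas as pd

grouper = pd.Categorical(["a", "a"], categories=["a", "b"])
s = pd.Series([2, 3])

# The unobserved category "b" gets the empty product, 1
products = s.groupby(grouper, observed=False).prod()

# With min_count=1 the unobserved category becomes NaN instead
products_strict = s.groupby(grouper, observed=False).prod(min_count=1)
```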
Resample#
The sum and product of all-NA bins has changed from NaN to 0 for sum and 1 for product.
pandas 0.21.x
In [11]: s = pd.Series([1, 1, np.nan, np.nan],
   ....:               index=pd.date_range('2017', periods=4))
   ....: s
Out[11]:
2017-01-01    1.0
2017-01-02    1.0
2017-01-03    NaN
2017-01-04    NaN
Freq: D, dtype: float64
In [12]: s.resample('2d').sum()
Out[12]:
2017-01-01    2.0
2017-01-03    NaN
Freq: 2D, dtype: float64
pandas 0.22.0
In [11]: s = pd.Series([1, 1, np.nan, np.nan], index=pd.date_range("2017", periods=4))
In [12]: s.resample("2d").sum()
Out[12]:
2017-01-01    2.0
2017-01-03    0.0
Freq: 2D, Length: 2, dtype: float64
To restore the 0.21 behavior of returning NaN, use min_count>=1.
In [13]: s.resample("2d").sum(min_count=1)
Out[13]:
2017-01-01    2.0
2017-01-03    NaN
Freq: 2D, Length: 2, dtype: float64
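The examples above only show the sum. A comparable sketch for the product of an all-NA bin follows, assuming pandas 0.22.0 or later; the values 2.0 and 3.0 are illustrative and chosen so the first bin's product differs from 1.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 3.0, np.nan, np.nan],
              index=pd.date_range("2017", periods=4))

# First bin: 2.0 * 3.0. Second bin is all-NA, so it gets the empty product, 1.0
products = s.resample("2D").prod()

# min_count=1 turns the all-NA bin back into NaN
products_strict = s.resample("2D").prod(min_count=1)
```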
In particular, upsampling and taking the sum or product is affected, as upsampling introduces missing values even if the original series was entirely valid.
pandas 0.21.x
In [14]: idx = pd.DatetimeIndex(['2017-01-01', '2017-01-02'])
In [15]: pd.Series([1, 2], index=idx).resample('12H').sum()
Out[15]:
2017-01-01 00:00:00    1.0
2017-01-01 12:00:00    NaN
2017-01-02 00:00:00    2.0
Freq: 12H, dtype: float64
pandas 0.22.0
In [14]: idx = pd.DatetimeIndex(["2017-01-01", "2017-01-02"])
In [15]: pd.Series([1, 2], index=idx).resample("12H").sum()
Out[15]:
2017-01-01 00:00:00    1
2017-01-01 12:00:00    0
2017-01-02 00:00:00    2
Freq: 12H, Length: 3, dtype: int64
Once again, the min_count keyword is available to restore the 0.21 behavior.
In [16]: pd.Series([1, 2], index=idx).resample("12H").sum(min_count=1)
Out[16]:
2017-01-01 00:00:00    1.0
2017-01-01 12:00:00    NaN
2017-01-02 00:00:00    2.0
Freq: 12H, Length: 3, dtype: float64
Rolling and expanding#
Rolling and expanding already have a min_periods keyword that behaves similarly to min_count. The only case that changes is when doing a rolling or expanding sum with min_periods=0. Previously this returned NaN when the window contained no non-NA values. Now it returns 0.
pandas 0.21.1
In [17]: s = pd.Series([np.nan, np.nan])
In [18]: s.rolling(2, min_periods=0).sum()
Out[18]:
0   NaN
1   NaN
dtype: float64
pandas 0.22.0
In [14]: s = pd.Series([np.nan, np.nan])
In [15]: s.rolling(2, min_periods=0).sum()
Out[15]:
0    0.0
1    0.0
Length: 2, dtype: float64
The default behavior of min_periods=None, implying that min_periods equals the window size, is unchanged.
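The expanding case and the unchanged rolling default can be sketched side by side. This assumes pandas 0.22.0 or later; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan])

# Expanding sum with min_periods=0: windows with no valid values give 0.0
expanding_zero = s.expanding(min_periods=0).sum()

# Rolling sum with the default min_periods (the window size): still NaN
rolling_default = s.rolling(2).sum()
```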
Compatibility#
If you maintain a library that should work across pandas versions, it may be easiest to exclude pandas 0.21 from your requirements. Otherwise, all your sum() calls would need to check if the Series is empty before summing.
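For libraries that cannot exclude 0.21, such a check might look like the sketch below. The helper name sum_or_zero is hypothetical and not part of pandas. It drops NA values explicitly and handles the empty case itself, so it does not depend on the version-specific default.

```python
import numpy as np
import pandas as pd


def sum_or_zero(values):
    """Hypothetical helper: return 0 for empty or all-NA input on any pandas version."""
    valid = values.dropna()
    if len(valid) == 0:
        # Nothing left to add, so return the empty sum explicitly
        return 0
    return valid.sum()


empty_total = sum_or_zero(pd.Series([], dtype=float))
all_na_total = sum_or_zero(pd.Series([np.nan]))
mixed_total = sum_or_zero(pd.Series([1.0, np.nan, 2.0]))
```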
With setuptools, in your setup.py use:
install_requires=['pandas!=0.21.*', ...]
With conda, use:
requirements:
  run:
    - pandas !=0.21.0,!=0.21.1
Note that the inconsistency in the return value for all-NA series is still there for pandas 0.20.3 and earlier. Avoiding pandas 0.21 will only help with the empty case.
Contributors#
A total of 1 person contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Tom Augspurger