What’s new in 1.2.1 (January 20, 2021)
These are the changes in pandas 1.2.1. See Release notes for a full changelog including other versions of pandas.
Fixed regressions
- Fixed regression in to_csv() that created corrupted zip files when there were more rows than chunksize (GH 38714)
- Fixed regression in to_csv() opening codecs.StreamReaderWriter in binary mode instead of in text mode (GH 39247)
- Fixed regression in read_csv() and other read functions where the encoding error policy (errors) did not default to "replace" when no encoding was specified (GH 38989)
- Fixed regression in read_excel() with non-rawbyte file handles (GH 38788)
- Fixed regression in DataFrame.to_stata() not removing the created file when an error occurred (GH 39202)
- Fixed regression in DataFrame.__setitem__ raising ValueError when expanding DataFrame and new column is from type "0 - name" (GH 39010)
- Fixed regression in setting with DataFrame.loc() raising ValueError when DataFrame has unsorted MultiIndex columns and indexer is a scalar (GH 38601)
- Fixed regression in setting with DataFrame.loc() raising KeyError with MultiIndex and list-like columns indexer enlarging DataFrame (GH 39147)
- Fixed regression in groupby() with Categorical grouping column not showing unused categories for grouped.indices (GH 38642)
- Fixed regression in DataFrameGroupBy.sem() and SeriesGroupBy.sem() where the presence of non-numeric columns would cause an error instead of being dropped (GH 38774)
- Fixed regression in DataFrameGroupBy.diff() raising for int8 and int16 columns (GH 39050)
- Fixed regression in DataFrame.groupby() when aggregating an ExtensionDtype that could fail for non-numeric values (GH 38980)
- Fixed regression in Rolling.skew() and Rolling.kurt() modifying the object inplace (GH 38908)
- Fixed regression in DataFrame.any() and DataFrame.all() not returning a result for tz-aware datetime64 columns (GH 38723)
- Fixed regression in DataFrame.apply() with axis=1 using str accessor in apply function (GH 38979)
- Fixed regression in DataFrame.replace() raising ValueError when DataFrame has dtype bytes (GH 38900)
- Fixed regression in Series.fillna() that raised RecursionError with datetime64[ns, UTC] dtype (GH 38851)
- Fixed regression in comparisons between NaT and datetime.date objects incorrectly returning True (GH 39151)
- Fixed regression in calling NumPy accumulate() ufuncs on DataFrames, e.g. np.maximum.accumulate(df) (GH 39259)
- Fixed regression in repr of float-like strings of an object dtype having trailing 0’s truncated after the decimal (GH 38708)
- Fixed regression that raised AttributeError with PyArrow versions [0.16.0, 1.0.0) (GH 38801)
- Fixed regression in pandas.testing.assert_frame_equal() raising TypeError with check_like=True when Index or columns have mixed dtype (GH 39168)
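A few of the fixes listed above are easy to check interactively. The following is a minimal sketch, assuming pandas 1.2.1 or later and NumPy are installed; the sample data and variable names are illustrative and are not taken from the linked issues.

```python
import datetime

import numpy as np
import pandas as pd

# GH 39151: NaT compared with a datetime.date should not compare equal.
nat_equals_date = pd.NaT == datetime.date(2021, 1, 20)

# GH 39259: accumulate() ufuncs can be called on a DataFrame.
# With the default axis this is a running maximum down each column.
df = pd.DataFrame({"a": [1, 3, 2], "b": [4, 1, 5]})
running_max = np.maximum.accumulate(df)

# GH 38908: Rolling.skew() should leave the original object untouched.
s = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0])
before = s.copy()
_ = s.rolling(3).skew()
unchanged = s.equals(before)

print(nat_equals_date)
print(running_max)
print(unchanged)
```

Each check only exercises the public behaviour described in the corresponding entry; it does not reproduce the exact failing inputs from the issue reports.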
We have reverted a commit that resulted in several plotting-related regressions in pandas 1.2.0 (GH 38969, GH 38736, GH 38865, GH 38947 and GH 39126). As a result, bugs reported as fixed in pandas 1.2.0 related to inconsistent tick labeling in bar plots are again present (GH 26186 and GH 11465).
Calling NumPy ufuncs on non-aligned DataFrames
Before pandas 1.2.0, calling a NumPy ufunc on non-aligned DataFrames (or DataFrame / Series combination) would ignore the indices, only match the inputs by shape, and use the index/columns of the first DataFrame for the result:
    In [1]: df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])

    In [2]: df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])

    In [3]: df1
    Out[3]:
       a  b
    0  1  3
    1  2  4

    In [4]: df2
    Out[4]:
       a  b
    1  1  3
    2  2  4

    In [5]: np.add(df1, df2)
    Out[5]:
       a  b
    0  2  6
    1  4  8
This contrasts with how other pandas operations work, which first align the inputs:
    In [6]: df1 + df2
    Out[6]:
         a    b
    0  NaN  NaN
    1  3.0  7.0
    2  NaN  NaN
In pandas 1.2.0, we refactored how NumPy ufuncs are called on DataFrames, and this started to align the inputs first (GH 39184), as happens in other pandas operations and as it happens for ufuncs called on Series objects.
For pandas 1.2.1, we restored the previous behaviour to avoid a breaking change, but the above example of np.add(df1, df2) with non-aligned inputs will now raise a warning, and a future pandas 2.0 release will start aligning the inputs first (GH 39184). Calling a NumPy ufunc on Series objects (e.g. np.add(s1, s2)) already aligns and continues to do so.
To avoid the warning and keep the current behaviour of ignoring the indices, convert one of the arguments to a NumPy array:
    In [7]: np.add(df1, np.asarray(df2))
    Out[7]:
       a  b
    0  2  6
    1  4  8
To obtain the future behaviour and silence the warning, you can align manually before passing the arguments to the ufunc:
    In [8]: df1, df2 = df1.align(df2)

    In [9]: np.add(df1, df2)
    Out[9]:
         a    b
    0  NaN  NaN
    1  3.0  7.0
    2  NaN  NaN
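The two options above can be wrapped so that the intended semantics are explicit at the call site and do not depend on which pandas version is installed. This is a minimal sketch: the helper names add_by_position and add_by_label are hypothetical and are not part of pandas.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])


def add_by_position(left, right):
    """Hypothetical helper: ignore the labels of `right`, match by shape.

    Converting `right` to a NumPy array removes its index/columns, so the
    result keeps the labels of `left`.
    """
    return np.add(left, np.asarray(right))


def add_by_label(left, right):
    """Hypothetical helper: align both frames on their labels first.

    Labels present in only one input produce NaN in the result.
    """
    left_aligned, right_aligned = left.align(right)
    return np.add(left_aligned, right_aligned)


positional = add_by_position(df1, df2)
aligned = add_by_label(df1, df2)

print(positional)
print(aligned)
```

Because neither helper passes two non-aligned DataFrames to the ufunc, neither should trigger the warning described above.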
Bug fixes
- Bug in read_csv() with float_precision="high" caused segfault or wrong parsing of long exponent strings. This resulted in a regression in some cases as the default for float_precision was changed in pandas 1.2.0 (GH 38753)
- Bug in read_csv() not closing an opened file handle when a csv.Error or UnicodeDecodeError occurred while initializing (GH 39024)
- Bug in pandas.testing.assert_index_equal() raising TypeError with check_order=False when Index has mixed dtype (GH 39168)
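For the float_precision entry, the parser can be selected explicitly in read_csv(). The sketch below assumes pandas 1.2.0 or later, where the accepted values are "high", "legacy" and "round_trip"; the sample values are illustrative and do not reproduce the long exponent strings from the issue.

```python
import io

import numpy as np
import pandas as pd

# Illustrative data: ordinary exponent notation, not the inputs from GH 38753.
csv_text = "value\n1.5e-10\n2.25E+5\n-3.0e2\n"

# Parse the same text with each float parser.
parsed = {
    precision: pd.read_csv(io.StringIO(csv_text), float_precision=precision)["value"]
    for precision in ("high", "legacy", "round_trip")
}

# Python's own float() conversion serves as the reference here.
reference = np.array([float(token) for token in csv_text.split()[1:]])

for precision, values in parsed.items():
    print(precision, np.allclose(values, reference, rtol=1e-12, atol=0))
```

Passing float_precision explicitly is one way to keep parsing behaviour stable if the default changes between pandas versions.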
Other
- The deprecated attributes _AXIS_NAMES and _AXIS_NUMBERS of DataFrame and Series will no longer show up in dir or inspect.getmembers calls (GH 38740)
- Bumped minimum fastparquet version to 0.4.0 to avoid AttributeError from numba (GH 38344)
- Bumped minimum pymysql version to 0.8.1 to avoid test failures (GH 38344)
- Fixed build failure on MacOS 11 in Python 3.9.1 (GH 38766)
- Added reference to backwards incompatible check_freq arg of testing.assert_frame_equal() and testing.assert_series_equal() in pandas 1.1.0 what’s new (GH 34050)
Contributors
A total of 20 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Ada Draginda +
- Andrew Wieteska
- Bryan Cutler
- Fangchen Li
- Joris Van den Bossche
- Matthew Roeschke
- Matthew Zeitlin +
- MeeseeksMachine
- Micael Jarniac
- Omar Afifi +
- Pandas Development Team
- Richard Shadrach
- Simon Hawkins
- Terji Petersen
- Torsten Wörtwein
- WANG Aiyong
- jbrockmendel
- kylekeppler
- mzeitlin11
- patrick