What’s new in 1.0.2 (March 12, 2020)

These are the changes in pandas 1.0.2. See Release notes for a full changelog including other versions of pandas.

Fixed regressions

Groupby

Fixed regression in DataFrameGroupBy.agg() and SeriesGroupBy.agg() which were failing on frames with MultiIndex columns and a custom function (GH 31777)
Fixed regression in groupby(..).rolling(..).apply() (RollingGroupby) where the raw parameter was ignored (GH 31754)
Fixed regression in rolling(..).corr() when using a time offset (GH 31789)
Fixed regression in groupby(..).nunique() which was modifying the original values if NaN values were present (GH 31950)
Fixed regression in DataFrame.groupby raising a ValueError from an internal operation (GH 31802)
Fixed regression in DataFrameGroupBy.agg() and SeriesGroupBy.agg() calling a user-provided function an extra time on an empty input (GH 31760)
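As an illustration of the nunique() item above, the following minimal sketch checks that the aggregation leaves its input untouched when NaN values are present. The frame and its column names are made up for the example and are not taken from the linked issue.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: group "a" contains a missing value.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1.0, np.nan, 2.0]})
snapshot = df.copy()

# nunique() ignores NaN by default (dropna=True).
counts = df.groupby("key")["val"].nunique()

# The input frame should compare equal to the copy taken beforehand.
unchanged = df.equals(snapshot)
print(counts.to_dict(), unchanged)
```

Comparing against a copy taken before the call is a simple way to detect in-place modification of the input.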

I/O

Fixed regression in read_csv() in which the encoding option was not recognized with certain file-like objects (GH 31819)
Fixed regression in DataFrame.to_excel() when the columns keyword argument is passed (GH 31677)
Fixed regression in ExcelFile where the stream passed into the function was closed by the destructor. (GH 31467)
Fixed regression where read_pickle() raised a UnicodeDecodeError when reading a py27 pickle with MultiIndex column (GH 31988).
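The read_csv() item above concerns the encoding option with file-like objects. A minimal sketch follows, assuming an in-memory io.BytesIO buffer as a stand-in for the file-like objects in the original report, which are not named here; the sample data is invented.

```python
import io

import pandas as pd

# Hypothetical Latin-1 encoded CSV content held in a binary buffer.
raw = "name,value\ncafé,1\n".encode("latin-1")
buffer = io.BytesIO(raw)

# The encoding argument tells read_csv how to decode the bytes.
df = pd.read_csv(buffer, encoding="latin-1")
print(df)
```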

Reindexing/alignment

Fixed regression in Series.align() when other is a DataFrame and method is not None (GH 31785)
Fixed regression in DataFrame.reindex() and Series.reindex() when reindexing with a (tz-aware) index and method="nearest" (GH 26683)
Fixed regression where DataFrame.reindex_like() on a DataFrame subclass raised an AssertionError (GH 31925)
Fixed regression in DataFrame arithmetic operations with mis-matched columns (GH 31623)
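To make the tz-aware reindexing item above concrete, here is a small sketch of Series.reindex() with method="nearest" on a UTC index. The dates and values are invented for illustration.

```python
import pandas as pd

# Hypothetical daily series on a tz-aware (UTC) index.
idx = pd.date_range("2020-01-01", periods=3, freq="D", tz="UTC")
s = pd.Series([10, 20, 30], index=idx)

# Target timestamps that fall between the existing index entries.
target = pd.DatetimeIndex(["2020-01-01 10:00", "2020-01-02 20:00"], tz="UTC")

# Each target label takes the value of the closest existing label.
result = s.reindex(target, method="nearest")
print(result)
```

The first target is 10 hours after January 1 and 14 hours before January 2, so it picks up 10; the second is 4 hours before January 3, so it picks up 30.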

Other

Fixed regression in joining on DatetimeIndex or TimedeltaIndex to preserve freq in simple cases (GH 32166)
Fixed regression in Series.shift() with datetime64 dtype when passing an integer fill_value (GH 32591)
Fixed regression in the repr of an object-dtype Index with bools and missing values (GH 32146)

Indexing with nullable boolean arrays

Previously, indexing with a nullable Boolean array containing NA would raise a ValueError. This is now permitted, with NA being treated as False. (GH 31503)

In [1]: s = pd.Series([1, 2, 3, 4])

In [2]: mask = pd.array([True, True, False, None], dtype="boolean")

In [3]: s
Out[3]: 
0    1
1    2
2    3
3    4
Length: 4, dtype: int64

In [4]: mask
Out[4]: 
<BooleanArray>
[True, True, False, <NA>]
Length: 4, dtype: boolean

pandas 1.0.0-1.0.1

s[mask]
Traceback (most recent call last):
...
ValueError: cannot mask with array containing NA / NaN values

pandas 1.0.2

In [5]: s[mask]
Out[5]: 
0    1
1    2
Length: 2, dtype: int64
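When the implicit "NA means False" rule is not what you want, one option is to fill the mask explicitly before indexing. The sketch below reuses s and mask from the example and uses the array's fillna() method; the names selected, explicit, and include_na are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
mask = pd.array([True, True, False, None], dtype="boolean")

# Default behaviour since 1.0.2: NA entries in the mask select nothing.
selected = s[mask]

# Spelling out the same choice explicitly.
explicit = s[mask.fillna(False)]

# Opposite choice: treat NA as True so the last row is kept.
include_na = s[mask.fillna(True)]

print(selected.tolist(), explicit.tolist(), include_na.tolist())
```

Filling the mask first makes the handling of missing entries visible at the call site rather than relying on the default.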

Bug fixes

Datetimelike

Bug in Series.astype() not copying for tz-naive and tz-aware datetime64 dtype (GH 32490)
Bug where to_datetime() would raise when passed pd.NA (GH 32213)
Improved error message when subtracting two Timestamp objects whose difference is an out-of-bounds Timedelta (GH 31774)
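For the to_datetime() item above, a short sketch of passing pd.NA both as a scalar and inside a list. The date string is arbitrary.

```python
import pandas as pd

# A scalar pd.NA is converted to the datetime missing-value marker.
scalar = pd.to_datetime(pd.NA)

# Inside a list, pd.NA becomes a missing entry next to the parsed date.
parsed = pd.to_datetime(["2020-03-12", pd.NA])

print(scalar, parsed.isna().tolist())
```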

Categorical

Fixed bug where Categorical.from_codes() improperly raised a ValueError when passed nullable integer codes. (GH 31779)
Fixed bug where Categorical() constructor would raise a TypeError when given a numpy array containing pd.NA. (GH 31927)
Bug in Categorical where calling Series.replace() with a list-like to_replace was either ignored or crashed (GH 31720)
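The first two Categorical items above can be sketched as follows. The category labels and codes are invented for illustration; the nullable codes here contain no missing entries.

```python
import numpy as np
import pandas as pd

# Codes stored in a nullable integer ("Int64") array, without missing entries.
codes = pd.array([0, 1, 0], dtype="Int64")
from_codes = pd.Categorical.from_codes(codes, categories=["low", "high"])

# An object-dtype NumPy array that contains pd.NA.
values = np.array(["a", pd.NA, "b"], dtype=object)
from_array = pd.Categorical(values)

print(list(from_codes), from_array.codes.tolist())
```

In the second case the pd.NA entry is represented by the code -1, which is how Categorical marks a missing value.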

I/O

Using pd.NA with DataFrame.to_json() now correctly outputs a null value instead of an empty object (GH 31615)
Bug in pandas.json_normalize() when value in meta path is not iterable (GH 31507)
Fixed pickling of pandas.NA. Previously a new object was returned, which broke computations relying on NA being a singleton (GH 31847)
Fixed bug in parquet roundtrip with nullable unsigned integer dtypes (GH 31896).
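A minimal sketch of the to_json() and pickling items above, using a one-column frame made up for the example. The JSON text is parsed back with the standard library so that the check does not depend on exact string formatting.

```python
import json
import pickle

import pandas as pd

# Hypothetical frame with a nullable integer column holding one pd.NA.
df = pd.DataFrame({"a": pd.array([1, None], dtype="Int64")})

# pd.NA should be serialised as JSON null, which json.loads maps to None.
payload = json.loads(df.to_json())

# A pickle round trip should hand back the pd.NA singleton itself.
restored = pickle.loads(pickle.dumps(pd.NA))

print(payload, restored is pd.NA)
```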

Experimental dtypes

Fixed bug in DataFrame.convert_dtypes() for columns that were already using the "string" dtype (GH 31731).
Fixed bug in DataFrame.convert_dtypes() for series with mix of integers and strings (GH 32117)
Fixed bug in DataFrame.convert_dtypes() where BooleanDtype columns were converted to Int64 (GH 32287)
Fixed bug in setting values using a slice indexer with string dtype (GH 31772)
Fixed bug where DataFrameGroupBy.first(), SeriesGroupBy.first(), DataFrameGroupBy.last(), and SeriesGroupBy.last() would raise a TypeError when groups contained pd.NA in a column of object dtype (GH 32123)
Fixed bug where DataFrameGroupBy.mean(), DataFrameGroupBy.median(), DataFrameGroupBy.var(), and DataFrameGroupBy.std() would raise a TypeError on Int64 dtype columns (GH 32219)
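Two of the items above, convert_dtypes() on columns that already use extension dtypes and DataFrameGroupBy.mean() on an Int64 column, can be sketched like this. All data and column names are illustrative.

```python
import pandas as pd

# Columns that already use the nullable "boolean" and "string" dtypes.
df = pd.DataFrame(
    {
        "flag": pd.array([True, None, False], dtype="boolean"),
        "text": pd.array(["x", None, "z"], dtype="string"),
    }
)
converted = df.convert_dtypes()

# Grouped mean over a nullable integer column.
scores = pd.DataFrame(
    {"key": ["a", "a", "b"], "val": pd.array([1, 3, 5], dtype="Int64")}
)
means = scores.groupby("key").mean()

print(converted.dtypes)
print(means)
```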

Strings

Using pd.NA with Series.str.repeat() now correctly outputs a null value instead of raising an error for vector inputs (GH 31632)
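A short sketch of the vector-input case described above, where repeats is a list with one count per element. The strings and counts are arbitrary.

```python
import pandas as pd

# String series with a missing entry in the middle.
s = pd.Series(["a", pd.NA, "c"], dtype="string")

# One repeat count per element; the missing entry should stay missing.
result = s.str.repeat([1, 2, 3])
print(result.tolist())
```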

Rolling

Fixed rolling operations with variable window (defined by time duration) on decreasing time index (GH 32385).
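The following sketch only demonstrates that a time-based window can be applied to a decreasing DatetimeIndex. The data is invented, and no particular output values are claimed here, since they depend on how window boundaries are resolved.

```python
import pandas as pd

# Hypothetical daily observations, ordered from newest to oldest.
idx = pd.date_range("2020-01-01", periods=4, freq="D")[::-1]
s = pd.Series([4.0, 3.0, 2.0, 1.0], index=idx)

# Variable window defined by a time duration rather than a row count.
result = s.rolling("2D").sum()
print(result)
```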

Contributors

A total of 25 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

Anna Daglis +
Daniel Saxton
Irv Lustig
Jan Škoda
Joris Van den Bossche
Justin Zheng
Kaiqi Dong
Kendall Masse
Marco Gorelli
Matthew Roeschke
MeeseeksMachine
MomIsBestFriend
Pandas Development Team
Pedro Reys +
Prakhar Pandey
Robert de Vries +
Rushabh Vasani
Simon Hawkins
Stijn Van Hoey
Terji Petersen
Tom Augspurger
William Ayd
alimcmaster1
gfyoung
jbrockmendel