BUG: Series.update() raises ValueError if dtype="string" · Issue #33980 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
a = pd.Series(["a", None, "c"], dtype="string")
b = pd.Series([None, "b", None], dtype="string")
a.update(b)
results in:
Traceback (most recent call last):
  File "", line 1, in <module>
    a.update(b)
  File "C:\tools\anaconda3\envs\Simple\lib\site-packages\pandas\core\series.py", line 2810, in update
    self._data = self._data.putmask(mask=mask, new=other, inplace=True)
  File "C:\tools\anaconda3\envs\Simple\lib\site-packages\pandas\core\internals\managers.py", line 564, in putmask
    return self.apply("putmask", **kwargs)
  File "C:\tools\anaconda3\envs\Simple\lib\site-packages\pandas\core\internals\managers.py", line 442, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\tools\anaconda3\envs\Simple\lib\site-packages\pandas\core\internals\blocks.py", line 1676, in putmask
    new_values[mask] = new
  File "C:\tools\anaconda3\envs\Simple\lib\site-packages\pandas\core\arrays\string_.py", line 248, in __setitem__
    super().__setitem__(key, value)
  File "C:\tools\anaconda3\envs\Simple\lib\site-packages\pandas\core\arrays\numpy_.py", line 252, in __setitem__
    self._ndarray[key] = value
ValueError: NumPy boolean array indexing assignment cannot assign 3 input values to the 1 output values where the mask is true
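The final frames suggest that the full-length replacement array is assigned through a boolean mask that selects only one position. A minimal NumPy-only sketch of that mismatch, assuming the arrays below stand in for the internal values (the names values, new and mask are illustrative, not the pandas internals):

```python
import numpy as np

# Stand-ins for the underlying object arrays of a and b.
values = np.array(["a", None, "c"], dtype=object)
new = np.array([None, "b", None], dtype=object)

# True where the replacement has a non-missing value.
mask = np.array([False, True, False])

message = None
try:
    # 3 input values for 1 masked slot: same ValueError as in the traceback.
    values[mask] = new
except ValueError as exc:
    message = str(exc)

# Selecting the matching entries of the replacement first avoids the mismatch.
values[mask] = new[mask]
print(message)
print(values.tolist())
```

This only illustrates the shape mismatch reported by NumPy; whether pandas should index the new values by the mask at this point or handle it elsewhere is not something this sketch decides.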
Problem description
The example works if I leave off the dtype="string" (resulting in the implicit dtype object).
IMO update should work for all dtypes, not only the "old" ones.
a = pd.Series([1, None, 3], dtype="Int16") etc. also raises ValueError, while the same with dtype="float64" works.
It looks as if update doesn't work with the new nullable dtypes (the ones with pd.NA).
Expected Output
The expected result is that a.update(b) updates a without raising an exception, not only for object and float64, but also for string and Int16 etc.
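For illustration, the expected result can be sketched with an explicit mask-based assignment instead of update(). This is only a possible workaround, assuming both Series share the same index; it was written against a current pandas and has not been checked on 1.0.3:

```python
import pandas as pd

a = pd.Series(["a", None, "c"], dtype="string")
b = pd.Series([None, "b", None], dtype="string")

# update() is documented to use only the non-NA values of the other Series,
# so select those positions and assign just the matching entries.
mask = b.notna()
a[mask] = b[mask]

print(a.tolist())
```

With identical indexes this should leave a holding "a", "b", "c" while keeping the string dtype, which is what a.update(b) is expected to produce.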
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 ..., GenuineIntel
...
pandas : 1.0.3
numpy : 1.18.1
...