BUG: Regression in 1.1.0. "invalid slicing for a 1-ndim ExtensionArray" · Issue #37631 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pickle
import pandas as pd

s1 = pd.Series(list("abc")).astype("category").iloc[[0]]
s2 = pickle.loads(pickle.dumps(s1))
print(s1.dropna())
print(s2.dropna())
0    a
dtype: category
Categories (3, object): ['a', 'b', 'c']
AssertionError                            Traceback (most recent call last)
in
      6 s2 = pickle.loads(pickle.dumps(s1))
      7 print(s1.dropna())
----> 8 print(s2.dropna())

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/series.py in dropna(self, axis, inplace, how)
   4883
   4884         if self._can_hold_na:
-> 4885             result = remove_na_arraylike(self)
   4886             if inplace:
   4887                 self._update_inplace(result)

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/dtypes/missing.py in remove_na_arraylike(arr)
    564     """
    565     if is_extension_array_dtype(arr):
--> 566         return arr[notna(arr)]
    567     else:
    568         return arr[notna(np.asarray(arr))]

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/series.py in __getitem__(self, key)
    904             key = check_bool_indexer(self.index, key)
    905             key = np.asarray(key, dtype=bool)
--> 906             return self._get_values(key)
    907
    908         return self._get_with(key)

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/series.py in _get_values(self, indexer)
    966     def _get_values(self, indexer):
    967         try:
--> 968             return self._constructor(self._mgr.get_slice(indexer)).__finalize__(self,)
    969         except ValueError:
    970             # mpl compat if we look up e.g. ser[:, np.newaxis];

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/internals/managers.py in get_slice(self, slobj, axis)
   1559
   1560         blk = self._block
-> 1561         array = blk._slice(slobj)
   1562         block = blk.make_block_same_class(array, placement=slice(0, len(array)))
   1563         return type(self)(block, self.index[slobj])

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/internals/blocks.py in _slice(self, slicer)
   1756             if not isinstance(first, slice):
   1757                 raise AssertionError(
-> 1758                     "invalid slicing for a 1-ndim ExtensionArray", first
   1759                 )
   1760             # GH#32959 only full-slicers along fake-dim0 are valid

AssertionError: ('invalid slicing for a 1-ndim ExtensionArray', array([ True]))
Problem description
I am not sure what changes in the serialization round trip through pickle, but it seems the copied Series can no longer be indexed with a Boolean mask, which trips up dropna() as a result. The following code exposes the error more directly:
s1 = pd.Series(list("abc")).astype("category").iloc[[0]]
s2 = pickle.loads(pickle.dumps(s1))
s2[[True]]
The error happens e.g. when processing multiple Series in parallel (triggering serialization with pickle), and when a categorical Series has been filtered down to a single row. With another dtype, or more than one row, this error doesn't get triggered.
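To make the dtype and row-count dependence easier to check, here is a rough sketch that runs the same round trip for a few combinations and reports which ones raise. The helper name mask_after_pickle is made up for this illustration and is not part of pandas; the result depends on the installed pandas version.

```python
import pickle

import pandas as pd


def mask_after_pickle(dtype: str, n_rows: int) -> str:
    """Round-trip a filtered Series through pickle, then apply a Boolean mask.

    Hypothetical helper for this report. Returns "ok" if the indexing works,
    or the name of the exception class if it raises.
    """
    # Filter a three-element Series down to its first n_rows rows.
    s1 = pd.Series(list("abc")).astype(dtype).iloc[list(range(n_rows))]
    s2 = pickle.loads(pickle.dumps(s1))
    try:
        s2[[True] * n_rows]
    except Exception as exc:  # report instead of aborting, so every case runs
        return type(exc).__name__
    return "ok"


if __name__ == "__main__":
    for dtype in ("category", "object"):
        for n_rows in (1, 2):
            print(f"{dtype:>8} rows={n_rows}: {mask_after_pickle(dtype, n_rows)}")
```

On the setup described in this report, only the single-row categorical case would be expected to report AssertionError.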
The regression must have been introduced in version 1.1.0, as the above code works as expected in 1.0.5.
Expected Output
The behaviour of slicing, dropna(), etc. should be the same before and after pickling a Series, and independent of the number of rows.
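The expectation could be written down as a small check, sketched below. The function name check_pickle_roundtrip_behaviour is an assumption made for illustration; it only uses pickle and pandas.testing.assert_series_equal. On an affected version the single-row case should fail with the AssertionError shown in the traceback, while on an unaffected version all cases should pass silently.

```python
import pickle

import pandas as pd
import pandas.testing as tm


def check_pickle_roundtrip_behaviour(n_rows: int) -> None:
    """Compare slicing and dropna() before and after a pickle round trip.

    Hypothetical check for this report; raises if the two Series differ
    or if indexing the unpickled Series fails.
    """
    s1 = pd.Series(list("abc")).astype("category").iloc[list(range(n_rows))]
    s2 = pickle.loads(pickle.dumps(s1))
    mask = [True] * n_rows
    # Boolean-mask indexing should give identical results.
    tm.assert_series_equal(s1[mask], s2[mask])
    # dropna() goes through the same indexing path and should agree too.
    tm.assert_series_equal(s1.dropna(), s2.dropna())


if __name__ == "__main__":
    for n_rows in (1, 2, 3):
        check_pickle_roundtrip_behaviour(n_rows)
    print("pickle round trip preserved slicing and dropna behaviour")
```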
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 67a3d42
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.1.4
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.0
pip : 20.0.2
setuptools : 46.1.1.post20200322
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 1.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.46.0