BUG: sparse.to_coo()
with categorical multiindex gives iterators.c:185: bad argument to internal function
· Issue #50996 · pandas-dev/pandas (original) (raw)
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
nodes=pd.CategoricalIndex(list('abcde'), name='nodes') edges=pd.CategoricalIndex([0,1,2,3], name='edges') flags=pd.MultiIndex.from_product((edges,nodes))
levi= pd.Series(1, index=flags, dtype='Sparse[int]').sample(10) levi.sparse.to_coo(row_levels=['edges'], column_levels=['nodes'])
Issue Description
example levi
:
nodes edges
d 2 1
a 1 1
c 1 1
d 0 1
c 2 1
0 1
e 1 1
b 1 1
c 3 1
b 3 1
dtype: Sparse[int64, 0]
Error stacktrace:
SystemError Traceback (most recent call last) Cell In[44], line 1 ----> 1 levi.sparse.to_coo(row_levels=['edges'], column_levels=['nodes'])
File ~/.pyenv/versions/grabble/lib/python3.10/site-packages/pandas/core/arrays/sparse/accessor.py:187, in SparseAccessor.to_coo(self, row_levels, column_levels, sort_labels) 111 """ 112 Create a scipy.sparse.coo_matrix from a Series with MultiIndex. 113 (...) 183 [('a', 0), ('a', 1), ('b', 0), ('b', 1)] 184 """ 185 from pandas.core.arrays.sparse.scipy_sparse import sparse_series_to_coo --> 187 A, rows, columns = sparse_series_to_coo( 188 self._parent, row_levels, column_levels, sort_labels=sort_labels 189 ) 190 return A, rows, columns
File ~/.pyenv/versions/grabble/lib/python3.10/site-packages/pandas/core/arrays/sparse/scipy_sparse.py:167, in sparse_series_to_coo(ss, row_levels, column_levels, sort_labels) 164 row_levels = [ss.index._get_level_number(x) for x in row_levels] 165 column_levels = [ss.index._get_level_number(x) for x in column_levels] --> 167 v, i, j, rows, columns = _to_ijv( 168 ss, row_levels=row_levels, column_levels=column_levels, sort_labels=sort_labels 169 ) 170 sparse_matrix = scipy.sparse.coo_matrix( 171 (v, (i, j)), shape=(len(rows), len(columns)) 172 ) 173 return sparse_matrix, rows, columns
File ~/.pyenv/versions/grabble/lib/python3.10/site-packages/pandas/core/arrays/sparse/scipy_sparse.py:132, in _to_ijv(ss, row_levels, column_levels, sort_labels) 129 values = sp_vals[na_mask] 130 valid_ilocs = ss.array.sp_index.indices[na_mask] --> 132 i_coords, i_labels = _levels_to_axis( 133 ss, row_levels, valid_ilocs, sort_labels=sort_labels 134 ) 136 j_coords, j_labels = _levels_to_axis( 137 ss, column_levels, valid_ilocs, sort_labels=sort_labels 138 ) 140 return values, i_coords, j_coords, i_labels, j_labels
File ~/.pyenv/versions/grabble/lib/python3.10/site-packages/pandas/core/arrays/sparse/scipy_sparse.py:75, in _levels_to_axis(ss, levels, valid_ilocs, sort_labels) 72 ax_labels = ss.index.levels[levels[0]] 74 else: ---> 75 levels_values = lib.fast_zip( 76 [ss.index.get_level_values(lvl).values for lvl in levels] 77 ) 78 codes, ax_labels = factorize(levels_values, sort=sort_labels) 79 ax_coords = codes[valid_ilocs]
File ~/.pyenv/versions/grabble/lib/python3.10/site-packages/pandas/_libs/lib.pyx:475, in pandas._libs.lib.fast_zip()
SystemError: numpy/core/src/multiarray/iterators.c:185: bad argument to internal function
Expected Behavior
I expect to get back an incidence matrix with the correct number of entries (e.g. 10), with shape based on the categorical dtype... shape=(|edges.categories|,|nodes.categories|)
, or optionally the shape could be from only the observed edges(shape=(|edges|,|nodes|)
,
Installed Versions
INSTALLED VERSIONS
commit : 8dab54d
python : 3.10.8.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-135-generic
Version : #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.2
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.3.1
Cython : None
pytest : 5.4.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.2
numba : 0.56.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.3
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : 2022.11.0
xlrd : None
xlwt : None
zstandard : None
tzdata : None