ENH: sparse_series_to_coo performance · Issue #42880 · pandas-dev/pandas

Converting a sparse Series to a scipy.sparse.coo_matrix could be much faster. I think the get_indexer function defined in _to_ijv adds unnecessary complexity.

Describe the solution you'd like

The conversion can be much faster if it reads the codes attribute of the MultiIndex directly, as follows:

i_coord, j_coord = ss.index.codes
i_labels, j_labels = ss.index.levels

for a two-level MultiIndex. I think it should be straightforward to extend to more levels.
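For illustration, here is a minimal sketch of how the codes and levels could be turned into a coo_matrix. The helper name series_to_coo_via_codes is hypothetical and not part of pandas. The sketch assumes a two-level MultiIndex and NaN as the fill value, and it densifies the values with np.asarray for simplicity. It keeps every label present in the index, so its shape may differ from what to_coo() returns.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def series_to_coo_via_codes(ss):
    """Hypothetical helper: build a COO matrix from a 2-level MultiIndexed Series."""
    if ss.index.nlevels != 2:
        raise ValueError("this sketch only handles a two-level MultiIndex")
    # Drop labels that no longer occur in the index, so codes map onto compact levels.
    index = ss.index.remove_unused_levels()
    i_coord, j_coord = index.codes
    i_labels, j_labels = index.levels
    # Densify the values; assumes NaN marks the entries that are not stored.
    values = np.asarray(ss, dtype="float64")
    mask = ~np.isnan(values)
    matrix = sparse.coo_matrix(
        (values[mask], (i_coord[mask], j_coord[mask])),
        shape=(len(i_labels), len(j_labels)),
    )
    return matrix, i_labels, j_labels


# Small example: one of the four entries is NaN and is therefore skipped.
idx = pd.MultiIndex.from_tuples([("a", "x"), ("a", "z"), ("b", "y"), ("c", "x")])
ss = pd.Series(pd.arrays.SparseArray([1.0, 2.0, np.nan, 4.0]), index=idx)
matrix, row_labels, col_labels = series_to_coo_via_codes(ss)
print(matrix.toarray())
```

The integer codes are already the row and column coordinates, so no per-label lookup is needed. That is the reason this route avoids the cost of get_indexer.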

API breaking implications

None

Describe alternatives you've considered

None

Additional context

To give an example, I started digging into this problem because I had a Series with a 2-level MultiIndex and 61M rows that needed to be converted to a 1M x 1500 sparse matrix. The conversion took 10 minutes with to_coo(); done as described above, it took half a second.