ENH: sparse_series_to_coo performance · Issue #42880 · pandas-dev/pandas
Is your feature request related to a problem?
Converting a sparse Series to a scipy.sparse.coo_matrix could be much faster. I think the get_indexer function defined in _to_ijv adds unnecessary complexity.
Describe the solution you'd like
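The conversion can be much faster if it reads the codes attribute of the MultiIndex directly instead of looking up each label. For a two-level MultiIndex:
i_coord, j_coord = ss.index.codes
i_labels, j_labels = ss.index.levels
Here the codes are the integer row and column coordinates, and the levels are the corresponding row and column labels. It should be straightforward to extend this to more levels, I think.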
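To make the idea concrete, here is a minimal sketch of building the matrix straight from the codes. The helper name series_to_coo_via_codes is hypothetical and not part of pandas. The sketch assumes a Series with exactly two index levels in which every row is an entry to store, and it calls remove_unused_levels() so that the matrix shape matches the labels actually present. It does not handle missing labels (codes of -1) or the row_levels/column_levels options of to_coo().

```python
import numpy as np
import pandas as pd
from scipy import sparse


def series_to_coo_via_codes(ss):
    """Hypothetical helper: build a COO matrix from a 2-level MultiIndexed Series.

    Assumes every row of `ss` is an entry that should be stored.
    """
    index = ss.index
    if not isinstance(index, pd.MultiIndex) or index.nlevels != 2:
        raise ValueError("expected a Series with a two-level MultiIndex")

    # Drop level values that no row refers to, so codes map onto a tight shape.
    index = index.remove_unused_levels()

    # codes are integer positions into levels: one array per level.
    i_coord, j_coord = index.codes
    i_labels, j_labels = index.levels

    # to_numpy() also works when the Series has a sparse dtype (it densifies
    # the 1-D values, not the 2-D matrix).
    data = ss.to_numpy()

    shape = (len(i_labels), len(j_labels))
    matrix = sparse.coo_matrix((data, (i_coord, j_coord)), shape=shape)
    return matrix, list(i_labels), list(j_labels)


# Small example: four stored entries in a 3 x 3 matrix.
example_index = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "z"), ("b", "y"), ("c", "x")]
)
ss = pd.Series([1.0, 2.0, 3.0, 4.0], index=example_index)
matrix, rows, cols = series_to_coo_via_codes(ss)
print(rows, cols)
print(matrix.toarray())
```

No per-label lookup is needed here because the MultiIndex already stores the integer position of every label, which is presumably where the speedup comes from.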
API breaking implications
None
Describe alternatives you've considered
None
Additional context
To give an example, I started digging into this problem because I had a Series with a 2-level MultiIndex and 61M rows that needed to be converted to a 1M x 1500 sparse matrix. The conversion using to_coo() took 10 minutes; doing it as described above took half a second.