Efficiency of SparseArray.getitem(SparseArray[bool]) · Issue #23122 · pandas-dev/pandas

@TomAugspurger

This currently densifies:

# TODO: I think we can avoid densifying when masking a
# boolean SparseArray with another. Need to look at the
# key's fill_value for True / False, and then do an intersection
# on the indices of the sp_values.
if isinstance(key, SparseArray):
    if is_bool_dtype(key):
        key = key.to_dense()
    else:
        key = np.asarray(key)

I haven't investigated it, but we should be able to do a boolean mask as an
intersection of the sp_values indices of self and key. If key is
SparseDtype[bool, False] (i.e. False is the fill_value), this should be a lot
faster.
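
A rough sketch of the idea, not a proposed patch. It works on plain NumPy arrays standing in for the sparse internals (sorted integer positions plus stored values) and assumes the key's fill_value is False. The function name `mask_without_densifying` and its parameters are hypothetical, and the result is returned as a dense array of the selected values for simplicity.

```python
import numpy as np


def mask_without_densifying(self_indices, self_values, self_fill,
                            key_indices, key_values):
    """Boolean-mask a sparse array with a sparse bool key (fill_value=False).

    self_indices / key_indices: sorted positions of the stored values.
    self_values / key_values: the stored values at those positions.
    Assumes self_fill is representable in self_values.dtype.
    """
    # With fill_value False, only stored entries of the key can be True.
    true_positions = key_indices[key_values.astype(bool)]

    # Every selected position starts out as self's fill value.
    out = np.full(true_positions.shape, self_fill, dtype=self_values.dtype)

    # Intersection step: which selected positions hold a stored value in self?
    found = np.isin(true_positions, self_indices)

    # Look up the stored values for the positions in the intersection.
    locs = np.searchsorted(self_indices, true_positions[found])
    out[found] = self_values[locs]
    return out


# Dense equivalent: [0.0, 0.0, 1.5, 0.0, 2.5, 0.0] with fill_value 0.0
self_indices = np.array([2, 4])
self_values = np.array([1.5, 2.5])

# Dense equivalent: [False, False, True, True, False, False]
key_indices = np.array([2, 3])
key_values = np.array([True, True])

result = mask_without_densifying(self_indices, self_values, 0.0,
                                 key_indices, key_values)
print(result)
```

The work here scales with the number of stored entries in self and key rather than with the full array length, which is where the speedup would come from. A key whose fill_value is True would need a different approach, since the selected positions are then mostly the ones that are not stored.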