set_index on categorical fails with empty partitions
Reproducible example
    import numpy as np
    import pandas as pd
    import dask.dataframe as dd

    pdf = pd.DataFrame({
        "cat": pd.Categorical(np.repeat(list("ABC"), 20), ordered=True),
        "value": np.random.rand(60),
    })
    ddf = dd.from_pandas(pdf, npartitions=3)

    # Filter on category A; partitions 2 and 3 will then be empty.
    ddf = ddf.loc[ddf["cat"] == "A"]
    ddf.set_index("cat")

ValueError: zero-size array to reduction operation maximum which has no identity
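
For context, this is the message NumPy raises when a maximum is taken over an array with no elements. The snippet below reproduces the same message with NumPy alone. It is only an illustration: that dask reaches this reduction through the empty partitions is an assumption, not something confirmed by the traceback here, and the `int8` array standing in for an empty partition's values is hypothetical.

```python
import numpy as np

# Hypothetical stand-in for the values of an empty partition.
empty = np.array([], dtype="int8")

try:
    empty.max()  # a maximum over zero elements has no identity value
except ValueError as exc:
    message = str(exc)
    print(message)
```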
Sounds similar to #2820
For now, the cull_empty_partitions workaround from Stack Overflow can be used.