Bug in set_index or drop · Issue #2101 · pandas-dev/pandas (original) (raw)
I'm not sure if the bug is in the drop or that it shouldn't have let me set this MultiIndex since it's non-unique. var1 is just a combination of var2 and var3. In any event the result given back by this is garbage. I wanted to drop all the rows where the count of var1 == 1. Since there's not to my knowledge a way to drop variables without setting them to an index, I tried to do this without thinking
df = pandas.DataFrame([["x-a", "x", "a", 1.5],["x-a", "x", "a", 1.2],
["z-c", "z", "c", 3.1], ["x-a", "x", "a", 4.1],
["x-b", "x", "b", 5.1],["x-b", "x", "b", 4.1],
["x-b", "x", "b", 2.2],
["y-a", "y", "a", 1.2],["z-b", "z", "b", 2.1]],
columns=["var1", "var2", "var3", "var4"])
grp_size = df.groupby("var1").size()
drop_idx = grp_size.ix[grp_size == 1]
df.set_index(["var1", "var2", "var3"]).drop(drop_idx.index, level=0).reset_index()