ENH: set_index for Series · Issue #21684 · pandas-dev/pandas

I was surprised to find that DataFrame.set_index has no analogue for Series. This is especially surprising because I remember reading in several comments here that it is strongly preferred (among others by @jreback) that users do not set attributes like .name or .index directly. Currently, however, the only ways to change the index on a Series are either to do exactly that, or to reconstruct the object with pd.Series(s.values, index=desired_index) (passing s itself would align on the existing labels rather than relabel).
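For illustration, a minimal sketch of the two workarounds mentioned above. The sample data and the name `desired_index` are made up for the example. It also shows why the constructor route needs the raw values: given a Series, the constructor aligns on the existing labels instead of relabeling.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], index=[10, 11, 12])
desired_index = pd.Index(['x', 'y', 'z'])  # hypothetical target labels

# Workaround 1: assign the attribute directly (on a copy, so `s` is untouched).
relabeled_attr = s.copy()
relabeled_attr.index = desired_index

# Workaround 2: rebuild the Series from the raw values.
relabeled_ctor = pd.Series(s.to_numpy(), index=desired_index, name=s.name)

# Passing the Series itself aligns on labels; none of 'x', 'y', 'z' exist in
# `s`, so every entry comes back missing.
aligned = pd.Series(s, index=desired_index)
```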

This is relevant in many scenarios. For a concrete example: I'm currently working on letting .duplicated optionally return an inverse for DataFrame / Series / Index (see #21645). This inverse needs to link two different indexes: the index of the original object and the index of the deduplicated one. To reconstruct the original from the unique values, one needs exactly such a .set_index operation, because .reindex by itself can only select rows by label; it cannot select by one set of labels and assign a different set of labels in one go (xref #21685).

import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
# return_inverse is the keyword proposed in #21645
isdup, inv = s.duplicated(keep='last', return_inverse=True)
isdup
# 0     True
# 1     True
# 2     True
# 3    False
# 4    False
# 5    False
# dtype: bool

inv
# 0    4
# 1    5
# 2    4
# 3    3
# 4    4
# 5    5
# dtype: int64

unique = s.loc[~isdup]
unique
# 3    c
# 4    a
# 5    b
# dtype: object

reconstruct = unique.reindex(inv)
reconstruct
# 4    a
# 5    b
# 4    a
# 3    c
# 4    a
# 5    b
# dtype: object

The values now match the original, but the index (4, 5, 4, 3, 4, 5) does not, so this object is not yet equal to s. For DataFrames, the reconstruction would work as unique.reindex(inv.values).set_index(inv.index), and consequently, this should be available for Series as well:

Desired:

reconstruct = unique.reindex(inv.values).set_index(inv.index)
reconstruct
# 0    a
# 1    b
# 2    a
# 3    c
# 4    a
# 5    b
# dtype: object
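For comparison, a rough sketch of the same round trip using only existing calls. Since return_inverse is only a proposal, the inverse is built by hand here with Series.map, and Series.set_axis stands in for the requested set_index. This assumes a pandas version in which set_axis returns a new object rather than modifying in place; the variable `lookup` is a helper introduced for this example.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

# Keep the last occurrence of each value.
isdup = s.duplicated(keep='last')
unique = s.loc[~isdup]

# Hand-built inverse: for every row of `s`, the label of the row in `unique`
# that holds the same value.
lookup = pd.Series(unique.index, index=unique.to_numpy())
inv = s.map(lookup)

# Select by the inverse labels, then relabel with the original index.
reconstruct = unique.reindex(inv.to_numpy()).set_axis(inv.index)
```

With these inputs, inv holds the labels 4, 5, 4, 3, 4, 5 and reconstruct.equals(s) evaluates to True, which is the result the proposed Series.set_index is meant to give.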