ENH: set_index for Series · Issue #21684 · pandas-dev/pandas
I was surprised to find that DataFrame.set_index has no analogue for Series. This is especially surprising because I remember reading in several comments here that it is strongly preferred (among others by @jreback) that users do not set attributes like .name or .index directly. Currently, however, the only ways to change the index of a Series are either doing exactly that, or reconstructing the object with pd.Series(s.values, index=desired_index).
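For reference, here is a minimal sketch of the relabelling options available today, on a toy Series (the values and labels are made up for illustration). It assumes a reasonably recent pandas, where Series.set_axis returns a new object rather than modifying in place. It also shows a pitfall of the constructor route: when the data passed in is itself a Series, pd.Series(s, index=...) aligns on the existing labels instead of relabelling.

```python
import pandas as pd

s = pd.Series(['x', 'y', 'z'])  # default labels 0, 1, 2
new_index = pd.Index([10, 20, 30])

# Pitfall: a Series passed to the constructor together with index= is aligned
# (reindexed) on its labels, so labels 10/20/30 match nothing and become NaN.
aligned = pd.Series(s, index=new_index)

# Rebuilding from the raw values relabels positionally (no alignment).
rebuilt = pd.Series(s.to_numpy(), index=new_index, name=s.name)

# set_axis returns a relabelled copy and leaves s untouched.
relabelled = s.set_axis(new_index)

# Direct attribute assignment, the style that is discouraged.
mutated = s.copy()
mutated.index = new_index
```

Only the last variant mutates an object; set_axis is the closest existing method-chaining equivalent to what a Series.set_index would offer.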
Such a method is relevant in many scenarios, but in case someone would like a more concrete example: I'm currently working on letting .duplicated return an inverse for DataFrame / Series / Index, see #21645. This inverse needs to link two different indexes: the index of the original object and the index of the deduplicated one. To reconstruct the original from the unique values, one needs exactly such a .set_index operation, because .reindex by itself cannot look up one set of labels and assign a different set of labels to the result in one go (xref #21685).
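To make the .reindex limitation concrete, a minimal sketch on a toy Series (data made up for illustration, and assuming a recent pandas for set_axis): reindex uses the requested labels both for the lookup and as the index of the result, so attaching a different set of labels takes a separate relabelling step.

```python
import pandas as pd

unique = pd.Series(['c', 'a', 'b'], index=[3, 4, 5])
targets = [4, 5, 4, 3]

# reindex looks each label up in unique.index *and* uses the same labels as the
# index of the result: index [4, 5, 4, 3], values ['a', 'b', 'a', 'c'].
looked_up = unique.reindex(targets)

# Attaching a different set of labels (here 0..3) needs a second, separate step.
relabelled = looked_up.set_axis(range(len(targets)))
```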
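import pandas as pd
s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])
# return_inverse is the keyword proposed in #21645, not part of released pandas
isdup, inv = s.duplicated(keep='last', return_inverse=True)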
isdup
# 0 True
# 1 True
# 2 True
# 3 False
# 4 False
# 5 False
# dtype: bool
inv
# 0 4
# 1 5
# 2 4
# 3 3
# 4 4
# 5 5
# dtype: int64
unique = s.loc[~isdup]
unique
# 3 c
# 4 a
# 5 b
# dtype: object
reconstruct = unique.reindex(inv)
reconstruct
# 4 a
# 5 b
# 4 a
# 3 c
# 4 a
# 5 b
# dtype: object
This object still has the wrong index, so it is not equal to the original. For DataFrames, the reconstruction would work as unique.reindex(inv.values).set_index(inv.index), and consequently, the same should be available for Series as well:
Desired:
reconstruct = unique.reindex(inv.values).set_index(inv.index)
reconstruct
# 0 a
# 1 b
# 2 a
# 3 c
# 4 a
# 5 b
# dtype: object
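Until something like Series.set_index exists, the desired result can be approximated with existing API. A minimal sketch, assuming a recent pandas where Series.set_axis returns a new object. Since return_inverse is only proposed in #21645, the inverse is hand-rolled here by mapping each value to the label of its kept occurrence (the name label_of_kept is made up for this sketch).

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

# keep='last' marks every occurrence except the last one of each value
isdup = s.duplicated(keep='last')
unique = s.loc[~isdup]

# Hand-rolled stand-in for the proposed return_inverse: a lookup table from
# each value to the label of the occurrence that was kept.
label_of_kept = pd.Series(unique.index, index=unique.to_numpy())
inv = s.map(label_of_kept)  # values 4, 5, 4, 3, 4, 5 on labels 0..5

# Step 1: look up the kept values by label (the index is now inv's values).
# Step 2: relabel with inv's own index, the step a Series.set_index would cover.
reconstruct = unique.reindex(inv.to_numpy()).set_axis(inv.index)
```

Here set_axis(inv.index) plays the role that set_index(inv.index) would; pd.Series(values, index=inv.index) built from the reindexed values is the equivalent constructor-based route.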