ENH: Preserve .names in df.set_index(df.index) by qwhelan · Pull Request #6459 · pandas-dev/pandas

qwhelan

closes #6452.

This causes a slight change in behavior in @jseabold's second example. Previously, df.set_index(df.index) would convert a MultiIndex into an Index of tuples:

In [7]: import statsmodels.api as sm

In [8]: data = sm.datasets.grunfeld.load_pandas().data

In [9]: data = data.set_index(['firm', 'year'])

In [10]: data.set_index(data.index).index.names
Out[10]: [None]

In [11]: data
Out[11]: 
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 220 entries, (General Motors, 1935.0) to (American Steel, 1954.0)
Data columns (total 3 columns):
invest     220  non-null values
value      220  non-null values
capital    220  non-null values
dtypes: float64(3)

In [13]: data.set_index(data.index)
Out[13]: 
<class 'pandas.core.frame.DataFrame'>
Index: 220 entries, (General Motors, 1935.0) to (American Steel, 1954.0)
Data columns (total 3 columns):
invest     220  non-null values
value      220  non-null values
capital    220  non-null values
dtypes: float64(3)

This change makes it so the index remains a MultiIndex.
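For readers without statsmodels at hand, the intended behavior can be sketched on a small hand-built frame. This is only an illustration: it assumes a pandas version that includes this change, and the frame, its values, and the variable names are made up rather than taken from the Grunfeld data.

```python
import pandas as pd

# Toy stand-in for the Grunfeld data (values are illustrative, not real).
index = pd.MultiIndex.from_tuples(
    [
        ("General Motors", 1935),
        ("General Motors", 1936),
        ("American Steel", 1935),
        ("American Steel", 1936),
    ],
    names=["firm", "year"],
)
data = pd.DataFrame({"invest": [1.0, 2.0, 3.0, 4.0]}, index=index)

# Passing the existing MultiIndex back in should keep it a MultiIndex
# and keep its level names, instead of producing an Index of tuples.
result = data.set_index(data.index)
is_multi = isinstance(result.index, pd.MultiIndex)
level_names = list(result.index.names)

# The single-level case: the index name should survive as well.
plain = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0]},
    index=pd.Index(["x", "y", "z"], name="name"),
)
plain_names = list(plain.set_index(plain.index).index.names)

print(is_multi, level_names, plain_names)
```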

@jreback

this is not the way to fix this
instead catch the case of a passed index and treat it like a series rather than the ndarray case

@qwhelan

Treating it like a Series doesn't fix the issue described above. What should df.set_index([df.index, df.index]) return when the index is a MultiIndex? Currently (and in this patch), this would create an Index of two pairs of tuples.

The more correct behavior, in my opinion, would be to return a 4-level MultiIndex. This would require treating the MultiIndex as a DataFrame here or modifying from_arrays to detect this case (presumably undesirable).
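A rough sketch of the 4-level result described above, assuming a pandas version in which each level of a passed MultiIndex is expanded into its own level. The frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical two-level frame; the values are placeholders.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)],
    names=["firm", "year"],
)
df = pd.DataFrame({"value": [10.0, 20.0, 30.0]}, index=index)

# Passing the MultiIndex twice: each copy contributes its two levels,
# so the result should have four levels rather than two tuple columns.
result = df.set_index([df.index, df.index])
n_levels = result.index.nlevels
level_names = list(result.index.names)

print(n_levels, level_names)
```

Note that the level names repeat here, so selecting a level by name would be ambiguous; positional level numbers are the safer way to address levels in such a result.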

@qwhelan

@jreback This newest commit is what I'm thinking. Let me know if there's a more elegant way to get the columns out of a MultiIndex.

@qwhelan

@jreback, thanks for the suggestions. Most recent commit has those changes.

@jreback

looks good. can you add a note to release.rst and v0.14.0.txt, both in the API sections, that references this issue and gives a short explanation? (probably just a one-liner; if you think more is warranted, you can add an example in v0.14.0.txt, but only if it's not clear what the change does)

@qwhelan

@jreback Added to notes and rebased. Sorry for the delay - I've been sick the last few days.

jreback

df = pd.util.testing.makeDataFrame()
df.index.name = 'name'
assert df.set_index(df.index).index.names == ['name']

use self.assertEquals for these rather than a bare assert
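As a sketch of what that suggestion might look like in a standalone test, assuming plain `unittest` and a hand-built frame in place of `makeDataFrame()`: the class and method names below are hypothetical and not the ones used in the PR. The sketch uses `assertEqual`, since `assertEquals` is a deprecated alias that newer Python versions no longer provide.

```python
import unittest

import pandas as pd


class TestSetIndexPreservesNames(unittest.TestCase):
    """Hypothetical test case; names are illustrative only."""

    def test_plain_index_name_is_kept(self):
        df = pd.DataFrame(
            {"A": [1.0, 2.0, 3.0]},
            index=pd.Index(["x", "y", "z"], name="name"),
        )
        # assertEqual reports both values on failure, unlike a bare assert.
        self.assertEqual(list(df.set_index(df.index).index.names), ["name"])

    def test_multiindex_is_not_flattened(self):
        index = pd.MultiIndex.from_tuples(
            [("a", 1), ("a", 2), ("b", 1)], names=["firm", "year"]
        )
        df = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=index)
        result = df.set_index(df.index)
        self.assertIsInstance(result.index, pd.MultiIndex)
        self.assertEqual(list(result.index.names), ["firm", "year"])
```

Saved to a file, this can be run with `python -m unittest`.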

@qwhelan

@jreback Added an example to whatsnew. Let me know if you'd like this branch squashed/rebased.

@jreback

looks good
pls squash down to 1-2 commits and good to merge

@qwhelan

Preserve .names in df.set_index(df.index)

Check that df.set_index(df.index) doesn't convert a MultiIndex to an Index

Handle general case of df.set_index([df.index,...])

Cleanup

Add to release notes

Add equality checks

Fix issue on 2.6

Add example to whatsnew

@qwhelan

Alright, squashed and rebased.

jreback added a commit that referenced this pull request

Mar 4, 2014

@jreback

ENH: Preserve .names in df.set_index(df.index)

@jreback
