Weird MultiIndex bug · Issue #3714 · pandas-dev/pandas

Bug: Assigning levels/labels to a MultiIndex (or really any field of an Index) should raise (if done externally)

This is (for me) one of the weirdest things I have found so far (still present in 0.10.1).
tl;dr
It seems that df.copy() creates only a shallow copy of the MultiIndex with respect to its levels.
Also, assigning to index.levels does not seem to have any effect on the index itself.

Running the same code twice gives different results the second time,
although I never alter the original object anywhere.
Also note the line "print data.index.levels[1]", whose output changes after data2 is modified.

My original goal (concatenating the two frames) does not work: the result always contains the original date in all rows.
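For the concatenation goal, a minimal sketch of one possible approach follows. It assumes a pandas version that provides `MultiIndex.set_levels` (which may not exist in 0.10.1), and the fixed date `new_date` is a placeholder chosen for illustration in place of `date.today() + timedelta(1)`. It is a sketch under those assumptions, not a statement of how the issue was resolved.

```python
import datetime

import pandas as pd

# Same two-row frame as in the report.
ind = pd.MultiIndex.from_tuples([
    (pd.Timestamp("2013-04-30"), datetime.date(2013, 5, 30)),
    (pd.Timestamp("2013-05-01"), datetime.date(2013, 5, 30)),
])
data = pd.DataFrame(
    {"orders_ga": [10.0, 15.0], "revenues_ga": [5.0, 7.0]}, index=ind
)

# Placeholder value for illustration (assumption, not from the report).
new_date = datetime.date(2013, 5, 31)

data2 = data.copy()
# set_levels returns a new MultiIndex, so the replacement is assigned
# back to data2.index instead of mutating the levels in place.
data2.index = data2.index.set_levels([new_date], level=1)

combined = pd.concat([data, data2])
print(combined)
```

Because the new index is assigned to `data2` as a whole, `data` keeps its own levels and the concatenated frame should show both dates.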

Code

import datetime
from datetime import date, timedelta

import pandas as pd

# Two rows that share the same second-level value (a datetime.date).
ind = pd.MultiIndex.from_tuples([
    (pd.Timestamp('2013-04-30 00:00:00'), datetime.date(2013, 5, 30)),
    (pd.Timestamp('2013-05-01 00:00:00'), datetime.date(2013, 5, 30)),
])
data = pd.DataFrame({
    'orders_ga': {0: 10.0, 1: 15.0},
    'revenues_ga': {0: 5.0, 1: 7.0},
})
data.index = ind


data2 = data.copy()
print 'original index'
print data.index
print 'copied index levels'
print data2.index.levels[1]
data2.index.levels[1] = pd.Index([date.today() + timedelta(1)])
print 'index levels of original dataframe after assinging new index to the copied df'
print data.index.levels[1]
print 'new dfs index levels after assigning new levels'
print data2.index.levels[1]
#print ##################
#print data.index
#data2['recorded_at'] = date.today() + timedelta(1)
print '##################################################  output of concat'
print pd.concat([data, data2])
print '##########################################################################'
print 'one more time'
data2 = data.copy()
print 'original index'
print data.index
print 'copied index levels'
print data2.index.levels[1]
data2.index.levels[1] = pd.Index([date.today() + timedelta(1)])
print 'index levels of original dataframe after assinging new index to the copied df'
print data.index.levels[1]
print 'new dfs index levels after assigning new levels'
print data2.index.levels[1]
#print ##################
#print data.index
#data2['recorded_at'] = date.today() + timedelta(1)
print '##################################################  output of concat'
print pd.concat([data, data2])

Output:

original index
MultiIndex
[(2013-04-30 00:00:00, 2013-05-30), (2013-05-01 00:00:00, 2013-05-30)]
copied index levels
Index([2013-05-30], dtype=object)
index levels of original dataframe after assinging new index to the copied df
Index([2013-05-31], dtype=object)
new dfs index levels after assigning new levels
Index([2013-05-31], dtype=object)
##################################################  output of concat
                       orders_ga  revenues_ga
2013-04-30 2013-05-30         10            5
2013-05-01 2013-05-30         15            7
2013-04-30 2013-05-30         10            5
2013-05-01 2013-05-30         15            7
##########################################################################
one more time
original index
MultiIndex
[(2013-04-30 00:00:00, 2013-05-30), (2013-05-01 00:00:00, 2013-05-30)]
copied index levels
Index([2013-05-31], dtype=object)
index levels of original dataframe after assinging new index to the copied df
Index([2013-05-31], dtype=object)
new dfs index levels after assigning new levels
Index([2013-05-31], dtype=object)
##################################################  output of concat
                       orders_ga  revenues_ga
2013-04-30 2013-05-30         10            5
2013-05-01 2013-05-30         15            7
2013-04-30 2013-05-30         10            5
2013-05-01 2013-05-30         15            7
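
The behaviour asked for in the title (external assignment to the levels should raise) can be checked with a small sketch like the one below. It assumes a pandas version in which `MultiIndex.levels` is exposed as an immutable container, so that item assignment fails with `TypeError`; the helper name `direct_assignment_raises` is made up for this example. On 0.10.1 the assignment evidently goes through, as the output above shows.

```python
import datetime

import pandas as pd

mi = pd.MultiIndex.from_tuples([
    (pd.Timestamp("2013-04-30"), datetime.date(2013, 5, 30)),
    (pd.Timestamp("2013-05-01"), datetime.date(2013, 5, 30)),
])


def direct_assignment_raises(index):
    """Return True if item assignment on index.levels is rejected."""
    try:
        index.levels[1] = pd.Index([datetime.date(2013, 5, 31)])
    except TypeError:
        return True
    return False


before = list(mi.get_level_values(1))
raised = direct_assignment_raises(mi)
after = list(mi.get_level_values(1))

# Report whether the assignment was rejected and whether the index
# values are the same before and after the attempt.
print(raised, before == after)
```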