DataFrame.unstack() fails when some index column values are NaN · Issue #4862 · pandas-dev/pandas

{Python 2.6.6, pandas 0.12}

A DataFrame will fail to unstack() when one of the columns retained as an index has NaN values. The code below sets up a dataframe with NaN in some index entries, at which point calling unstack() will fail.

In the first failure, the exception message claims that the index "contains duplicate entries", which is false: every (s_id, dosage, agent) combination in the data is distinct. In the second failure, after the first row is dropped, the error message becomes "cannot convert float NaN to integer".

A final try, with NaN converted to a sentinel value of 42, shows proper behavior.

import pandas
from numpy import nan

df = pandas.DataFrame({
    'agent': {17263: 'Hg', 17264: 'U', 17265: 'Pb',
              17266: 'Sn', 17267: 'Ag', 17268: 'Hg'},
    'change': {17263: nan, 17264: 0.0, 17265: 7.070e-06,
               17266: 2.3614e-05, 17267: 0.0, 17268: -0.00015},
    'dosage': {17263: nan, 17264: nan, 17265: nan,
               17266: 0.0133, 17267: 0.0133, 17268: 0.0133},
    's_id': {17263: 680585148, 17264: 680585148, 17265: 680585148,
             17266: 680607017, 17267: 680607017, 17268: 680607017},
})

try:
    dupe = df.copy().set_index(['s_id', 'dosage', 'agent'])
    badDupe = dupe.unstack()
except Exception as e:
    print('Error with all data was: %s' % (e,))

try:
    getnan = df.ix[17264:].copy().set_index(['s_id', 'dosage', 'agent'])
    badNan = getnan.unstack()
except Exception as e:
    print('Error dropping first entry was: %s' % (e,))

df.dosage[:3] = 42
willWork = df.copy().set_index(['s_id', 'dosage', 'agent'])
u = willWork.unstack()
print(u)

Overall output:

Error with all data was: Index contains duplicate entries, cannot reshape
Error dropping first entry was: cannot convert float NaN to integer

                   change                                 
agent                  Ag       Hg        Pb        Sn   U
s_id      dosage                                          
680585148 42.0000     NaN      NaN  0.000007       NaN   0
680607017 0.0133        0 -0.00015       NaN  0.000024 NaN
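
Until this is fixed, the sentinel approach above can be wrapped in a small helper. This is only a sketch of a workaround, not a fix for unstack() itself: the function name `unstack_with_sentinel` and the default sentinel of -1.0 are hypothetical, and it assumes the sentinel does not already occur in the affected columns (it raises if it does). It is written against a recent pandas (`isna`, `fillna`), not 0.12.

```python
import numpy as np
import pandas as pd


def unstack_with_sentinel(df, index_cols, sentinel=-1.0):
    """Fill NaN in the would-be index columns, then set_index and unstack.

    Hypothetical helper: `sentinel` is an arbitrary placeholder and must
    not collide with a real value in any column that contains NaN.
    """
    filled = df.copy()
    for col in index_cols:
        if not filled[col].isna().any():
            continue  # nothing to patch in this column
        if (filled[col] == sentinel).any():
            raise ValueError(
                "sentinel %r already present in column %r" % (sentinel, col)
            )
        filled[col] = filled[col].fillna(sentinel)
    # The last index level ('agent' in the example) becomes the columns.
    return filled.set_index(index_cols).unstack()


# Same data as in the report, with a default integer index.
df = pd.DataFrame({
    'agent': ['Hg', 'U', 'Pb', 'Sn', 'Ag', 'Hg'],
    'change': [np.nan, 0.0, 7.070e-06, 2.3614e-05, 0.0, -0.00015],
    'dosage': [np.nan, np.nan, np.nan, 0.0133, 0.0133, 0.0133],
    's_id': [680585148, 680585148, 680585148,
             680607017, 680607017, 680607017],
})

result = unstack_with_sentinel(df, ['s_id', 'dosage', 'agent'])
print(result)
```

The sentinel stays in the resulting `dosage` index level, so downstream code has to treat it as "missing" explicitly.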