DataFrame.unstack() fails when some index column values are NaN · Issue #4862 · pandas-dev/pandas
{Python 2.6.6, pandas 0.12}
A DataFrame will fail to unstack() when one of the columns retained in the index has NaN values. The code below sets up a DataFrame with NaN in some index entries and then calls unstack(), which raises an exception.
In the first failure, the exception message is "Index contains duplicate entries, cannot reshape", which is patently false: every (s_id, dosage, agent) combination occurs only once. In the second failure, where the first row has been dropped, the error message becomes "cannot convert float NaN to integer".
A final try, with NaN converted to a sentinel value of 42, shows proper behavior.
    import pandas
    from numpy import nan

    df = pandas.DataFrame(
        {'agent': {17263: 'Hg', 17264: 'U', 17265: 'Pb',
                   17266: 'Sn', 17267: 'Ag', 17268: 'Hg'},
         'change': {17263: nan, 17264: 0.0, 17265: 7.070e-06,
                    17266: 2.3614e-05, 17267: 0.0, 17268: -0.00015},
         'dosage': {17263: nan, 17264: nan, 17265: nan,
                    17266: 0.0133, 17267: 0.0133, 17268: 0.0133},
         's_id': {17263: 680585148, 17264: 680585148, 17265: 680585148,
                  17266: 680607017, 17267: 680607017, 17268: 680607017}}
    )

    try:
        dupe = df.copy().set_index(['s_id', 'dosage', 'agent'])
        badDupe = dupe.unstack()
    except Exception as e:
        print('Error with all data was: %s' % (e,))

    try:
        getnan = df.ix[17264:].copy().set_index(['s_id', 'dosage', 'agent'])
        badNan = getnan.unstack()
    except Exception as e:
        print('Error dropping first entry was: %s' % (e,))

    df.dosage[:3] = 42
    willWork = df.copy().set_index(['s_id', 'dosage', 'agent'])
    u = willWork.unstack()
    print(u)
Overall output:
    Error with all data was: Index contains duplicate entries, cannot reshape
    Error dropping first entry was: cannot convert float NaN to integer
                         change
    agent                    Ag       Hg        Pb        Sn    U
    s_id      dosage
    680585148 42.0000       NaN      NaN  0.000007       NaN    0
    680607017 0.0133          0 -0.00015       NaN  0.000024  NaN
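For anyone hitting the same problem, the two observations above can be put into a short sketch: the key columns contain no duplicate rows, and filling NaN with a sentinel before set_index lets unstack() go through. This assumes a recent pandas release (it uses `DataFrame.assign` and `fillna`, not the 0.12 API from the report), and the `SENTINEL` name and its value of -1.0 are made up for illustration; any value that cannot occur as a real dosage would do.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder for missing dosages; it must not collide
# with a real dosage value in the data.
SENTINEL = -1.0

# Same data as in the report, written as plain columns.
df = pd.DataFrame({
    "s_id": [680585148, 680585148, 680585148,
             680607017, 680607017, 680607017],
    "dosage": [np.nan, np.nan, np.nan, 0.0133, 0.0133, 0.0133],
    "agent": ["Hg", "U", "Pb", "Sn", "Ag", "Hg"],
    "change": [np.nan, 0.0, 7.070e-06, 2.3614e-05, 0.0, -0.00015],
})

keys = ["s_id", "dosage", "agent"]

# Check the claim from the report: no (s_id, dosage, agent) combination
# appears more than once, even when NaN is treated as equal to NaN.
has_duplicate_keys = bool(df.duplicated(subset=keys).any())

# Workaround: replace NaN in the key column by the sentinel on a copy,
# then build the index and unstack the last level ('agent').
filled = df.assign(dosage=df["dosage"].fillna(SENTINEL))
wide = filled.set_index(keys).unstack()

print(has_duplicate_keys)
print(wide)
```

Compared with `df.dosage[:3]=42`, filling with `fillna` does not depend on knowing which rows hold NaN and leaves the original frame untouched. The sentinel stays in the resulting index, so it has to be mapped back to NaN afterwards if the distinction matters.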