BUG: DataFrame.unstack() with two levels containing NaN · Issue #9497 · pandas-dev/pandas

With the latest master code (including #9061 and #9292), DataFrame.unstack() produces incorrect results when `level` lists two index levels, one of which contains NaN:

        import numpy as np
        import pandas as pd
        from pandas import MultiIndex, Series
        from pandas.util.testing import assert_series_equal

        df = pd.DataFrame({'A': list('aaaabbbb'), 'B': range(8), 'C': range(8)})
        df.iloc[3, 1] = np.nan  # row 3 gets a missing B label
        dfs = df.set_index(['A', 'B'])
        left = dfs.unstack([0, 1])  # unstack both index levels
        # expected: the NaN label (code -1) comes first within group 'a'
        data = [3, 0, 1, 2, 4, 5, 6, 7]
        idx = MultiIndex(levels=[['C'], ['a', 'b'], [0, 1, 2, 4, 5, 6, 7]],
                         labels=[[0, 0, 0, 0, 0, 0, 0, 0],
                                 [0, 0, 0, 0, 1, 1, 1, 1],
                                 [-1, 0, 1, 2, 3, 4, 5, 6]],
                         names=[None, 'A', 'B'])
        right = Series(data, index=idx, dtype=dfs.dtypes[0])
        print("dfs=")
        print(dfs)
        print("\nleft=")
        print(left)
        print("\nright=")
        print(right)
        assert_series_equal(left, right)
======================================================================
FAIL: test_unstack_nan_index (pandas.tests.test_frame.TestDataFrame)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\seth\github\pandas2\build\lib.win-amd64-3.4\pandas\tests\test_frame.py", line 12439, in test_unstack_nan_index
    assert_series_equal(left, right)
  File "c:\Users\seth\github\pandas2\build\lib.win-amd64-3.4\pandas\util\testing.py", line 674, in assert_series_equal
    assert_almost_equal(left.values, right.values, check_less_precise)
  File "das\src\testing.pyx", line 58, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2745)
  File "das\src\testing.pyx", line 93, in pandas._testing.assert_almost_equal (pandas\src\testing.c:1830)
  File "das\src\testing.pyx", line 135, in pandas._testing.assert_almost_equal (pandas\src\testing.c:2514)
nose.proxy.AssertionError: (very low values) expected 3.00000 but got 0.00000, with decimal 5
-------------------- >> begin captured stdout << ---------------------
dfs=
       C
A B     
a 0    0
  1    1
  2    2
  NaN  3
b 4    4
  5    5
  6    6
  7    7

left=
   A  B
C  a  1    0
      2    1
      4    2
      0    3
   b  6    4
      7    5
   a  0    6
      1    7
dtype: int32

right=
   A  B  
C  a  NaN    3
      0      0
      1      1
      2      2
   b  4      4
      5      5
      6      6
      7      7
dtype: int32

--------------------- >> end captured stdout << ----------------------
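
The expected index in the report encodes the missing `B` label as `-1` in the third `labels` array. The sketch below only illustrates that encoding and does not touch `unstack()`. It assumes pandas 0.24 or later, where the `labels` keyword of `MultiIndex` is spelled `codes`; the variable names `b_labels`, `first_is_missing` and `remaining` are made up for the example.

```python
import pandas as pd

# Same levels and integer codes as the expected index in the report.
# Assumption: pandas >= 0.24, where `labels` is spelled `codes`.
idx = pd.MultiIndex(
    levels=[['C'], ['a', 'b'], [0, 1, 2, 4, 5, 6, 7]],
    codes=[[0, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 1, 1, 1, 1],
           [-1, 0, 1, 2, 3, 4, 5, 6]],
    names=[None, 'A', 'B'],
)

# A code of -1 points at no entry of the level, so that label is missing.
b_labels = idx.get_level_values('B')
first_is_missing = bool(pd.isna(b_labels[0]))
remaining = [int(v) for v in b_labels[1:]]
print(first_is_missing, remaining)
```

If this runs as expected, the first `B` label is reported as missing and the remaining seven are the level values in order, matching the `right=` output above, where the `NaN` row leads group `a`.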