get_dummies with NaN · Issue #4446 · pandas-dev/pandas

get_dummies seems to get caught out by NaNs:

In [11]: s1 = pd.Series(['a', 'a', np.nan, 'c', 'c', 'c'])

In [12]: s1
Out[12]: 
0      a
1      a
2    NaN
3      c
4      c
5      c
dtype: object

In [13]: pd.get_dummies(s1)
Out[13]: 
   a  c
0  1  0
1  1  0
2  0  1
3  0  1
4  0  1
5  0  1

Row 2, the NaN, has been given a rogue 1 in column c. I think the expected result is:

In [14]: pd.get_dummies(s1[s1.notnull()])
Out[14]: 
   a  c
0  1  0
1  1  0
3  0  1
4  0  1
5  0  1
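
For anyone wanting to compare the two ways of handling the missing value, here is a minimal sketch. It reuses `s1` from the report. The `dummy_na` keyword is an assumption about the installed pandas version: it is a `get_dummies` parameter in later releases and may not exist in the version this issue was filed against. The `.astype(int)` calls are only there so the output reads as 0/1 regardless of the indicator dtype the installed version returns.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(['a', 'a', np.nan, 'c', 'c', 'c'])

# Workaround from the report: drop missing values before encoding,
# so the NaN row (index 2) never reaches get_dummies.
dropped = pd.get_dummies(s1[s1.notnull()]).astype(int)

# Alternative (assumes a pandas release that supports dummy_na):
# keep every row and give NaN its own indicator column.
with_nan = pd.get_dummies(s1, dummy_na=True).astype(int)

print(dropped)
print(with_nan)
```

The first form loses row 2 from the index, which matters if the dummies are later joined back onto the original data. The second keeps all six rows aligned with `s1`.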