str.split on np.nan gives np.nan in one column but None in another column · Issue #18450 · pandas-dev/pandas
import pandas as pd
import numpy as np

s = pd.Series(['19HT|C2', np.nan, '20ZT|C1'])
print(s)
0    19HT|C2
1        NaN
2    20ZT|C1
dtype: object
s_split = s.str.split('|', expand=True)
print(s_split)
      0     1
0  19HT    C2
1   NaN  None
2  20ZT    C1
print(s_split.dtypes)
0    object
1    object
dtype: object
print(type(s_split.loc[1,0]))  # <class 'float'>
print(type(s_split.loc[1,1]))  # <class 'NoneType'>
Problem description
When np.nan gets split, it becomes np.nan (of type float) in the first column but None (of type NoneType) in the second column. I'd consider this unexpected behavior: how can splitting a single value of one type produce two values of different types?
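Why the mix matters in practice: pandas' own missing-value checks treat both sentinels alike, but plain Python comparisons do not. The minimal sketch below uses only the two values reported above (no pandas splitting involved); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

nan_cell = np.nan   # what the first column holds for row 1
none_cell = None    # what the second column holds for row 1

# pandas' missing-value detection treats both as missing.
both_missing = pd.isna(nan_cell) and pd.isna(none_cell)

# A common hand-written NaN test (x != x) only works for float NaN:
# NaN is the only value that compares unequal to itself, while None == None.
nan_self_unequal = nan_cell != nan_cell
none_self_unequal = none_cell != none_cell

print(both_missing, nan_self_unequal, none_self_unequal)  # True True False
print(type(nan_cell).__name__, type(none_cell).__name__)  # float NoneType
```

So code that sticks to pd.isna / DataFrame.isna is unaffected, while code that inspects types or relies on NaN-specific comparisons sees two different kinds of "missing" in the same row.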
Expected Output
      0    1
0  19HT   C2
1   NaN  NaN
2  20ZT   C1
Either np.nan or None in both columns, but not a mix of both. I'd say np.nan makes the most sense, since that's the original value of the row.
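Until the behavior is consistent, one possible user-side workaround is to normalize every missing cell to np.nan after splitting. This is a sketch, not a pandas-provided fix: it relies on DataFrame.isna (which flags both None and np.nan) and DataFrame.where, so it should yield the expected output whichever sentinel a given pandas version produces.

```python
import numpy as np
import pandas as pd

s = pd.Series(['19HT|C2', np.nan, '20ZT|C1'])
s_split = s.str.split('|', expand=True)

# Depending on the pandas version, missing cells may hold np.nan, None, or a mix.
# isna() flags both kinds, so it locates every missing cell regardless.
missing = s_split.isna()

# Keep non-missing cells as-is; replace every missing cell with np.nan
# so that all columns share one missing-value sentinel.
normalized = s_split.where(~missing, np.nan)

print(normalized)
print(type(normalized.loc[1, 0]), type(normalized.loc[1, 1]))
```

Using isna() as the mask, rather than testing for None directly, keeps the workaround harmless on versions where the split already returns np.nan in both columns.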
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-40-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.0
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
pyarrow: 0.7.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.9999999
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None