BUG: ValueError: cannot convert float NaN to integer when resetting MultiIndex with NaT values



Code Sample, a copy-pastable example

In [18]: ix = pd.MultiIndex.from_tuples([(pd.NaT, 1), (pd.NaT, 2)], names=['a', 'b'])

In [19]: ix
Out[19]: MultiIndex([('NaT', 1), ('NaT', 2)], names=['a', 'b'])

In [20]: d = pd.DataFrame({'x': [11, 12]}, index=ix)

In [21]: d
Out[21]:
        x
a   b
NaT 1  11
    2  12

In [22]: d.reset_index()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
----> 1 d.reset_index()

~/envs/pandas-test/lib/python3.8/site-packages/pandas/core/frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
   4851                 name = tuple(name_lst)
   4852             # to ndarray and maybe infer different dtype
-> 4853             level_values = _maybe_casted_values(lev, lab)
   4854             new_obj.insert(0, name, level_values)
   4855

~/envs/pandas-test/lib/python3.8/site-packages/pandas/core/frame.py in _maybe_casted_values(index, labels)
   4784                 dtype = index.dtype
   4785                 fill_value = na_value_for_dtype(dtype)
-> 4786                 values = construct_1d_arraylike_from_scalar(
   4787                     fill_value, len(mask), dtype
   4788                 )

~/envs/pandas-test/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in construct_1d_arraylike_from_scalar(value, length, dtype)
   1556
   1557     subarr = np.empty(length, dtype=dtype)
-> 1558     subarr.fill(value)
   1559
   1560     return subarr

ValueError: cannot convert float NaN to integer

Problem description

Since groupby(..., dropna=False) was introduced, a MultiIndex containing NaT values is more likely to occur, which exposes a few issues that previously went undetected. This one was discovered while looking for a workaround for another dropna=False related issue (#36060 (comment)).

Further investigation shows that this may be an issue with numpy not accepting pd.NaT. The following code reproduces the issue in construct_1d_arraylike_from_scalar:

In [33]: a = np.empty(2, dtype='datetime64[ns]')

In [34]: a.fill(pd.NaT)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-9fa9ff66da99> in <module>
----> 1 a.fill(pd.NaT)

ValueError: cannot convert float NaN to integer
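For contrast, here is a minimal sketch of the same fill using NumPy's own missing-value marker instead of pd.NaT. It only illustrates that the datetime64 buffer itself can hold NaT; the choice of np.datetime64("NaT") as the replacement scalar is an assumption made for illustration, not something taken from the pandas code base.

```python
import numpy as np

# Same empty datetime64[ns] buffer as in the reproduction above.
a = np.empty(2, dtype="datetime64[ns]")

# np.datetime64("NaT") is NumPy's native "not a time" scalar;
# fill() casts it to the array's [ns] unit without going through float NaN.
a.fill(np.datetime64("NaT"))

# Every element should now be NaT.
print(np.isnat(a).all())
```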

which led me to propose this fix:

--- a/pandas/core/dtypes/cast.py
+++ b/pandas/core/dtypes/cast.py
@@ -1559,6 +1559,12 @@ def construct_1d_arraylike_from_scalar(
             dtype = np.dtype("object")
             if not isna(value):
                 value = ensure_str(value)

The "value is NaT" check could possibly be extended to isna(value)...
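To make the isna(value) idea concrete, here is a rough sketch of how such a check might look in isolation. The helper name filled_from_scalar is hypothetical and this is not the pandas internal or the patch itself; it assumes that any missing scalar destined for a datetime64 or timedelta64 array can be swapped for NumPy's NaT before filling.

```python
import numpy as np
import pandas as pd


def filled_from_scalar(value, length, dtype):
    """Hypothetical stand-in for illustration; not the pandas internal."""
    dtype = np.dtype(dtype)
    # For datetime64 ("M") and timedelta64 ("m") dtypes, replace any missing
    # scalar (pd.NaT, None, float NaN) with NumPy's own NaT of that kind.
    if dtype.kind in "mM" and pd.isna(value):
        value = dtype.type("NaT")
    out = np.empty(length, dtype=dtype)
    out.fill(value)
    return out


# pd.NaT and None both end up as NaT in the resulting arrays.
dates = filled_from_scalar(pd.NaT, 2, "datetime64[ns]")
deltas = filled_from_scalar(None, 3, "timedelta64[ns]")
# Non-datetime dtypes are filled unchanged.
ints = filled_from_scalar(7, 2, "int64")
```

Checking with isna rather than identity means None and float NaN take the same path as pd.NaT, which is the extension suggested above.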

Expected Output

No ValueError being raised
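The expected behaviour could be written down as a small check along these lines. It is a sketch that assumes a pandas build in which the bug is fixed; on the affected version above, the reset_index() call raises instead. The column order a, b, x is what reset_index normally produces for these index names.

```python
import pandas as pd

# Same frame as in the code sample: the first index level is entirely NaT.
ix = pd.MultiIndex.from_tuples([(pd.NaT, 1), (pd.NaT, 2)], names=["a", "b"])
d = pd.DataFrame({"x": [11, 12]}, index=ix)

# Should return a frame rather than raising ValueError.
result = d.reset_index()

# The index levels become ordinary columns; level "a" stays all-missing.
assert list(result.columns) == ["a", "b", "x"]
assert result["a"].isna().all()
assert result["b"].tolist() == [1, 2]
assert result["x"].tolist() == [11, 12]
```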

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 15539fa
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_AU.UTF-8

pandas : 1.2.0.dev0+453.g15539fa62.dirty
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 46.4.0
Cython : 0.29.21
pytest : 6.0.2
hypothesis : 5.35.3
sphinx : 3.2.1
blosc : 1.9.2
feather : None
xlsxwriter : 1.3.4
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.8.2
fastparquet : 0.4.1
gcsfs : 0.7.1
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : 0.5.1
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2