BUG: ValueError: cannot convert float NaN to integer - on dataframe.reset_index() · Issue #35657 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample
import pandas as pd
df = pd.DataFrame(dict(c1=[10.], c2=['a'], c3=pd.to_datetime('2020-01-01')))
Triggering conditions: multiindex with date, empty dataframe
Multiindex without date works
df.set_index(['c1', 'c2']).head(0).reset_index()
Regular index with date also works
df.set_index(['c3']).head(0).reset_index()
Multiindex with date crashes...
df.set_index(['c2', 'c3']).head(0).reset_index()
>> ValueError: cannot convert float NaN to integer
This used to work on pandas 1.0.3, but breaks on pandas 1.1.0
However, the error is not triggered if the dataframe is already empty before calling set_index()
df.head(0).set_index(['c2', 'c3']).reset_index()
I originally observed the bug in a groupby call
df.head(0).groupby(['c2', 'c3'])[['c1']].sum().reset_index()
>> ValueError: cannot convert float NaN to integer
This used to work on pandas 1.0.3, but breaks on pandas 1.1.0
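The cases above can be collected into one script that reports, for each case, whether reset_index() raises. This is only a minimal sketch: the helper name try_reset_index and the case labels are made up for illustration and are not part of pandas. Which cases raise depends on the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    dict(c1=[10.], c2=['a'], c3=pd.to_datetime('2020-01-01')))


def try_reset_index(frame):
    """Hypothetical helper: return 'ok' if reset_index() succeeds,
    otherwise the text of the ValueError that was raised."""
    try:
        frame.reset_index()
        return 'ok'
    except ValueError as exc:
        return 'ValueError: {}'.format(exc)


# Illustrative labels; each value is one of the empty frames from this report.
cases = {
    'multiindex without date': df.set_index(['c1', 'c2']).head(0),
    'regular index with date': df.set_index(['c3']).head(0),
    'multiindex with date': df.set_index(['c2', 'c3']).head(0),
    'emptied before set_index': df.head(0).set_index(['c2', 'c3']),
    'groupby on empty frame': df.head(0).groupby(['c2', 'c3'])[['c1']].sum(),
}

results = {label: try_reset_index(frame) for label, frame in cases.items()}
for label, outcome in results.items():
    print('{:<26} {}'.format(label, outcome))
```

On an affected version, the 'multiindex with date' and 'groupby on empty frame' rows would be expected to show the ValueError, while the other rows print ok.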
Problem description
On pandas 1.1.0, I'm getting a ValueError exception when calling dataframe.reset_index() under the following conditions:
- Input dataframe is empty
- Multiindex from multiple columns, at least one of which is a datetime
The exception message is ValueError: cannot convert float NaN to integer.
Error trace:
Error
Traceback (most recent call last):
    df_out.reset_index()
  File "/Users/pec21/PycharmProjects/anp_voice_report/virtual/lib/python3.6/site-packages/pandas/core/frame.py", line 4848, in reset_index
    level_values = _maybe_casted_values(lev, lab)
  File "/Users/pec21/PycharmProjects/anp_voice_report/virtual/lib/python3.6/site-packages/pandas/core/frame.py", line 4782, in _maybe_casted_values
    fill_value, len(mask), dtype
  File "/Users/pec21/PycharmProjects/anp_voice_report/virtual/lib/python3.6/site-packages/pandas/core/dtypes/cast.py", line 1554, in construct_1d_arraylike_from_scalar
    subarr.fill(value)
ValueError: cannot convert float NaN to integer
This error didn't happen on pandas 1.0.3 and earlier. I haven't tested any intermediate releases, nor the master branch.
Expected Output
No exception is raised, returns an empty dataframe.
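For illustration, the shape of the expected result can be seen by using the call order that is reported above not to raise (emptying the frame before set_index()). This is a sketch under that assumption, and the variable name expected is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    dict(c1=[10.], c2=['a'], c3=pd.to_datetime('2020-01-01')))

# Emptying the frame before set_index() is the order reported not to raise.
expected = df.head(0).set_index(['c2', 'c3']).reset_index()

print(expected.empty)          # True: no rows
print(list(expected.columns))  # ['c2', 'c3', 'c1']: index levels come first
```

The failing call df.set_index(['c2', 'c3']).head(0).reset_index() would be expected to return a frame with the same columns and no rows.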
Output of pd.show_versions()
INSTALLED VERSIONS
commit : d9fff27
python : 3.6.6.final.0
python-bits : 64
OS : Darwin
OS-release : 18.6.0
Version : Darwin Kernel Version 18.6.0: Thu Apr 25 23:16:27 PDT 2019; root:xnu-4903.261.4~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_GB.UTF-8
pandas : 1.1.0
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 49.1.0
Cython : None
pytest : 5.3.4
hypothesis : None
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : None
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.3.3
sqlalchemy : 1.3.11
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None