DataFrame with Int64 columns casts to float64 with .max()/.min() · Issue #32651 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
    import pandas as pd
    import numpy as np

    int64_info = np.iinfo("int64")
    s = pd.Series([int64_info.max, None, int64_info.min], dtype=pd.Int64Dtype())
    df = pd.DataFrame({"Int64": s})

    df.max()
    Int64    9.223372e+18
    dtype: float64
Problem description
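`pd.Int64` data is converted to `np.float64` in certain reduction operations on `pd.DataFrame`, such as `.max()` and `.min()`. This causes data corruption, because `float64` cannot represent every 64-bit integer exactly, and `pd.Int64` is intended to avoid this exact issue.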
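To make the corruption concrete, here is a minimal sketch in plain Python, independent of pandas, of what happens to the int64 extremes when they pass through a 64-bit float. The constant names are illustrative only; the values are the same as `np.iinfo("int64").max` and `.min`.

```python
# Illustrative constants, equal to np.iinfo("int64").max / .min.
INT64_MAX = 2**63 - 1   # 9223372036854775807
INT64_MIN = -(2**63)    # -9223372036854775808

# A 64-bit float has a 53-bit significand, so INT64_MAX is rounded
# to the nearest representable value, which is 2**63.
round_trip_max = int(float(INT64_MAX))
error_max = round_trip_max - INT64_MAX

# INT64_MIN is a power of two, so it survives the round trip unchanged.
round_trip_min = int(float(INT64_MIN))

print(round_trip_max)   # 9223372036854775808
print(error_max)        # 1
print(round_trip_min == INT64_MIN)  # True
```

The rounded maximum, `2**63`, no longer fits in an int64 at all, so the float result cannot be cast back to the original dtype without overflow.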
Expected Output
`df.max()` should probably return a `pd.Series` of `dtype='object'` wrapping a `pd.Int64` value.
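One possible workaround, sketched here under assumptions, is to reduce each column separately and collect the scalars into an object-dtype Series, which matches the shape of the expected output above. The helper name `column_max` is hypothetical and not part of pandas. The sketch assumes a pandas version in which `Series.max()` on a nullable integer column is computed without an intermediate `float64` cast; on versions where that reduction itself goes through float, this does not help.

```python
import numpy as np
import pandas as pd

int64_info = np.iinfo("int64")
s = pd.Series([int64_info.max, None, int64_info.min], dtype=pd.Int64Dtype())
df = pd.DataFrame({"Int64": s})


def column_max(frame):
    """Hypothetical helper: per-column max collected into an object Series.

    Reducing column by column avoids the frame-level reduction path,
    and dtype="object" keeps each scalar as returned by Series.max().
    """
    return pd.Series(
        {name: frame[name].max() for name in frame.columns},
        dtype="object",
    )


result = column_max(df)
print(result.dtype)
print(int(result["Int64"]) == int64_info.max)
```

The same pattern applies to `.min()` by swapping the reduction inside the dict comprehension.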
Output of pd.show_versions()
```
INSTALLED VERSIONS
------------------
commit : 27ad779
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-29-generic
Version : #31-Ubuntu SMP Fri Jan 17 17:27:26 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+779.g27ad77971
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.14
pytest : 5.3.5
hypothesis : 5.4.1
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.4.0.dev0+62.g8ac3a4c8
fastparquet : 0.3.2
gcsfs : None
matplotlib : None
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.11.1
pytables : None
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : 0.14.1
xlrd : None
xlwt : None
numba : 0.48.0