DataFrame with Int64 columns casts to float64 with .max()/.min() · Issue #32651 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

int64_info = np.iinfo("int64")
s = pd.Series([int64_info.max, None, int64_info.min], dtype=pd.Int64Dtype())
df = pd.DataFrame({"Int64": s})

df.max()
Int64    9.223372e+18
dtype: float64

Problem description

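Nullable Int64 data (pd.Int64Dtype) is converted to np.float64 by certain reduction operations on pd.DataFrame, such as .max() and .min(). This silently corrupts large values, because float64 cannot represent every 64-bit integer exactly. Avoiding that loss is exactly what pd.Int64 is intended for.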
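To make the corruption concrete, here is a minimal sketch that uses only NumPy scalars, not the DataFrame reduction itself. It assumes the usual IEEE 754 float64 with a 53-bit significand. The variable names are illustrative.

```python
import numpy as np

int64_info = np.iinfo("int64")

# Casting the largest int64 to float64 rounds it up to 2**63,
# because float64 carries only 53 bits of significand.
as_float = np.float64(int64_info.max)
round_tripped = int(as_float)

# The round trip no longer yields the original integer.
lossy = round_tripped != int64_info.max

# Distinct neighbouring integers collapse onto the same float64 value.
collapsed = np.float64(int64_info.max) == np.float64(int64_info.max - 1)

print(int64_info.max, round_tripped, lossy, collapsed)
```

Any reduction that passes through float64 is subject to the same rounding, which is why the float64 result above cannot be converted back to the true maximum.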

Expected Output

df.max() should probably return a pd.Series of dtype='object' wrapping a pd.Int64 value.
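Until the reduction preserves the dtype, one possible workaround is to reduce each column with exact Python integers. The sketch below is only an illustration of that idea: `exact_int_max` is a hypothetical helper, not part of the pandas API, and it returns plain Python ints in an object-dtype Series rather than pd.Int64 values.

```python
import numpy as np
import pandas as pd


def exact_int_max(series):
    """Hypothetical helper: max of a nullable integer Series without any float cast."""
    # Drop missing entries, then convert each element to an arbitrary-precision int.
    values = [int(v) for v in series.dropna()]
    # An all-missing column has no maximum; report it as missing.
    return max(values) if values else pd.NA


int64_info = np.iinfo("int64")
s = pd.Series([int64_info.max, None, int64_info.min], dtype=pd.Int64Dtype())
df = pd.DataFrame({"Int64": s})

# Reduce column by column and keep the results in an object-dtype Series.
result = pd.Series({col: exact_int_max(df[col]) for col in df}, dtype=object)
print(result)
```

This trades speed for exactness, so it is better suited to checking results than to large frames.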

Output of pd.show_versions()

```
INSTALLED VERSIONS
------------------
commit : 27ad779
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-29-generic
Version : #31-Ubuntu SMP Fri Jan 17 17:27:26 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+779.g27ad77971
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.14
pytest : 5.3.5
hypothesis : 5.4.1
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.4.0.dev0+62.g8ac3a4c8
fastparquet : 0.3.2
gcsfs : None
matplotlib : None
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.11.1
pytables : None
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : 0.14.1
xlrd : None
xlwt : None
numba : 0.48.0