BUG: Dividing Int64 and float64 results in a mix of null-types · Issue #42630 · pandas-dev/pandas (original) (raw)


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

import pandas as pd import numpy as np

data = { "A": [15, np.nan, 5, 4], "B": [15, 5, np.nan, 4], } df = pd.DataFrame(data) df = df.astype({"A": "Int64", "B": "float64"}) df["C"] = df["A"] / df["B"] print(df) print(df.dtypes) print(df.dropna(subset=["C"]))

This example outputs:

      A     B     C
0    15  15.0   1.0
1  <NA>   5.0  <NA>
2     5   NaN   NaN
3     4   4.0   1.0

A      Int64
B    float64
C    Float64
dtype: object

    A     B    C
0  15  15.0  1.0
2   5   NaN  NaN
3   4   4.0  1.0

Problem description

Dividing an Int64 column by a float64 column results in a Float64 column that can contain both <NA> and NaN values. However, only <NA> values will be removed by the dropna function. I would expect only <NA> values in the new column and I would also expect the dropna function to remove all NA variants (both <NA> and NaN).

Expected Output

The row with index 2 should be dropped.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f00ed8f
python : 3.9.4.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.3.0
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 49.2.1
Cython : 0.29.23
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.23.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : 2021.06.0
fastparquet : 0.6.3
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : 1.4.17
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None