BUG: pd.NA, when it replaces a value in a column, changes its type to "object" · Issue #44199 · pandas-dev/pandas

Reproducible Example

import pandas as pd

mock_data = pd.DataFrame({'date': ['0', '1', '2', '3'], 'value': [1, 2, 1, 1.5]})
assert pd.api.types.is_numeric_dtype(mock_data.value)  # passes

mock_data.loc[2, 'value'] = pd.NA
assert pd.api.types.is_numeric_dtype(mock_data.value)  # breaks: the column is now object dtype
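One possible workaround, assuming the column is meant to stay a plain NumPy float64 column, is to mark the value as missing with np.nan instead of pd.NA. The sketch below only demonstrates the np.nan path; it makes no claim about how pd.NA assignment behaves in pandas versions other than the one reported here.

```python
import numpy as np
import pandas as pd

mock_data = pd.DataFrame({'date': ['0', '1', '2', '3'], 'value': [1, 2, 1, 1.5]})

# np.nan is the missing-value marker native to NumPy float64 columns,
# so assigning it does not require any dtype change.
mock_data.loc[2, 'value'] = np.nan

print(mock_data['value'].dtype)  # float64
```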

Issue Description

Replacing one value in a column with NA/NULL should not change the column's data type; that seems a reasonable expectation. The functionality also appears to exist already, through the nullable dtypes. I am not entirely sure whether this is a bug or intended behavior.

Essentially, this happens because the default dtype inferred for the column is NumPy float64, not the nullable pd.Float64Dtype(). I am not sure whether making the nullable dtypes the default is on the roadmap, but this bug could be an argument in its favor.
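Short of changing the default, a column can be opted into the nullable dtype explicitly. A minimal sketch, assuming pandas 1.2 or later (where `DataFrame.convert_dtypes()` converts float columns to the nullable `Float64` dtype by default): convert first, then assign `pd.NA`.

```python
import pandas as pd

mock_data = pd.DataFrame({'date': ['0', '1', '2', '3'], 'value': [1, 2, 1, 1.5]})
print(mock_data['value'].dtype)  # float64 (the NumPy default)

# convert_dtypes() picks a nullable extension dtype per column; a float
# column holding non-integer values becomes Float64.
converted = mock_data.convert_dtypes()

# The masked Float64 array stores pd.NA natively, so no upcast is needed.
converted.loc[2, 'value'] = pd.NA
print(converted['value'].dtype)
```

Calling `astype(pd.Float64Dtype())` on the single column, as in the expected-behavior snippet, achieves the same for that column; `convert_dtypes()` instead applies the conversion to every column at once, including turning the `date` strings into the nullable `string` dtype.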

Expected Behavior

mock_data = pd.DataFrame({'date': ['0', '1', '2', '3'], 'value': [1, 2, 1, 1.5]})
mock_data.value = mock_data.value.astype(pd.Float64Dtype())  # this should happen by default
mock_data.value.dtype  # Float64Dtype()

mock_data.loc[2, 'value'] = pd.NA
assert pd.api.types.is_numeric_dtype(mock_data.value)  # still passes
mock_data.value.dtype  # Float64Dtype()

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 945c9ed766a61c7d2c0a7cbb251b6edebf9cb7d5
python           : 3.9.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.11.0-34-generic
Version          : #36~20.04.1-Ubuntu SMP Fri Aug 27 08:06:32 UTC 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8
pandas           : 1.3.4
numpy            : 1.21.2
pytz             : 2021.3
dateutil         : 2.8.2
pip              : 21.3.1
setuptools       : 58.2.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.2
IPython          : 7.28.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 5.0.0
pyxlsb           : None
s3fs             : None
scipy            : 1.7.1
sqlalchemy       : 1.4.25
tables           : None
tabulate         : 0.8.9
xarray           : None
xlrd             : None
xlwt             : None
numba            : None