BUG: concat behaves differently for bool/boolean dtypes than it does for int64/Int64 · Issue #42800 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
int1 = pd.Series([1, 2], dtype='int64')
int2 = pd.Series([pd.NA, 4], dtype='Int64')
pd.concat([int1, int2])
0 1
1 2
0 <NA>
1 4
dtype: Int64
bool1 = pd.Series([True, False], dtype='bool')
bool2 = pd.Series([pd.NA, True], dtype='boolean')
pd.concat([bool1, bool2])
0 True
1 False
0 <NA>
1 True
dtype: object
Problem description
When running concat with two Series, one of int64 dtype and one of Int64 dtype containing pd.NA, the resulting Series has a dtype of Int64, which makes sense because that type can hold the data from both. However, running the same test on Series with dtypes of bool and boolean results in a Series with a dtype of object. For consistency with the integer case, the Series that results from concat should be boolean and not object.
Expected Output
The Series resulting from running concat with a bool Series and a boolean Series should have a boolean dtype instead of object.
0 True
1 False
0 <NA>
1 True
dtype: boolean
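Until concat handles this case itself, one possible workaround is to cast the NumPy-backed bool Series to the nullable boolean dtype before concatenating. This is a minimal sketch: `concat_nullable_bool` is a hypothetical helper name and not part of pandas, and it assumes every input is either bool or boolean.

```python
import pandas as pd


def concat_nullable_bool(parts):
    """Hypothetical helper: upcast NumPy bool Series to nullable 'boolean', then concat."""
    # A NumPy bool dtype compares equal to the builtin `bool`;
    # the nullable BooleanDtype does not, so it is left untouched.
    converted = [s.astype("boolean") if s.dtype == bool else s for s in parts]
    return pd.concat(converted)


bool1 = pd.Series([True, False], dtype="bool")
bool2 = pd.Series([pd.NA, True], dtype="boolean")

result = concat_nullable_bool([bool1, bool2])
print(result.dtype)  # boolean
```

Because both inputs already share the boolean dtype when concat is called, no common-dtype resolution is needed, so the result does not fall back to object.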
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb9652
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
Version : Darwin Kernel Version 18.7.0: Tue Jan 12 22:04:47 PST 2021; root:xnu-4903.278.56~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.2.4
numpy : 1.19.5
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 40.8.0
Cython : None
pytest : 5.2.0
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.8.5
fastparquet : 0.5.0
gcsfs : None
matplotlib : 3.2.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 2.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.52.0