BUG: DataFrame Interchange Protocol errors on Boolean columns · Issue #55332 · pandas-dev/pandas

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pyarrow.interchange as pai
import pandas as pd

data = pd.DataFrame(
    {
        "b": pd.Series([True, False], dtype="boolean"),
    }
)
print(data.dtypes)
pai.from_dataframe(data)

Issue Description

When a pandas DataFrame is converted through the DataFrame Interchange Protocol, a column with the nullable "boolean" extension dtype raises an AttributeError:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [8], line 10
      4 data = pd.DataFrame(
      5     {
      6         "b": pd.Series([True, False], dtype="boolean"),
      7     }
      8 )
      9 print(data.dtypes)
---> 10 pai.from_dataframe(data)

File ~/miniconda3/envs/altair-dev/lib/python3.10/site-packages/pyarrow/interchange/from_dataframe.py:111, in from_dataframe(df, allow_copy)
    108 if not hasattr(df, "__dataframe__"):
    109     raise ValueError("`df` does not support __dataframe__")
--> 111 return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
    112                        allow_copy=allow_copy)

File ~/miniconda3/envs/altair-dev/lib/python3.10/site-packages/pyarrow/interchange/from_dataframe.py:134, in _from_dataframe(df, allow_copy)
    132 batches = []
    133 for chunk in df.get_chunks():
--> 134     batch = protocol_df_chunk_to_pyarrow(chunk, allow_copy)
    135     batches.append(batch)
    137 return pa.Table.from_batches(batches)

File ~/miniconda3/envs/altair-dev/lib/python3.10/site-packages/pyarrow/interchange/from_dataframe.py:168, in protocol_df_chunk_to_pyarrow(df, allow_copy)
    166     raise ValueError(f"Column {name} is not unique")
    167 col = df.get_column_by_name(name)
--> 168 dtype = col.dtype[0]
    169 if dtype in (
    170     DtypeKind.INT,
    171     DtypeKind.UINT,
   (...)
    174     DtypeKind.DATETIME,
    175 ):
    176     columns[name] = column_to_array(col, allow_copy)

File properties.pyx:36, in pandas._libs.properties.CachedProperty.__get__()

File ~/miniconda3/envs/altair-dev/lib/python3.10/site-packages/pandas/core/interchange/column.py:128, in PandasColumn.dtype(self)
    126     raise NotImplementedError("Non-string object dtypes are not supported yet")
    127 else:
--> 128     return self._dtype_from_pandasdtype(dtype)

File ~/miniconda3/envs/altair-dev/lib/python3.10/site-packages/pandas/core/interchange/column.py:147, in PandasColumn._dtype_from_pandasdtype(self, dtype)
    145     byteorder = dtype.base.byteorder  # type: ignore[union-attr]
    146 else:
--> 147     byteorder = dtype.byteorder
    149 return kind, dtype.itemsize * 8, dtype_to_arrow_c_fmt(dtype), byteorder

AttributeError: 'BooleanDtype' object has no attribute 'byteorder'
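
The last frame reads `dtype.byteorder` without checking that the attribute exists, which is where the `AttributeError` comes from. As a rough illustration only, the lookup could be guarded as sketched below. This is not the pandas source: `FakeNumpyDtype`, `FakeBooleanDtype`, and `byteorder_or_not_implemented` are hypothetical names, and the stand-in classes merely mimic a dtype with and without a `byteorder` attribute so the sketch runs without pandas installed.

```python
class FakeNumpyDtype:
    """Hypothetical stand-in for a NumPy dtype, which exposes byteorder."""

    byteorder = "="


class FakeBooleanDtype:
    """Hypothetical stand-in for an extension dtype that lacks byteorder."""

    name = "boolean"


def byteorder_or_not_implemented(dtype):
    """Return dtype.byteorder, or raise NotImplementedError if it is absent."""
    try:
        return dtype.byteorder
    except AttributeError as exc:
        # Turn the incidental AttributeError into an explicit
        # "unsupported dtype" signal that callers can catch.
        name = getattr(dtype, "name", type(dtype).__name__)
        raise NotImplementedError(
            f"dtype {name!r} is not supported by this sketch"
        ) from exc


print(byteorder_or_not_implemented(FakeNumpyDtype()))
try:
    byteorder_or_not_implemented(FakeBooleanDtype())
except NotImplementedError as err:
    print(f"NotImplementedError: {err}")
```

Chaining with `from exc` keeps the original `AttributeError` visible in the traceback, so the root cause is not hidden from whoever debugs it.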

Expected Behavior

I expect the example above to successfully convert the pandas DataFrame to an Arrow table using the DataFrame Interchange Protocol.

Originally reported in the Vega-Altair project as vega/altair#3205

If supporting boolean columns is not straightforward, I would ask that an explicit NotImplementedError be raised instead of an AttributeError. Vega-Altair treats NotImplementedError as a signal to fall back to an alternative implementation.
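
To show why the exception type matters to a consumer, here is a minimal sketch of that kind of fallback logic. It is not Vega-Altair's actual code: `convert_with_fallback` and the three stub converters are hypothetical, and the stubs only imitate a converter that raises the respective exception.

```python
def convert_with_fallback(df, primary, fallback):
    """Try primary(df); on NotImplementedError use fallback(df) instead."""
    try:
        return primary(df)
    except NotImplementedError:
        # Only the explicit "unsupported" signal triggers the fallback.
        return fallback(df)


def unsupported_stub(df):
    # Stand-in for an interchange-based converter that reports
    # an unsupported dtype explicitly.
    raise NotImplementedError("boolean columns are not supported")


def broken_stub(df):
    # Stand-in for the behavior reported in this issue.
    raise AttributeError("'BooleanDtype' object has no attribute 'byteorder'")


def fallback_stub(df):
    # Stand-in for an alternative conversion path.
    return {"converted_by": "fallback", "data": df}


result = convert_with_fallback([True, False], unsupported_stub, fallback_stub)
print(result["converted_by"])

try:
    convert_with_fallback([True, False], broken_stub, fallback_stub)
except AttributeError as err:
    # The AttributeError is not caught by the fallback logic and propagates.
    print(f"AttributeError: {err}")
```

With a consumer written this way, the `AttributeError` in the traceback above escapes to the user, while a `NotImplementedError` would be handled quietly.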

Installed Versions

INSTALLED VERSIONS

commit : e86ed37
python : 3.10.12.final.0
python-bits : 64
OS : Darwin
OS-release : 23.0.0
Version : Darwin Kernel Version 23.0.0: Fri Sep 15 14:41:43 PDT 2023; root:xnu-10002.1.13~1/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 2.1.1
numpy : 1.23.4
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2
Cython : None
pytest : 7.2.0
hypothesis : None
sphinx : 5.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader : None
bs4 : 4.11.1
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 13.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 2.0.19
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None