BUG: Can't interchange zero-sized dataframes with Arrow · Issue #53155 · pandas-dev/pandas (original) (raw)
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example & Issue Description
We can interchange our own empty dataframes just fine.
import pandas as pd from pandas.api.interchange import from_dataframe as pandas_from_dataframe orig_df = pd.DataFrame({}) pandas_from_dataframe(orig_df) Empty DataFrame Columns: [] Index: []
But fail when roundtripping with pyarrow
.
from pyarrow.interchange import from_dataframe as pyarrow_from_dataframe arrow_df = pyarrow_from_dataframe(orig_df) pandas_from_dataframe(arrow_df) ValueError: No objects to concatenate
Full traceback
../../../anaconda3/envs/df/lib/python3.9/site-packages/pandas/core/interchange/from_dataframe.py:54: in from_dataframe return _from_dataframe(df.dataframe(allow_copy=allow_copy)) ../../../anaconda3/envs/df/lib/python3.9/site-packages/pandas/core/interchange/from_dataframe.py:85: in _from_dataframe pandas_df = pd.concat(pandas_dfs, axis=0, ignore_index=True, copy=False) ../../../anaconda3/envs/df/lib/python3.9/site-packages/pandas/core/reshape/concat.py:377: in concat op = _Concatenator( ../../../anaconda3/envs/df/lib/python3.9/site-packages/pandas/core/reshape/concat.py:440: in init objs, keys = self._clean_keys_and_objs(objs, keys)
self = <pandas.core.reshape.concat._Concatenator object at 0x7ff532e247c0>, objs = [], keys = None
def _clean_keys_and_objs(
self,
objs: Iterable[Series | DataFrame] | Mapping[HashableT, Series | DataFrame],
keys,
) -> tuple[list[Series | DataFrame], Index | None]:
if isinstance(objs, abc.Mapping):
if keys is None:
keys = list(objs.keys())
objs = [objs[k] for k in keys]
else:
objs = list(objs)
if len(objs) == 0:
raise ValueError("No objects to concatenate")
E ValueError: No objects to concatenate
This also happens for zero-row dataframes
orig_df = pd.DataFrame({"a": []}) arrow_df = pyarrow_from_dataframe(orig_df) pandas_from_dataframe(arrow_df) ValueError: No objects to concatenate
See that a roundtrip for a non-empty df works fine.
orig_df = arrow_from_dataframe(orig_df) arrow_df = pyarrow_from_dataframe(orig_df) pandas_from_dataframe(arrow_df) a 0 0 1 1 2 2 3 3 4 4
We also get the same issue if we interchange a pyarrow.Table
.
import pyarrow as pa orig_df = pa.from_dict({}) pandas_from_dataframe(orig_df) ValueError: No objects to concatenate
Expected Behavior
orig_df = pd.DataFrame({}) arrow_df = pyarrow_from_dataframe(orig_df) pandas_from_dataframe(arrow_df) Empty DataFrame Columns: [] Index: []
Installed Versions
pd.show_versions()
/home/honno/.local/share/virtualenvs/pandas-empty-df-GBPvUBxi/lib/python3.8/site-packages/_distutils_hack/init.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
commit : 37ea63d540fd27274cad6585082c91b1283f963d python : 3.8.12.final.0 python-bits : 64 OS : Linux OS-release : 5.19.0-41-generic Version : #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 machine : x86_64 processor : x86_64 byteorder : little LC_ALL : None LANG : en_GB.UTF-8 LOCALE : en_GB.UTF-8
pandas : 2.0.1 numpy : 1.24.3 pytz : 2023.3 dateutil : 2.8.2 setuptools : 67.6.1 pip : 23.0.1 Cython : None pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : None html5lib : None pymysql : None psycopg2 : None jinja2 : None IPython : 8.12.2 pandas_datareader: None bs4 : None bottleneck : None brotli : None fastparquet : None fsspec : None gcsfs : None matplotlib : None numba : None numexpr : None odfpy : None openpyxl : None pandas_gbq : None pyarrow : 12.0.0 pyreadstat : None pyxlsb : None s3fs : None scipy : None snappy : None sqlalchemy : None tables : None tabulate : None xarray : None xlrd : None zstandard : None tzdata : 2023.3 qtpy : None pyqt5 : None
Also tried this with pandas nightly
$ pip install --pre --extra-index https://pypi.anaconda.org/scipy-wheels-nightly/simple pandas --ignore-installed --no-deps
pyarrow is from their nightly builds
$ pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ --prefer-binary --pre pyarrow