ENH: Support pa.json_ in arrow extension type

Pandas version checks

Reproducible Example

$ import pandas as pd
$ import pyarrow as pa
$ pd.api.types.pandas_dtype(pd.ArrowDtype(pa.json_(pa.string()))).type


NotImplementedError                       Traceback (most recent call last)
Cell In[2], line 4
      1 import pandas as pd
      2 import pyarrow as pa
----> 4 pd.api.types.pandas_dtype(pd.ArrowDtype(pa.json_(pa.string()))).type

File ~/src/bigframes/venv/lib/python3.12/site-packages/pandas/core/dtypes/dtypes.py:2169, in ArrowDtype.type(self)
   2167 elif isinstance(pa_type, pa.ExtensionType):
   2168     return type(self)(pa_type.storage_type).type
-> 2169 raise NotImplementedError(pa_type)

NotImplementedError: extension<arrow.json>

Issue Description

Apache Arrow v19.0 introduced the pa.json_ extension type (doc). Currently, pandas.ArrowDtype.type does not handle this type and raises NotImplementedError, as the traceback above shows.

ArrowDtype.type underpins several pandas dtype APIs, including pd.api.types.pandas_dtype() and pd.api.types.is_timedelta64_dtype(). When given pd.ArrowDtype(pa.json_(pa.string())), these APIs produce unexpected results, such as the NotImplementedError in the example above.

Instead of raising, ArrowDtype.type should fall back to the underlying storage type of the Arrow JSON type.

Expected Behavior

pa.json_ is a standard Arrow extension type. ArrowDtype.type should accurately return its storage type, mirroring the behavior of other Arrow extension types.
Specifically, pd.ArrowDtype(pa.json_(pa.string())).type should reflect the storage type, which is pa.string() as shown below.

Code showing the Arrow storage type for pa.json_:

$ import pyarrow as pa
$ pa.json_(pa.string()).storage_type
DataType(string)
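
To illustrate the requested fallback, here is a minimal sketch of unwrapping an extension type to its storage type before any dtype lookup. It is not the pandas implementation. FakeStringType and FakeJsonType are hypothetical stand-ins for pa.string() and pa.json_(...), used so the snippet runs without pyarrow, and unwrap_storage_type is a hypothetical helper name. The sketch assumes that only extension types expose a storage_type attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeStringType:
    """Hypothetical stand-in for a plain Arrow type such as pa.string()."""

    name: str = "string"


@dataclass(frozen=True)
class FakeJsonType:
    """Hypothetical stand-in for pa.json_(...): an extension type wrapping a storage type."""

    storage_type: object
    extension_name: str = "arrow.json"


def unwrap_storage_type(pa_type):
    """Follow .storage_type until a type without that attribute is reached.

    Assumption: only extension types carry a storage_type attribute,
    so the loop stops at the first plain (non-extension) type.
    """
    while hasattr(pa_type, "storage_type"):
        pa_type = pa_type.storage_type
    return pa_type


json_type = FakeJsonType(storage_type=FakeStringType())
resolved = unwrap_storage_type(json_type)
print(resolved.name)  # string
```

Because the helper relies only on the storage_type attribute, it could in principle be applied to a real pa.json_(pa.string()) value before constructing pd.ArrowDtype, as a stopgap until pandas handles the type itself. That usage has not been verified here.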

Installed Versions

INSTALLED VERSIONS

commit : 0691c5c
python : 3.12.1
python-bits : 64
OS : Linux
OS-release : 6.10.11-1rodete2-amd64
Version : #1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1rodete2 (2024-10-16)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.3
numpy : 2.2.2
pytz : 2025.1
dateutil : 2.9.0.post0
pip : 23.2.1
Cython : None
sphinx : None
IPython : 8.32.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.2.0
html5lib : None
hypothesis : None
gcsfs : 2025.2.0
jinja2 : None
lxml.etree : None
matplotlib : 3.10.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : 0.27.0
psycopg2 : None
pymysql : None
pyarrow : 19.0.0
pyreadstat : None
pytest : 8.3.4
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.15.1
sqlalchemy : 2.0.38
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.1
qtpy : None
pyqt5 : None