ENH: Improve error message when specifying dtype="float32[pyarrow]" while PyArrow is not installed · Issue #57928 · pandas-dev/pandas

Feature Type

Problem Description

If PyArrow is not installed properly, running the following code snippet from the User Guide:
ser = pd.Series([-1.5, 0.2, None], dtype="float32[pyarrow]")
will result in:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../python3.12/site-packages/pandas/core/series.py", line 493, in __init__
    dtype = self._validate_dtype(dtype)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/pandas/core/generic.py", line 515, in _validate_dtype
    dtype = pandas_dtype(dtype)
            ^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/pandas/core/dtypes/common.py", line 1624, in pandas_dtype
    result = registry.find(dtype)
             ^^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/pandas/core/dtypes/base.py", line 576, in find
    return dtype_type.construct_from_string(dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/pandas/core/dtypes/dtypes.py", line 2251, in construct_from_string
    pa_dtype = pa.type_for_alias(base_type)
               ^^
NameError: name 'pa' is not defined

which is not very informative.

Feature Description

We can improve the error message by telling the user that something is wrong with the PyArrow installation. This matters most when users believe they have installed PyArrow but have actually installed it in the wrong location or installed an outdated version.
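To tell the two failure modes above apart (wrong location versus outdated version), a user could run a quick check like the one below. This is only a sketch using the standard library; `describe_install` is a hypothetical helper name, not part of pandas or PyArrow.

```python
import importlib.metadata
import importlib.util
import sys


def describe_install(package: str) -> dict:
    """Report whether `package` is visible to the running interpreter.

    Hypothetical diagnostic helper; not part of pandas or PyArrow.
    """
    # find_spec returns None when the top-level package cannot be located.
    spec = importlib.util.find_spec(package)
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {
        "interpreter": sys.executable,  # which Python is actually running
        "importable": spec is not None,
        "location": spec.origin if spec is not None else None,
        "version": version,
    }


print(describe_install("pyarrow"))
```

If `importable` is false, the package is most likely installed for a different interpreter or environment than the one shown under `interpreter`. If it is importable, the reported `version` can be compared against the minimum listed in the pandas Installation Guide.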

Alternative Solutions

Catch the NameError and raise an ImportError from it that describes what happened.
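A minimal sketch of this idea follows. It is not the actual pandas code: `parse_arrow_dtype` is a hypothetical stand-in for the `construct_from_string` frame in the traceback, the error wording is only a suggestion, and the name `pa` is deliberately left unbound to mimic the state in which the PyArrow import did not happen.

```python
# NOTE: `pa` is intentionally never imported here, to reproduce the failing
# state from the traceback (the module-level PyArrow import was skipped).


def parse_arrow_dtype(string: str):
    """Hypothetical stand-in for the dtype-string parsing step in pandas."""
    base_type = string.removesuffix("[pyarrow]")
    try:
        # Raises NameError because `pa` was never bound in this module.
        return pa.type_for_alias(base_type)
    except NameError as err:
        # Chain the original error so the full cause stays visible.
        raise ImportError(
            f"PyArrow is required for dtype {string!r} but could not be "
            "imported. Check that it is installed in the active environment "
            "and that its version is supported by pandas."
        ) from err


try:
    parse_arrow_dtype("float32[pyarrow]")
except ImportError as exc:
    print(f"ImportError: {exc}")
```

Using `raise ... from err` keeps the NameError attached as `__cause__`, so the original traceback is still available for debugging while the user sees an actionable message first.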

Additional Context

This would match the behavior described in the Installation Guide: "If the optional dependency is not installed, pandas will raise an ImportError when the method requiring that dependency is called."