PERF: optimize conversion from boolean Arrow array to masked BooleanArray by jorisvandenbossche · Pull Request #41051 · pandas-dev/pandas

> if we do pass an incompatible type we now get a less helpful message

Generally we should never do that ourselves, though.
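But it is possible to hit it in an Arrow -> pandas conversion when the table carries outdated pandas metadata, for example after casting a boolean column to another type while keeping the original schema metadata: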

```python
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'a': pd.array([True, False])})
table = pa.table(df)
new_table = table.cast(pa.schema([('a', pa.int8())], metadata=table.schema.metadata))
new_table.to_pandas()
```

```
...
~/scipy/repos/pandas-build-arrow/pandas/core/arrays/boolean.py in __from_arrow__(self, array)
    135                 mask = np.zeros(len(arr), dtype=bool)
    136 
--> 137             bool_arr = BooleanArray(data, mask)
    138             results.append(bool_arr)
    139 

~/scipy/repos/pandas-build-arrow/pandas/core/arrays/boolean.py in __init__(self, values, mask, copy)
    289     def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False):
    290         if not (isinstance(values, np.ndarray) and values.dtype == np.bool_):
--> 291             raise TypeError(
    292                 "values should be boolean numpy array. Use "
    293                 "the 'pd.array' function instead"

TypeError: values should be boolean numpy array. Use the 'pd.array' function instead
```

Casting to a string type instead gives a different error, raised while the buffers are reinterpreted:

```python
new_table = table.cast(pa.schema([('a', pa.string())], metadata=table.schema.metadata))
new_table.to_pandas()
```

```
...
~/scipy/repos/pandas-build-arrow/pandas/core/arrays/boolean.py in __from_arrow__(self, array)
    124         for arr in chunks:
    125             buflist = arr.buffers()
--> 126             data = pyarrow.BooleanArray.from_buffers(
    127                 arr.type, len(arr), [None, buflist[1]], offset=arr.offset
    128             ).to_numpy(zero_copy_only=False)

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.Array.from_buffers()

ValueError: Type's expected number of buffers (3) did not match the passed number (2).
```
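For context on both errors: an Arrow boolean array stores two bitmaps (validity and data) in its buffers, and the conversion reads the data bitmap (`buflist[1]`) through `pyarrow.BooleanArray.from_buffers`. An `int8` array also has two buffers, so that step succeeds but yields integer values that `BooleanArray` then rejects; a string array has three buffers (validity, offsets, data), hence the buffer-count `ValueError`. The numpy-only sketch below approximates the bit-level step for a genuine boolean array, assuming Arrow's LSB-first bit numbering; `unpack_bitmap` and `bitmaps_to_values_and_mask` are hypothetical helpers, not pandas or pyarrow API.

```python
import numpy as np


def unpack_bitmap(buf: bytes, length: int, offset: int = 0) -> np.ndarray:
    """Unpack an LSB-first, Arrow-style bitmap into a numpy bool array.

    Hypothetical helper: element i lives in byte (offset + i) // 8,
    at bit position (offset + i) % 8.
    """
    bits = np.unpackbits(np.frombuffer(buf, dtype=np.uint8), bitorder="little")
    return bits[offset : offset + length].astype(bool)


def bitmaps_to_values_and_mask(data_buf, validity_buf, length, offset=0):
    """Return (values, mask) in the layout a masked BooleanArray uses.

    In Arrow a set validity bit means "valid", while in pandas' masked
    arrays a True mask entry means "missing", so the validity bits are
    inverted. A missing validity buffer (None) means there are no nulls.
    """
    values = unpack_bitmap(data_buf, length, offset)
    if validity_buf is None:
        mask = np.zeros(length, dtype=bool)
    else:
        mask = ~unpack_bitmap(validity_buf, length, offset)
    return values, mask


# [True, False, True, <null>]: data bits 0b0101, validity bits 0b0111
values, mask = bitmaps_to_values_and_mask(bytes([0b00000101]), bytes([0b00000111]), 4)
```

Here `values` is `[True, False, True, False]` and `mask` is `[False, False, False, True]`; the data bit under a null slot is not meaningful in Arrow, which is why the mask has to travel alongside the values.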

I think this might also be something to fix on the pyarrow side (it should ignore the metadata on error).

That said, it's also easy to add an `arr.type` check to ensure the array has the Arrow boolean type, so I will do that.