ENH: Identify numpy-like zero-dimensional arrays as non-iterable · Issue #35131 · pandas-dev/pandas (original) (raw)
Note: I wasn't sure whether to label this as an enhancement or bug. I've put enhancement as I don't think the original implementation was actually incorrect, rather just limited in scope. I'm happy to change to bug if that's more appropriate.
Is your feature request related to a problem?
I'm one of the developers of Pint-Pandas, which provides an extension type for storing numeric data with units (and, once mature, may solve #10349). The problem we're having is that we can't pass all the extension type tests. In particular, we fail many tests due to Panda's current implementation of is_list_like
.
As @hgrecco shows here, pandas determines whether something is_list_like
here (code copied below).
cdef inline bint c_is_list_like(object obj, bint allow_sets) except -1: return ( isinstance(obj, abc.Iterable) # we do not count strings/unicode/bytes as list-like and not isinstance(obj, (str, bytes)) # exclude zero-dimensional numpy arrays, effectively scalars and not (util.is_array(obj) and obj.ndim == 0) # exclude sets if allow_sets is False and not (allow_sets is False and isinstance(obj, abc.Set)) )
At the moment, all Pint Quantity
instances have an __iter__
method. It is designed to raise a TypeError
if the quantity isn't actually iterable (code here).
The problem is that the __iter__
method is defined, regardless of whether the Quantity
instance is actually iterable or not. This is a problem because it means that, in all cases, isinstance(quantity_instance, abc.Iterable)
returns True
(even if quantity_instance
isn't actually iterable and would return a TypeError
as soon as one tried to iterate with it).
A numpy array suffers from the same issue which is where the not (util.is_array(obj) and obj.ndim == 0)
in Panda's is_list_like
is important, because it excludes scalars. The problem is that util.is_array(obj)
returns False
if obj
is anything other than np.ndarray
. As a result, we can't just add an ndim
attribute to Quantity
because the obj.ndim == 0
check will never be evaluated anyway.
Describe the solution you'd like
Update is_list_like
so that it correctly identifies implementations of scalars in numpy-like arrays (I think these are called 'duck numpy arrays'?). I've put a draft in #35127 and would be grateful for feedback on the proposed solution there. If there's a better solution, I'm also happy to implement that instead.
API breaking implications
is_list_like
would now return False
for scalars of numpy-like array implementations. If we used the implementation in #35127 then any object that has an ndim
attribute and has obj.ndim == 0
would no longer be identified as list-like.
Describe alternatives you've considered
We've discussed how we could get the behaviour we want by changing Pint, rather than changing Pandas, but it seems to be impossible if we want to maintain the numpy-like implementation without separate scalar and array classes (see discussion here).
Additional context
import numpy as np import pandas as pd import pint from collections import abc
numpy_scalar = np.array(2) numpy_scalar array(2)
numpy 0d-array (i.e. scalar) quantities appear to be iterable even though they're not
isinstance(numpy_scalar, abc.Iterable) True numpy_scalar[0] ... IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
however, numpy 0d-array quantities are correctly handled by Pandas
pd.core.dtypes.inference.is_list_like(numpy_scalar) False
compare this with Pint's numpy-like implementation
ureg = pint.UnitRegistry() quantity_scalar = 3 * ureg.meter quantity_scalar <Quantity(3, 'meter')>
similar to a numpy array, scalar quantities appear to be iterable even though they're not
isinstance(quantity_scalar, abc.Iterable) True quantity_scalar[0]
... TypeError: Neither Quantity object nor its magnitude (3)supports indexing
however, Pandas identifies Pint's implementation as being iterable which is what is causing
our problem
pd.core.dtypes.inference.is_list_like(quantity_scalar) True # we would like this to be False