BUG/COMPAT: assert_* functions failing with nested arrays and latest numpy · Issue #50360 · pandas-dev/pandas

With the latest nightly numpy (1.25.0.dev0+..), the pyarrow test suite hits some errors that originate in pandas functions (assert_series_equal -> array_equivalent). These errors do not occur with numpy 1.24.

The failures we encountered in the pyarrow tests all seem to boil down to the same situation: there are multiple levels of nesting (an array of arrays of arrays), where one side is a numpy array of nested numpy arrays and the other is a numpy array of nested lists.

    import numpy as np
    import pandas as pd
    import pandas._testing as tm

    # one level of nesting -> ok
    s1 = pd.Series([np.array([1, 2, 3]), np.array([4, 5])], dtype=object)
    s2 = pd.Series([[1, 2, 3], [4, 5]], dtype=object)

    tm.assert_series_equal(s1, s2)

    # >1 level of nesting -> fails
    s1 = pd.Series(
        [np.array([[1, 2, 3], [4, 5]], dtype=object), np.array([[6], [7, 8]], dtype=object)],
        dtype=object,
    )
    s2 = pd.Series([[[1, 2, 3], [4, 5]], [[6], [7, 8]]], dtype=object)

    tm.assert_series_equal(s1, s2)
    ...
    File ~/scipy/repos/pandas-build-arrow/pandas/core/dtypes/missing.py:581, in _array_equivalent_object(left, right, strict_nan)
        579 else:
        580     try:
    --> 581         if np.any(np.asarray(left_value != right_value)):
        582             return False
        583     except TypeError as err:

    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
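The traceback points at the expression `np.any(np.asarray(left_value != right_value))`. As a rough sketch, that expression can be tried on its own, without pandas, to see how a given numpy build reacts. The helper name `describe_comparison` is hypothetical (it is not part of pandas or numpy), and what it reports for the nested case depends on the installed numpy version.

```python
import numpy as np


def describe_comparison(left_value, right_value):
    """Hypothetical diagnostic helper, not part of pandas or numpy.

    Evaluates the expression shown in the traceback and reports what happened
    instead of letting an exception propagate.
    """
    try:
        differs = np.any(np.asarray(left_value != right_value))
    except Exception as exc:  # broad on purpose: this only reports the outcome
        return f"raised {type(exc).__name__}: {exc}"
    return "difference found" if differs else "no difference found"


# One level of nesting: a flat array compared against a flat list.
print(describe_comparison(np.array([1, 2, 3]), [1, 2, 3]))

# More than one level: an object array of ragged lists against a ragged list.
nested_array = np.array([[1, 2, 3], [4, 5]], dtype=object)
nested_list = [[1, 2, 3], [4, 5]]
print(describe_comparison(nested_array, nested_list))
```

Running this under numpy 1.24 and under the nightly build should show whether the `!=` comparison itself is where the behaviour changed.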

Another example of essentially the same situation, but one that gives a different error message (from numpy):

    # construct inner array with explicitly setting up the shape (otherwise numpy will infer it as 2D)
    arr = np.empty(2, dtype=object)
    arr[:] = [np.array([None, 'b'], dtype=object), np.array(['c', 'd'], dtype=object)]
    s1 = pd.Series(np.array([arr, None], dtype=object))
    s2 = pd.Series(np.array([list([[None, 'b'], ['c', 'd']]), None], dtype=object))

    tm.assert_series_equal(s1, s2)
    ...
    File ~/scipy/repos/pandas-build-arrow/pandas/core/dtypes/missing.py:581, in _array_equivalent_object(left, right, strict_nan)
        579 else:
        580     try:
    --> 581         if np.any(np.asarray(left_value != right_value)):
        582             return False
        583     except TypeError as err:

    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Another case with nesting of dictionaries (with the same error traceback):

    s1 = pd.Series([{'f1': 1, 'f2': np.array(['a', 'b'], dtype=object)}], dtype=object)
    s2 = pd.Series([{'f1': 1, 'f2': ['a', 'b']}], dtype=object)

    tm.assert_series_equal(s1, s2)
    ...
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
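All three cases compare a numpy array against an equivalent list somewhere below the top level. As an illustration of what a depth-independent comparison could look like, here is a minimal sketch of a recursive helper. The name `nested_equal` is hypothetical, and this is not how pandas implements `array_equivalent`. It treats arrays, lists and tuples as interchangeable, and it does not treat NaN as equal to NaN.

```python
import numpy as np


def nested_equal(left, right):
    """Hypothetical helper: recursively compare nested arrays, lists and dicts."""
    # Normalise numpy arrays to Python lists. Inner arrays stored in an object
    # array stay arrays here and are converted on the recursive call.
    if isinstance(left, np.ndarray):
        left = left.tolist()
    if isinstance(right, np.ndarray):
        right = right.tolist()

    # Dictionaries: same keys, and equal values for each key.
    if isinstance(left, dict) or isinstance(right, dict):
        if not (isinstance(left, dict) and isinstance(right, dict)):
            return False
        if left.keys() != right.keys():
            return False
        return all(nested_equal(left[key], right[key]) for key in left)

    # Sequences: same length, and equal elements position by position.
    sequences = (list, tuple)
    if isinstance(left, sequences) or isinstance(right, sequences):
        if not (isinstance(left, sequences) and isinstance(right, sequences)):
            return False
        if len(left) != len(right):
            return False
        return all(nested_equal(a, b) for a, b in zip(left, right))

    # Leaves: None only equals None; everything else uses ==.
    if left is None or right is None:
        return left is None and right is None
    return bool(left == right)


# The nested shapes from the examples, without the Series wrapper.
print(nested_equal(np.array([[1, 2, 3], [4, 5]], dtype=object), [[1, 2, 3], [4, 5]]))
print(nested_equal({'f1': 1, 'f2': np.array(['a', 'b'], dtype=object)},
                   {'f1': 1, 'f2': ['a', 'b']}))
```

Such a helper could be applied to two Series element by element, for example with `all(nested_equal(a, b) for a, b in zip(s1, s2))`, as a stopgap in a test suite. Whether pandas itself should recurse this way is a separate design question.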