BUG: Missing keys using aggregation dictionary that are unsortable raise TypeError instead of SpecificationError · Issue #39025 · pandas-dev/pandas



Code Sample, a copy-pastable example

>>> import numpy as np
>>> import pandas as pd

>>> pd.__version__
'1.3.0.dev0+358.g68db2d26dd'

>>> df = pd.DataFrame(
...     np.random.randn(1000, 3),
...     index=pd.date_range("1/1/2012", freq="S", periods=1000),
...     columns=[1, "foo", None],
... )
>>> r = df.resample("3T")
>>> r.agg({1: "mean", "foo": "sum"})
                            1        foo
2012-01-01 00:00:00 -0.070340   9.142790
2012-01-01 00:03:00 -0.048538  31.041777
2012-01-01 00:06:00 -0.058046  -0.394660
2012-01-01 00:09:00  0.014268  -8.947485
2012-01-01 00:12:00  0.040080  -6.869409
2012-01-01 00:15:00  0.020587   0.225159

>>> r.agg({2: "mean", "bar": "sum"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\resample.py", line 298, in aggregate
    result, how = aggregate(self, func, *args, **kwargs)
  File "C:\Users\simon\pandas\pandas\core\aggregation.py", line 566, in aggregate
    return agg_dict_like(obj, arg, _axis), True
  File "C:\Users\simon\pandas\pandas\core\aggregation.py", line 741, in agg_dict_like
    cols = sorted(set(keys) - set(selected_obj.columns.intersection(keys)))
TypeError: '<' not supported between instances of 'str' and 'int'

>>> df = pd.DataFrame(
...     np.random.randn(1000, 3),
...     index=pd.date_range("1/1/2012", freq="S", periods=1000),
...     columns=["A", "B", "C"],
... )

>>> r = df.resample("3T")
>>> r.agg({"r1": "mean", "r2": "sum"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\resample.py", line 298, in aggregate
    result, how = aggregate(self, func, *args, **kwargs)
  File "C:\Users\simon\pandas\pandas\core\aggregation.py", line 566, in aggregate
    return agg_dict_like(obj, arg, _axis), True
  File "C:\Users\simon\pandas\pandas\core\aggregation.py", line 742, in agg_dict_like
    raise SpecificationError(f"Column(s) {cols} do not exist")
pandas.core.base.SpecificationError: Column(s) ['r1', 'r2'] do not exist

The code sample is based on test_agg_consistency in pandas/tests/resample/test_resample_api.py.

Problem description

mypy 0.790 flags the same sorted call on line 741:

pandas/core/aggregation.py:741: error: Value of type variable "_LT" of "sorted" cannot be "Optional[Hashable]" [type-var]
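The failure can be reproduced without pandas. The sketch below uses two plain sets as stand-ins for the requested keys and the column labels (the variable names are illustrative, not taken from pandas) and shows that sorted raises as soon as the missing keys mix int and str:

```python
# Stand-ins for the aggregation keys and the frame's column labels.
requested_keys = {2, "bar"}
existing_columns = {1, "foo", None}

# Both requested keys are absent, so the difference holds an int and a str.
missing = requested_keys - existing_columns

try:
    sorted(missing)  # int and str cannot be ordered against each other
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Which operand is named first in the message depends on set iteration order, so only the TypeError itself should be relied on.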

Expected Output

pandas.core.base.SpecificationError: Column(s) [2, 'bar'] do not exist
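One possible way to get this message, sketched under the assumption that the missing keys do not need to be sorted at all: collect them in the order they were requested. The helper missing_keys below is hypothetical and is not the pandas implementation or a proposed patch.

```python
from typing import Hashable, Iterable, List


def missing_keys(keys: Iterable[Hashable], columns: Iterable[Hashable]) -> List[Hashable]:
    """Return the requested keys that are not column labels, in request order."""
    present = set(columns)
    seen = set()
    missing = []
    for key in keys:
        # Skip keys that exist as columns and keys already reported.
        if key not in present and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


# Iterating a dict yields its keys in insertion order.
cols = missing_keys({2: "mean", "bar": "sum"}, [1, "foo", None])
message = f"Column(s) {cols} do not exist"
print(message)
```

Because no comparison between keys is needed, this works for any hashable labels, including None. The trade-off is that the reported order follows the aggregation dictionary rather than a sorted order.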

Output of pd.show_versions()

[paste the output of pd.show_versions() here leaving a blank line after the details tag]