ENH: add Series & DataFrame .agg/.aggregate by jreback · Pull Request #14668 · pandas-dev/pandas
- to provide convenient function application that mimics the groupby(..).agg/.aggregate interface
- .apply is now a synonym for .agg/.aggregate, and will accept dict/list-likes for aggregations
- Automatic handling of both reductive and transformation functions, e.g. Series.agg(['min', 'sqrt'])
- interpretation of string names on series (with a fallback to numpy), e.g. sqrt, log
custom .describe: I included these issues because it is now quite easy to do a custom .describe.
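As a rough illustration of the custom .describe idea, here is a minimal sketch that builds a summary table from a hand-picked list of reductions. The frame and the chosen statistics are only assumptions for the example, not something taken from the linked issues.

```python
import pandas as pd

# Small example frame (same shape as the DataFrame example in this PR).
df = pd.DataFrame({"A": range(5), "B": 5})

# Pick only the statistics we care about, in the order we want them;
# each string is resolved to the reduction of the same name.
summary = df.agg(["count", "mean", "min", "max"])
print(summary)
```

The result has one row per requested statistic and one column per original column, which is the same layout .describe produces.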
closes #14483
closes #7014
TODO:
- docs (new section in basics, with linking to groupby, rolling, resample aggregates sections)
- tests for
Series.agg({'foo' : ['min', 'max']})
- update doc-strings
- example from: sparse resampling not working with dictionary of columns? #15386
Series:
In [2]: s = Series(range(6))
In [3]: s
Out[3]:
0 0
1 1
2 2
3 3
4 4
5 5
dtype: int64
In [4]: s.agg(['min', 'max'])
Out[4]:
min 0
max 5
dtype: int64
In [5]: s.agg(['sqrt', 'min'])
ValueError: cannot combine transform and aggregation operations
In [6]: s.agg({'foo' : 'min'})
Out[6]:
foo 0
dtype: int64
In [7]: s.agg({'foo' : ['min','max']})
Out[7]:
     foo
min    0
max    5
In [8]: s.agg({'foo' : ['min','max'], 'bar' : ['sum', 'mean']})
Out[8]:
      foo   bar
max   5.0   NaN
mean  NaN   2.5
min   0.0   NaN
sum   NaN  15.0
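Since mixing a transform with a reduction raises the ValueError shown in In [5], one way to get both results is to make two separate calls. This is only a sketch of that workaround, using np.sqrt as the transforming function; it is not part of the PR itself.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(6))

# Reductions collapse the Series; the result is indexed by function name.
reduced = s.agg(["min", "max"])

# A transform keeps the original index and length.
transformed = s.transform(np.sqrt)

print(reduced)
print(transformed)
```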
DataFrame
In [8]: df = pd.DataFrame({'A': range(5), 'B': 5})
In [9]: df
Out[9]:
A B
0 0 5
1 1 5
2 2 5
3 3 5
4 4 5
In [10]: df.agg(['min', 'max'])
Out[10]:
A B
min 0 5
max 4 5
In [11]: df.agg({'A': ['min', 'max'], 'B': ['sum', 'max']})
Out[11]:
       A     B
max  4.0   5.0
min  0.0   NaN
sum  NaN  25.0
# df.agg is equivalent to a transform
In [15]: df.transform([np.sqrt, np.abs, lambda x: x**2])
Out[15]:
          A                             B
       sqrt  absolute  <lambda>      sqrt  absolute  <lambda>
0  0.000000         0         0  2.236068         5        25
1  1.000000         1         1  2.236068         5        25
2  1.414214         2         4  2.236068         5        25
3  1.732051         3         9  2.236068         5        25
4  2.000000         4        16  2.236068         5        25
Not sure what I should do in cases like this. We could skip the non-numeric column, or maybe raise a better message.
In [16]: df = pd.DataFrame({'A': range(5), 'B': 5, 'C':'foo'})
In [17]: df
Out[17]:
A B C
0 0 5 foo
1 1 5 foo
2 2 5 foo
3 3 5 foo
4 4 5 foo
In [18]: df.transform(['log', 'abs'])
AttributeError: 'str' object has no attribute 'log'
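Until that is decided, a caller can sidestep the error by dropping the non-numeric columns first. A minimal sketch of the "skip" option done by hand, assuming select_dtypes is an acceptable way to pick the numeric columns (np.sqrt and np.abs are used instead of log to avoid log(0)):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(5), "B": 5, "C": "foo"})

# Keep only numeric columns so the string column 'C' is never
# handed to a numeric function.
numeric = df.select_dtypes(include="number")

# Apply each function to each remaining column; the result has
# two-level columns of (original column, function name).
result = numeric.transform([np.sqrt, np.abs])
print(result)
```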