pandas.DataFrame.pivot_table — pandas 0.24.0rc1 documentation (original) (raw)

Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", ... "bar", "bar", "bar", "bar"], ... "B": ["one", "one", "one", "two", "two", ... "one", "one", "two", "two"], ... "C": ["small", "large", "large", "small", ... "small", "large", "small", "small", ... "large"], ... "D": [1, 2, 2, 3, 3, 4, 5, 6, 7], ... "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]}) df A B C D E 0 foo one small 1 2 1 foo one large 2 4 2 foo one large 2 5 3 foo two small 3 5 4 foo two small 3 6 5 bar one large 4 6 6 bar one small 5 8 7 bar two small 6 9 8 bar two large 7 9

This first example aggregates values by taking the sum.

table = pivot_table(df, values='D', index=['A', 'B'], ... columns=['C'], aggfunc=np.sum) table C large small A B bar one 4 5 two 7 6 foo one 4 1 two NaN 6

We can also fill missing values using the fill_value parameter.

table = pivot_table(df, values='D', index=['A', 'B'], ... columns=['C'], aggfunc=np.sum, fill_value=0) table C large small A B bar one 4 5 two 7 6 foo one 4 1 two 0 6

The next example aggregates by taking the mean across multiple columns.

table = pivot_table(df, values=['D', 'E'], index=['A', 'C'], ... aggfunc={'D': np.mean, ... 'E': np.mean}) table D E mean mean A C bar large 5.500000 7.500000 small 5.500000 8.500000 foo large 2.000000 4.500000 small 2.333333 4.333333

table = pivot_table(df, values=['D', 'E'], index=['A', 'C'], ... aggfunc={'D': np.mean, ... 'E': [min, max, np.mean]}) table D E mean max mean min A C bar large 5.500000 9 7.500000 6 small 5.500000 9 8.500000 8 foo large 2.000000 5 4.500000 4 small 2.333333 6 4.333333 2