Python Pandas pivot() – Be on the Right Side of Change (original) (raw)

Syntax

pandas.pivot(data, index=None, columns=None, values=None)

Return Value: The return value for the pivot() function is a reshaped DataFrame organized by index/column values.


Background

Direct quote from the Pandas Documentation website:

This function does not support data aggregation. If there are multiple values, they will result in a Multi-Index in the columns.

This article delves into each parameter for this function separately.


Preparation

Before any data manipulation can occur, one (1) new library will require installation.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.


$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

staff = {'FName': ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'], 'EID': [100, 101, 102, 103, 104], 'Job': ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'], 'Age': [19, 23, 21, 27, 36], 'Salary': [87343, 94123, 96543, 65232, 102375]}

💡 Note: The data structure used in this article is a Dictionary of Lists made up of Keys (column names) and associated Values (list).


The “data” Parameter

The pivot() function data parameter is a DataFrame. This parameter can be one of the following data types or another data type that converts to a DataFrame:

If the DataFrame is empty, the following output will display:

df = pd.DataFrame() print(df)

Output

Empty DataFrame Columns: [] Index: []

If the DataFrame contains staff, the output will be similar to the table below.

df = pd.DataFrame(staff) print(df)

💡 Note: Formatting will vary depending on the IDE used to run the code.

Output

| | FName | EID | Job | Age | Salary | | | ------- | ----- | --- | -------------- | ------ | ------ | | 0 | Clare | 100 | Designer I | 19 | 87343 | | 1 | Micah | 101 | Data Scientist | 23 | 94123 | | 2 | Ben | 102 | Developer | 21 | 96543 | | 3 | Mac | 103 | Designer II | 27 | 65232 | | 4 | Emma | 104 | Manager | 36 | 102375 |


The “index” Parameter

The pivot() method’s index parameter is not required. If used, these column(s) create a new index. If empty, the default index numbering system will display. The parameter can be one of the following data types:

💡 Note: All data assigned to the index parameter must contain unique values. If not, a ValueError occurs.

In this example, the index parameter is the EID and Job columns. The column parameter is required. If empty, the entire DataFrame will display. If missing, a TypeError will occur.

df_index = df.pivot(index=['EID', 'Job'], columns=[]) print(df_index)

Output

For this code example, the index parameter contains a list of column names. They display on the left in the same order as entered in the list: not the original data structure order.

staff = {'FName': ['Clare', 'Micah', 'Ben', 'Mac', 'Emma'], 'EID': [100, 101, 102, 103, 104], 'Job': ['Designer I', 'Data Scientist', 'Developer', 'Designer II', 'Manager'], 'Age': [19, 23, 21, 27, 36], 'Salary': [87343, 94123, 96543, 65232, 102375]}

The remaining columns with the relevant data will display (FName, Age, Salary) on the right.

EID Job FName Age Salary
100 Designer I Clare 19 87343
101 Data Scientist Micah 23 94123
102 Developer Ben 21 96543
103 Designer II Mac 27 65232
104 Manager Emma 36 102375

The “columns” Parameter

The Pandas pivot() method’s columns parameter is required. The parameter can be one of the following data types:

If empty (as shown above), the entire DataFrame displays. If missing, a TypeError occurs.

df_cols = df.pivot(index=['EID', 'Job', 'Salary'], columns=['FName']) print(df_cols)

Output

In this example, the index columns will display on the left side. The order is the same as that listed in the above parameter.

The list passed to the columns parameter (FNames) shows the text as a sub-heading. The remaining sub-headings are the values from that column.

The Age column is the only column omitted from both parameters. This column becomes the main table heading.

The Age data from the original data structure will display as floats.


The “values” Parameter

This function’s value parameter is not required. The parameter can be one of the following data types:

This parameter uses column(s) for populating DataFrame values. If this parameter is empty, all remaining columns will display.

In this example, the Salary data from the original data structure will display as floats.

df_values = df.pivot(index=['EID', 'Job', 'Age'], columns=['FName'], values=['Salary']) print(df_values)

The DataFrame