API/ENH: Add mutate like method to DataFrames · Issue #9229 · pandas-dev/pandas

In my notebook comparing dplyr and pandas, I gained a new level of appreciation for the ability to chain strings of operations together. In my own code, the biggest impediment to this is adding additional columns that are calculations on existing columns. For example:

R / dplyr

mutate(flights, gain = arr_delay - dep_delay, speed = distance / air_time * 60)

... calculation involving these

vs.

flights['gain'] = flights.arr_delay - flights.dep_delay
flights['speed'] = flights.distance / flights.air_time * 60

... calculation involving these later

just doesn't flow as nicely, especially if this mutate is in the middle of a chain.

I'd propose a new method (perhaps stealing mutate) that's similar to dplyr's.
The function signature could be kwarg only, where the keywords are the new column names. e.g.

flights.mutate(gain=flights.arr_delay - flights.dep_delay)

This would return a DataFrame with the new column gain in addition to the original columns.
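To make the proposed behavior concrete, here is a minimal sketch written as a free function rather than a DataFrame method. The name `mutate`, its signature, and the sample `flights` values are all assumptions for illustration; none of this is existing pandas API.

```python
import pandas as pd


def mutate(df: pd.DataFrame, **new_columns) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with extra columns.

    Each keyword is a new column name; each value is that column's data.
    """
    result = df.copy()  # leave the caller's frame untouched
    for name, values in new_columns.items():
        result[name] = values
    return result


# Small made-up frame using the column names from the example above.
flights = pd.DataFrame({
    "arr_delay": [11.0, 20.0, 33.0],
    "dep_delay": [2.0, 4.0, 2.0],
    "distance": [1400.0, 1416.0, 1089.0],
    "air_time": [227.0, 227.0, 160.0],
})

with_gain = mutate(
    flights,
    gain=flights.arr_delay - flights.dep_delay,
    speed=flights.distance / flights.air_time * 60,
)
```

Because the helper works on a copy, `flights` itself is unchanged, which is what lets the call sit in the middle of a chain.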

Worked out example

import pandas as pd
import seaborn as sns

iris = sns.load_dataset('iris')

(iris.query('sepal_length > 4.5')
     .mutate(ratio=iris.sepal_length / iris.sepal_width)  # new part
     .groupby(pd.cut(iris.ratio)).mean()
)
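For anyone who wants to try the idea today, below is a rough runnable approximation of that chain. Several things are assumptions on my part: `mutate` is a hypothetical stand-in function passed through `DataFrame.pipe`, the rows are made up rather than loaded from seaborn, and `bins=2` is an arbitrary choice because `pd.cut` needs a `bins` argument. The grouping step also reads `ratio` from the intermediate frame, since the original `iris` never gets that column.

```python
import pandas as pd


def mutate(df, **new_columns):
    # Hypothetical stand-in for the proposed method (not a pandas API).
    result = df.copy()
    for name, values in new_columns.items():
        result[name] = values
    return result


def mean_by_ratio_bin(df, bins=2):
    # Group on bins of the freshly added 'ratio' column of the frame
    # flowing through the chain, then average the numeric columns.
    numeric = ["sepal_length", "sepal_width", "ratio"]
    key = pd.cut(df["ratio"], bins=bins)
    return df.groupby(key, observed=True)[numeric].mean()


# Made-up rows shaped like the iris dataset.
iris = pd.DataFrame({
    "sepal_length": [4.4, 5.1, 6.3, 7.0, 5.8],
    "sepal_width": [2.9, 3.5, 3.3, 3.2, 2.7],
    "species": ["setosa", "setosa", "virginica", "versicolor", "virginica"],
})

summary = (
    iris.query("sepal_length > 4.5")
        .pipe(mutate, ratio=iris.sepal_length / iris.sepal_width)  # new part
        .pipe(mean_by_ratio_bin)
)
```

The `ratio` series is computed on the unfiltered `iris`, so the assignment relies on index alignment to drop the row that `query` removed.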

Thoughts?