API/ENH: Add mutate like method to DataFrames · Issue #9229 · pandas-dev/pandas
In my notebook comparing dplyr and pandas, I gained a new level of appreciation for the ability to chain strings of operations together. In my own code, the biggest impediment to this is adding additional columns that are calculations on existing columns. For example
R / dplyr
mutate(flights, gain = arr_delay - dep_delay, speed = distance / air_time * 60)
... calculation involving these
vs.
flights['gain'] = flights.arr_delay - flights.dep_delay
flights['speed'] = flights.distance / flights.air_time * 60
... calculation involving these later
just doesn't flow as nicely, especially if this mutate is in the middle of a chain.
I'd propose a new method (perhaps stealing the name mutate) that's similar to dplyr's.
The function signature could be keyword-only, where the keywords are the new column names and the values are the new columns, e.g.
flights.mutate(gain=flights.arr_delay - flights.dep_delay)
This would return a DataFrame with the new column gain in addition to the original columns.
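To make the proposed behaviour concrete, here is a minimal sketch written as a standalone function, since no DataFrame.mutate method exists. The function name, its copy-then-assign approach, and the sample flights values are all assumptions for illustration, not a proposed implementation.

```python
import pandas as pd


def mutate(df, **kwargs):
    """Hypothetical helper: return a copy of df with one new column per keyword."""
    out = df.copy()  # leave the caller's frame untouched
    for name, values in kwargs.items():
        out[name] = values
    return out


# Made-up sample data using the column names from the dplyr example.
flights = pd.DataFrame({
    "arr_delay": [11, 20, 33],
    "dep_delay": [2, 4, 2],
    "distance": [1400, 1416, 1089],
    "air_time": [227, 227, 160],
})

result = mutate(
    flights,
    gain=flights.arr_delay - flights.dep_delay,
    speed=flights.distance / flights.air_time * 60,
)
```

Because the function returns a new frame instead of modifying flights in place, the call can sit in the middle of a method chain.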
Worked out example
import pandas as pd
import seaborn as sns
iris = sns.load_dataset('iris')
(iris.query('sepal_length > 4.5')
     .mutate(ratio=iris.sepal_length / iris.sepal_width)  # new part
     .groupby(pd.cut(iris.ratio)).mean()
)
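For comparison, here is a runnable sketch of the same chain using methods that exist in pandas today. DataFrame.assign takes new columns as keyword arguments and accepts callables, so ratio can be computed from the filtered frame rather than from the original iris, which has no ratio column. The small frame below is a made-up stand-in for the seaborn dataset, and bins=3 is an arbitrary assumption, since pd.cut requires a bins argument.

```python
import pandas as pd

# Tiny stand-in for the iris dataset (values are made up for illustration).
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.4, 6.3, 5.8, 7.1, 4.9],
    "sepal_width": [3.5, 2.9, 3.3, 2.7, 3.0, 3.0],
})

result = (
    iris.query("sepal_length > 4.5")
    # The callable receives the intermediate (filtered) frame.
    .assign(ratio=lambda df: df.sepal_length / df.sepal_width)
    # pipe lets the groupby key refer to the new column on that same frame;
    # bins=3 (three equal-width bins) is an arbitrary choice.
    .pipe(lambda df: df.groupby(pd.cut(df.ratio, bins=3), observed=True).mean())
)
```

The callables are what allow later steps to see columns created earlier in the chain; passing iris.ratio directly would fail because the original frame is never modified.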
Thoughts?