Version 0.16.2 (June 12, 2015) - pandas 3.0.0rc0+33.g1fd184de2a documentation

This is a minor bug-fix release from 0.16.1. It includes a large number of bug fixes along with some new features (the pipe() method), enhancements, and performance improvements.

We recommend that all users upgrade to this version.

Highlights include:

What’s new in v0.16.2

New features

Pipe

We’ve introduced a new method DataFrame.pipe(). As suggested by the name, pipe should be used to pipe data through a chain of function calls. The goal is to avoid confusing nested function calls like

    # df is a DataFrame
    # f, g, and h are functions that take and return DataFrames
    f(g(h(df), arg1=1), arg2=2, arg3=3)  # noqa F821

The logic flows from inside out, and function names are separated from their keyword arguments. This can be rewritten as

    (
        df.pipe(h)  # noqa F821
        .pipe(g, arg1=1)  # noqa F821
        .pipe(f, arg2=2, arg3=3)  # noqa F821
    )

Now both the code and the logic flow from top to bottom. Keyword arguments are next to their functions. Overall the code is much more readable.
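As a concrete illustration of the two styles, here is a minimal sketch. The helper functions (drop_missing, add_ratio, keep_above) and the small DataFrame are hypothetical, made up for this example only; the only pandas calls used are DataFrame.pipe, dropna, and assign.

```python
import pandas as pd


def drop_missing(df):
    """Remove rows that contain any missing value."""
    return df.dropna()


def add_ratio(df, num, den):
    """Return a copy of df with a 'ratio' column equal to df[num] / df[den]."""
    return df.assign(ratio=df[num] / df[den])


def keep_above(df, column, threshold):
    """Keep only the rows where df[column] is strictly greater than threshold."""
    return df[df[column] > threshold]


# Hypothetical data: one row has a missing value in 'hits'.
raw = pd.DataFrame(
    {"hits": [10.0, 20.0, float("nan"), 40.0], "games": [5, 5, 5, 10]}
)

# Piped style: reads top to bottom, keywords sit next to their function.
piped = (
    raw.pipe(drop_missing)
    .pipe(add_ratio, num="hits", den="games")
    .pipe(keep_above, column="ratio", threshold=2.0)
)

# Equivalent nested style: reads inside out.
nested = keep_above(
    add_ratio(drop_missing(raw), num="hits", den="games"),
    column="ratio",
    threshold=2.0,
)
```

Both expressions produce the same DataFrame; only the reading order differs.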

In the example above, the functions f, g, and h each expected the DataFrame as the first positional argument. When the function you wish to apply takes its data anywhere other than the first argument, pass a tuple of (function, keyword) indicating where the DataFrame should flow. For example:

    In [1]: import statsmodels.formula.api as sm

    In [2]: bb = pd.read_csv("data/baseball.csv", index_col="id")

    # sm.ols takes (formula, data)
    In [3]: (
       ...:     bb.query("h > 0")
       ...:     .assign(ln_h=lambda df: np.log(df.h))
       ...:     .pipe((sm.ols, "data"), "hr ~ ln_h + year + g + C(lg)")
       ...:     .fit()
       ...:     .summary()
       ...: )
       ...:
    Out[3]:
    <class 'statsmodels.iolib.summary.Summary'>
    """
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                     hr   R-squared:                       0.685
    Model:                            OLS   Adj. R-squared:                  0.665
    Method:                 Least Squares   F-statistic:                     34.28
    Date:                Tue, 22 Nov 2022   Prob (F-statistic):           3.48e-15
    Time:                        05:35:23   Log-Likelihood:                -205.92
    No. Observations:                  68   AIC:                             421.8
    Df Residuals:                      63   BIC:                             432.9
    Df Model:                           4
    Covariance Type:            nonrobust
    ===============================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
    -------------------------------------------------------------------------------
    Intercept   -8484.7720   4664.146     -1.819      0.074   -1.78e+04     835.780
    C(lg)[T.NL]    -2.2736      1.325     -1.716      0.091      -4.922       0.375
    ln_h           -1.3542      0.875     -1.547      0.127      -3.103       0.395
    year            4.2277      2.324      1.819      0.074      -0.417       8.872
    g               0.1841      0.029      6.258      0.000       0.125       0.243
    ==============================================================================
    Omnibus:                       10.875   Durbin-Watson:                   1.999
    Prob(Omnibus):                  0.004   Jarque-Bera (JB):               17.298
    Skew:                           0.537   Prob(JB):                     0.000175
    Kurtosis:                       5.225   Cond. No.                     1.49e+07
    ==============================================================================

    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    [2] The condition number is large, 1.49e+07. This might indicate that there are
    strong multicollinearity or other numerical problems.
    """
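To make the (function, keyword) tuple form easier to follow, the dispatch can be sketched in plain Python. This is an illustrative stand-in under assumed behavior, not pandas' actual source: pipe_sketch and describe are hypothetical names, and a plain list takes the place of the DataFrame.

```python
def pipe_sketch(obj, func, *args, **kwargs):
    """Hypothetical sketch of pipe-style dispatch (not pandas' implementation).

    If func is a (callable, keyword) tuple, obj is passed under that keyword;
    otherwise obj is passed as the first positional argument.
    """
    if isinstance(func, tuple):
        func, target = func
        if target in kwargs:
            # Assumed guard: the target keyword cannot also be supplied by the caller.
            raise ValueError(f"{target!r} is both the pipe target and a keyword argument")
        kwargs[target] = obj
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


def describe(label, data):
    """Hypothetical function that, like sm.ols, takes its data second."""
    return f"{label}: {sum(data)}"


# Data flows into the 'data' keyword; "total" fills the first positional slot.
result_tuple = pipe_sketch([1, 2, 3], (describe, "data"), "total")

# A plain callable receives the data as its first positional argument.
result_plain = pipe_sketch([1, 2, 3], sum)
```

Here result_tuple corresponds to calling describe("total", data=[1, 2, 3]), mirroring how the tuple form routes the DataFrame into the "data" argument of sm.ols in the session above.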

The pipe method is inspired by Unix pipes, which stream text through processes. More recently, dplyr and magrittr have introduced the popular (%>%) pipe operator for R.

See the documentation for more. (GH 10129)

Other enhancements

API changes

Performance improvements

Bug fixes

Contributors

A total of 34 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.