Version 0.13.1 (February 3, 2014), pandas 2.2.3 documentation

This is a minor release from 0.13.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include:

Warning

0.13.1 fixes a bug that was caused by a combination of having numpy < 1.8 and doing chained assignment on a string-like array. Please review the docs: chained indexing can have unexpected results and should generally be avoided.

This would previously segfault:

df = pd.DataFrame({"A": np.array(["foo", "bar", "bah", "foo", "bar"])})
df["A"].iloc[0] = np.nan

The recommended way to do this type of assignment is:

In [1]: df = pd.DataFrame({"A": np.array(["foo", "bar", "bah", "foo", "bar"])})

In [2]: df.loc[0, "A"] = np.nan

In [3]: df
Out[3]:
     A
0  NaN
1  bar
2  bah
3  foo
4  bar
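The same single .loc call also covers conditional assignment, which is where chained patterns such as df["A"][mask] = value tend to appear. A minimal sketch, assuming the goal is to blank out every "foo" entry (a hypothetical condition chosen for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.array(["foo", "bar", "bah", "foo", "bar"])})

# Build the row selector first, then assign through one .loc call so the
# write targets df itself rather than an intermediate Series.
mask = df["A"] == "foo"
df.loc[mask, "A"] = np.nan

print(df["A"].isna().tolist())
```

Selecting the rows and the column in one indexer avoids relying on whether df["A"] happens to be a view or a copy of the underlying data.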

Output formatting enhancements

# set to not display the null counts
In [7]: pd.set_option("max_info_rows", 0)
In [8]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column  Dtype
---  ------  -----
 0   A       float64
 1   B       float64
 2   C       datetime64[ns]
dtypes: datetime64[ns](1), float64(2)
memory usage: 368.0 bytes

# this is the default (same as in 0.13.0)
In [9]: pd.set_option("max_info_rows", max_info_rows)
In [10]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       7 non-null      float64
 1   B       10 non-null     float64
 2   C       7 non-null      datetime64[ns]
dtypes: datetime64[ns](1), float64(2)
memory usage: 368.0 bytes
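The df used in these sessions is built in lines not shown here. A minimal sketch with a hypothetical 10-row frame of the same shape (two float columns, one datetime column, three missing values in A and C) that captures the info() text so the two layouts can be compared. It passes the show_counts keyword of recent pandas versions per call, an assumption about your installed version, instead of changing the global max_info_rows option:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: 10 rows, two float columns and one datetime column.
df = pd.DataFrame(
    {
        "A": np.arange(10, dtype="float64"),
        "B": np.linspace(0.0, 1.0, 10),
        "C": pd.date_range("2013-01-01", periods=10),
    }
)
# Rows 3, 4 and 5 become missing in A and C (.loc label slices are inclusive).
df.loc[3:5, "A"] = np.nan
df.loc[3:5, "C"] = pd.NaT


def info_text(frame, show_counts):
    """Return the text that frame.info() would print."""
    buf = io.StringIO()
    frame.info(buf=buf, show_counts=show_counts)
    return buf.getvalue()


without_counts = info_text(df, show_counts=False)
with_counts = info_text(df, show_counts=True)
print(without_counts)
print(with_counts)
```

An explicit show_counts argument applies to that call only, so there is no option to restore afterwards.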

0 2001-01-01 2013-04-19 4491 days
1 2004-06-01 2013-04-19 3244 days
[2 rows x 3 columns]
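The two rows above show whole-day differences between pairs of dates. A minimal sketch that reproduces those values by subtracting datetime columns; the column names age, today and diff are assumptions, since the header and input for this output are not shown here:

```python
import pandas as pd

# Hypothetical column names; the dates match the rows shown above.
df = pd.DataFrame({"age": pd.to_datetime(["2001-01-01", "2004-06-01"])})
df["today"] = pd.Timestamp("2013-04-19")

# Subtracting one datetime64 column from another yields a timedelta64
# column; whole-day differences display as "N days".
df["diff"] = df["today"] - df["age"]
print(df)
```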

API changes

Prior version deprecations/changes

There are no announced changes in 0.13 or prior that are taking effect as of 0.13.1.

Deprecations

There are no deprecations of prior behavior in 0.13.1.

Enhancements

# Try to infer the format for the index column
df = pd.read_csv(
    "foo.csv", index_col=0, parse_dates=True, infer_datetime_format=True
)
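The infer_datetime_format keyword was later deprecated (pandas 2.0 made a strict version of format inference the default), so on current versions the same parse can be written without it. A minimal sketch using an in-memory CSV as a stand-in for the hypothetical foo.csv:

```python
import io

import pandas as pd

# In-memory stand-in for "foo.csv": ISO dates in the first column.
csv = io.StringIO(
    "date,A,B\n"
    "2000-01-03,1.0,2.0\n"
    "2000-01-04,3.0,4.0\n"
    "2000-01-05,5.0,6.0\n"
)

# With index_col=0, parse_dates=True parses the index; on pandas 2.x the
# datetime format is inferred without any extra keyword.
df = pd.read_csv(csv, index_col=0, parse_dates=True)
print(df.index)
```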

2000-01-03 -0.673690 0.577046 -1.344312 -1.469388
2000-01-04 0.113648 -1.715002 0.844885 0.357021
2000-01-05 -1.478427 -1.039268 1.075770 -0.674600
2000-01-06 0.524988 -0.370647 -0.109050 -1.776904
2000-01-07 0.404705 -1.157892 1.643563 -0.968914
[5 rows x 4 columns]
Specifying an apply that operates on a Series (to return a single element):

In [32]: panel.apply(lambda x: x.dtype, axis='items')
Out[32]:
                  A        B        C        D
2000-01-03  float64  float64  float64  float64
2000-01-04  float64  float64  float64  float64
2000-01-05  float64  float64  float64  float64
2000-01-06  float64  float64  float64  float64
2000-01-07  float64  float64  float64  float64
[5 rows x 4 columns]
A similar reduction-type operation:

In [33]: panel.apply(lambda x: x.sum(), axis='major_axis')
Out[33]:
      ItemA     ItemB     ItemC
A -1.108775 -1.090118 -2.984435
B -3.705764  0.409204  1.866240
C  2.110856  2.960500 -0.974967
D -4.532785  0.303202 -3.685193
[4 rows x 3 columns]
This is equivalent to:

In [34]: panel.sum('major_axis')
Out[34]:
      ItemA     ItemB     ItemC
A -1.108775 -1.090118 -2.984435
B -3.705764  0.409204  1.866240
C  2.110856  2.960500 -0.974967
D -4.532785  0.303202 -3.685193
[4 rows x 3 columns]
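Panel was removed in pandas 0.25.0, so the sessions above only run on old versions. A minimal sketch of the same reduction on current pandas, assuming the three items are kept as a plain dict of DataFrames (hypothetical small data): summing each frame down its rows plays the role of panel.sum('major_axis'), and the result has the same layout, with the minor axis as rows and one column per item.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2000-01-03", periods=5, freq="B")
columns = list("ABCD")

# Hypothetical stand-in for a Panel: one DataFrame per item.
items = {
    name: pd.DataFrame(
        np.arange(20, dtype="float64").reshape(5, 4) * k,
        index=index,
        columns=columns,
    )
    for k, name in enumerate(["ItemA", "ItemB", "ItemC"], start=1)
}

# Reduce each item over its rows (the old major_axis); the result has the
# minor axis (A to D) as rows and the items as columns.
sums = pd.DataFrame({name: frame.sum() for name, frame in items.items()})
print(sums)
```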
A transformation operation that returns a Panel, computing the z-score across the major_axis:

In [35]: result = panel.apply(lambda x: (x - x.mean()) / x.std(),
   ....:                      axis='major_axis')
   ....:
In [36]: result
Out[36]:
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 5 (major_axis) x 4 (minor_axis)
Items axis: ItemA to ItemC
Major_axis axis: 2000-01-03 00:00:00 to 2000-01-07 00:00:00
Minor_axis axis: A to D
In [37]: result['ItemA']
Out[37]:
                   A         B         C         D
2000-01-03 -0.535778  1.500802 -1.506416 -0.681456
2000-01-04  0.397628 -1.108752  0.360481  1.529895
2000-01-05 -1.489811 -0.339412  0.557374  0.280845
2000-01-06  0.885279  0.421830 -0.453013 -1.053785
2000-01-07  0.742682 -0.474468  1.041575 -0.075499
[5 rows x 4 columns]
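On current pandas, where Panel no longer exists (it was removed in 0.25.0), the same per-item z-score can be sketched with a plain dict of DataFrames standing in for the panel (hypothetical random data). DataFrame.apply passes each column, a Series running along the rows, to the function, which matches axis='major_axis' above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.date_range("2000-01-03", periods=5, freq="B")

# Hypothetical stand-in for a Panel: one DataFrame per item.
items = {
    name: pd.DataFrame(rng.standard_normal((5, 4)), index=index, columns=list("ABCD"))
    for name in ["ItemA", "ItemB", "ItemC"]
}


def zscore(x):
    # Standardize one column with its own mean and sample standard deviation.
    return (x - x.mean()) / x.std()


result = {name: frame.apply(zscore) for name, frame in items.items()}
print(result["ItemA"])
```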

2000-01-03 0.012922 -0.030874 -0.629546 -0.757034
2000-01-04 0.392053 -1.071665 0.163228 0.548188
2000-01-05 -1.093650 -0.640898 0.385734 -1.154310
2000-01-06 1.005446 -1.154593 -0.595615 -0.809185
2000-01-07 0.783051 -0.198053 0.919339 -1.052721
[5 rows x 4 columns]
This is equivalent to the following:

In [42]: result = pd.Panel({ax: f(panel.loc[:, :, ax]) for ax in panel.minor_axis})
In [43]: result
Out[43]:
<class 'pandas.core.panel.Panel'>
Dimensions: 4 (items) x 5 (major_axis) x 3 (minor_axis)
Items axis: A to D
Major_axis axis: 2000-01-03 00:00:00 to 2000-01-07 00:00:00
Minor_axis axis: ItemA to ItemC
In [44]: result.loc[:, :, 'ItemA']
Out[44]:
                   A         B         C         D
2000-01-03  0.012922 -0.030874 -0.629546 -0.757034
2000-01-04  0.392053 -1.071665  0.163228  0.548188
2000-01-05 -1.093650 -0.640898  0.385734 -1.154310
2000-01-06  1.005446 -1.154593 -0.595615 -0.809185
2000-01-07  0.783051 -0.198053  0.919339 -1.052721
[5 rows x 4 columns]
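The dictionary comprehension in In [42] carries over to current pandas almost unchanged once the removed Panel is replaced by a plain dict of DataFrames. A minimal sketch under that assumption: for each minor-axis label, the cross-section is a DataFrame with dates as rows and items as columns (the layout panel.loc[:, :, ax] returned). The function f is not defined in this excerpt, so the sketch uses a hypothetical stand-in that standardizes each date across the items:

```python
import numpy as np
import pandas as pd

index = pd.date_range("2000-01-03", periods=5, freq="B")

# Hypothetical stand-in for a Panel: one DataFrame per item.
items = {
    name: pd.DataFrame(
        np.arange(20, dtype="float64").reshape(5, 4) + k,
        index=index,
        columns=list("ABCD"),
    )
    for k, name in enumerate(["ItemA", "ItemB", "ItemC"])
}


def f(cross_section):
    # Hypothetical stand-in for f: standardize each row (date) across items.
    x = cross_section
    return x.sub(x.mean(axis=1), axis=0).div(x.std(axis=1), axis=0)


# For each minor-axis label, collect that column from every item into one
# DataFrame (dates x items) and apply f to it.
result = {
    ax: f(pd.DataFrame({name: frame[ax] for name, frame in items.items()}))
    for ax in ["A", "B", "C", "D"]
}
print(result["A"])
```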

Performance

Performance improvements for 0.13.1

Experimental

There are no experimental changes in 0.13.1.

Bug fixes

Contributors

A total of 52 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.