Version 0.7.0 (February 9, 2012) — pandas 3.0.0.dev0+2104.ge637b4290d documentation

New features

For example, DataFrame.apply can be given a function that returns a Series (here describe()); the per-column results are combined into a DataFrame:

    In [1]: df = pd.DataFrame(np.random.randn(10, 4))

    In [2]: df.apply(lambda x: x.describe())
    Out[2]:
                   0          1          2          3
    count  10.000000  10.000000  10.000000  10.000000
    mean    0.190912  -0.395125  -0.731920  -0.403130
    std     0.730951   0.813266   1.112016   0.961912
    min    -0.861849  -2.104569  -1.776904  -1.469388
    25%    -0.411391  -0.698728  -1.501401  -1.076610
    50%     0.380863  -0.228039  -1.191943  -1.004091
    75%     0.658444   0.057974  -0.034326   0.461706
    max     1.212112   0.577046   1.643563   1.071804

    [8 rows x 4 columns]
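Below is a minimal, reproducible sketch of the same call in current pandas. It assumes deterministic data (consecutive floats) in place of the random values above, so that the result can be checked. Each column's describe() Series becomes one column of the result.

```python
import numpy as np
import pandas as pd

# Deterministic data (an assumption, replacing np.random.randn) so the result is reproducible.
df = pd.DataFrame(np.arange(40, dtype=float).reshape(10, 4))

# describe() returns a Series per column; apply combines those Series
# column-wise into a new DataFrame indexed by the statistic names.
summary = df.apply(lambda col: col.describe())

print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.shape)
# (8, 4)
```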

API changes to integer indexing

One of the potentially riskiest API changes in 0.7.0, but also one of the most important, was a complete review of how integer indexes are handled with regard to label-based indexing. Here is an example:

    In [3]: s = pd.Series(np.random.randn(10), index=range(0, 20, 2))

    In [4]: s
    Out[4]:
    0   -1.294524
    2    0.413738
    4    0.276662
    6   -0.472035
    8   -0.013960
    10  -0.362543
    12  -0.006154
    14  -0.923061
    16   0.895717
    18   0.805244
    Length: 10, dtype: float64

    In [5]: s[0]
    Out[5]: -1.2945235902555294

    In [6]: s[2]
    Out[6]: 0.41373810535784006

    In [7]: s[4]
    Out[7]: 0.2766617129497566

This behavior is identical to that of earlier versions. However, if you ask for a key not contained in the Series, versions 0.6.1 and earlier would fall back on a location-based lookup. This now raises a KeyError; for example, s[1] fails because 1 is not a label in the index.
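This sketch shows the label-first rule in current pandas. The values 0.0 to 9.0 are illustrative. An integer key on an integer index is looked up as a label, and a missing label raises KeyError instead of falling back to a position. Explicit positional access uses .iloc.

```python
import pandas as pd

# Integer labels 0, 2, ..., 18 with illustrative values 0.0 .. 9.0.
s = pd.Series([float(i) for i in range(10)], index=range(0, 20, 2))

# Integer keys are labels: label 2 is the second element.
print(s[2])       # 1.0

# 1 is not a label, so this raises KeyError instead of falling back to position 1.
try:
    s[1]
except KeyError as exc:
    print("KeyError:", exc)

# Positional access is requested explicitly with .iloc.
print(s.iloc[1])  # 1.0
```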

This change has the same impact on DataFrame:

    In [3]: df = pd.DataFrame(np.random.randn(8, 4), index=range(0, 16, 2))

    In [4]: df
               0        1        2        3
    0    0.88427   0.3363  -0.1787  0.03162
    2    0.14451  -0.1415   0.2504  0.58374
    4   -1.44779  -0.9186  -1.4996  0.27163
    6   -0.26598  -2.4184  -0.2658  0.11503
    8   -0.58776   0.3144  -0.8566  0.61941
    10   0.10940  -0.7175  -1.0108  0.47990
    12  -1.16919  -0.3087  -0.6049 -0.43544
    14  -0.07337   0.3410   0.0424 -0.16037

    In [5]: df.ix[3]
    KeyError: 3
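The .ix indexer shown above has since been removed from pandas. In current pandas, the equivalent label-based row lookup is .loc. The sketch below uses illustrative values rather than the random ones above. Label 4 selects the third row, and a missing label such as 3 raises KeyError.

```python
import numpy as np
import pandas as pd

# 8 rows labeled 0, 2, ..., 14; values are illustrative, not those shown above.
df = pd.DataFrame(np.arange(32, dtype=float).reshape(8, 4), index=range(0, 16, 2))

# .loc is label-based: row label 4 is the third row (position 2).
row = df.loc[4]
print(row.tolist())  # [8.0, 9.0, 10.0, 11.0]

# 3 is not a row label, so the lookup raises KeyError.
try:
    df.loc[3]
except KeyError:
    print("no row labeled 3")
```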

In order to support purely integer-based indexing, the following methods have been added:
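In current pandas, purely integer-based (positional) access is done with the .iloc and .iat indexers. The sketch below uses an illustrative frame. .iloc selects rows or columns by position regardless of their labels, and .iat fetches a single scalar by row and column position.

```python
import numpy as np
import pandas as pd

# Illustrative frame with non-positional integer labels 0, 2, ..., 14.
df = pd.DataFrame(np.arange(32, dtype=float).reshape(8, 4), index=range(0, 16, 2))

# Fourth row by position; its label happens to be 6.
print(df.iloc[3].name)  # 6

# Second column by position, as a Series.
col = df.iloc[:, 1]

# Single value at row position 3, column position 2.
print(df.iat[3, 2])     # 14.0
```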

API tweaks regarding label-based slicing

Label-based slicing using ix now requires that the index be sorted (monotonic) unless both the start and end points are contained in the index:

    In [1]: s = pd.Series(np.random.randn(6), index=list('gmkaec'))

    In [2]: s
    Out[2]:
    g   -1.182230
    m   -0.276183
    k   -0.243550
    a    1.628992
    e    0.073308
    c   -0.539890
    dtype: float64

Then this is OK:

    In [3]: s.ix['k':'e']
    Out[3]:
    k   -0.243550
    a    1.628992
    e    0.073308
    dtype: float64

But this is not:

    In [12]: s.ix['b':'h']
    KeyError: 'b'

If the index had been sorted, the “range selection” would have been possible:

    In [4]: s2 = s.sort_index()

    In [5]: s2
    Out[5]:
    a    1.628992
    c   -0.539890
    e    0.073308
    g   -1.182230
    k   -0.243550
    m   -0.276183
    dtype: float64

    In [6]: s2.ix['b':'h']
    Out[6]:
    c   -0.539890
    e    0.073308
    g   -1.182230
    dtype: float64
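The same slicing rule can be checked in current pandas with .loc. The sketch below uses the unsorted labels from the example above with illustrative values. A slice whose endpoints are both labels works on the unsorted index. A slice with missing endpoints raises KeyError until the index is sorted, after which the missing endpoints are located by their sort position.

```python
import pandas as pd

# Same unsorted labels as above; the values 0..5 are illustrative.
s = pd.Series(range(6), index=list("gmkaec"))

# Both endpoints are labels, so this works on the unsorted index.
print(s.loc["k":"e"].index.tolist())   # ['k', 'a', 'e']

# 'b' and 'h' are not labels and the index is not monotonic: KeyError.
try:
    s.loc["b":"h"]
except KeyError as exc:
    print("KeyError:", exc)

# On a sorted index, missing endpoints are placed by sort order.
s2 = s.sort_index()
print(s2.loc["b":"h"].index.tolist())  # ['c', 'e', 'g']
```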

Changes to Series [] operator

As a notational convenience, you can pass a sequence of labels or a label slice to a Series when getting and setting values via [] (i.e. the __getitem__ and __setitem__ methods). The behavior is the same as passing similar input to ix, except in the case of integer indexing:

    In [8]: s = pd.Series(np.random.randn(6), index=list('acegkm'))

    In [9]: s
    Out[9]:
    a   -1.206412
    c    2.565646
    e    1.431256
    g    1.340309
    k   -1.170299
    m   -0.226169
    Length: 6, dtype: float64

    In [10]: s[['m', 'a', 'c', 'e']]
    Out[10]:
    m   -0.226169
    a   -1.206412
    c    2.565646
    e    1.431256
    Length: 4, dtype: float64

    In [11]: s['b':'l']
    Out[11]:
    c    2.565646
    e    1.431256
    g    1.340309
    k   -1.170299
    Length: 4, dtype: float64

    In [12]: s['c':'k']
    Out[12]:
    c    2.565646
    e    1.431256
    g    1.340309
    k   -1.170299
    Length: 4, dtype: float64
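The sessions above only show getting values. The same keys also work for setting values through __setitem__. The following is a small sketch in current pandas with illustrative values. A list of labels selects or assigns specific entries. A label slice includes both endpoints, and on this sorted index the endpoints need not be present.

```python
import pandas as pd

# Sorted string labels with illustrative values.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=list("acegkm"))

# A list of labels selects (and reorders) those entries.
print(s[["m", "a"]].tolist())     # [6.0, 1.0]

# A label slice includes both endpoints; 'b' and 'l' are absent but the index is sorted.
print(s["b":"l"].index.tolist())  # ['c', 'e', 'g', 'k']

# The same kinds of keys work for assignment.
s[["a", "c"]] = 0.0
s["e":"g"] = -1.0
print(s.tolist())                 # [0.0, 0.0, -1.0, -1.0, 5.0, 6.0]
```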

In the case of integer indexes, the behavior will be exactly as before (shadowing ndarray):

    In [13]: s = pd.Series(np.random.randn(6), index=range(0, 12, 2))

    In [14]: s[[4, 0, 2]]
    Out[14]:
    4    0.132003
    0    0.410835
    2    0.813850
    Length: 3, dtype: float64

    In [15]: s[1:5]
    Out[15]:
    2    0.813850
    4    0.132003
    6   -0.827317
    8   -0.076467
    Length: 4, dtype: float64
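The two behaviors in Out[14] and Out[15] can be spelled out explicitly in current pandas. A list of integers is looked up as labels (.loc), while an integer slice selects by position (.iloc). In the sketch below, the values are illustrative and equal to the labels, which makes the output easy to check.

```python
import pandas as pd

# Labels 0, 2, ..., 10; values chosen equal to the labels for readability.
s = pd.Series([0.0, 2.0, 4.0, 6.0, 8.0, 10.0], index=range(0, 12, 2))

# A list of integers is a list of labels, as in Out[14].
print(s.loc[[4, 0, 2]].index.tolist())  # [4, 0, 2]

# An integer slice is positional, as in Out[15]: positions 1..4 hold labels 2, 4, 6, 8.
print(s.iloc[1:5].index.tolist())       # [2, 4, 6, 8]
```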

If you wish to do indexing with sequences and slicing on an integer index with label semantics, use ix.
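The ix indexer was later deprecated (0.20.0) and removed (1.0.0). In current pandas, label semantics on an integer index are requested with .loc. The minimal sketch below uses illustrative values. It contrasts a label slice, which includes both endpoints, with a positional slice, which excludes the stop position.

```python
import pandas as pd

# Labels 0, 2, ..., 10 with illustrative values.
s = pd.Series([0.0, 2.0, 4.0, 6.0, 8.0, 10.0], index=range(0, 12, 2))

# Label-based slice on an integer index: both endpoints are included.
print(s.loc[2:6].index.tolist())   # [2, 4, 6]

# Positional slice for comparison: positions 2..5, stop excluded.
print(s.iloc[2:6].index.tolist())  # [4, 6, 8, 10]
```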

Other API changes

Performance improvements

Contributors

A total of 18 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.