Version 0.11.0 (April 22, 2013)

This is a major release from 0.10.1 and includes many new features and enhancements along with a large number of bug fixes. The methods of Selecting Data have had quite a number of additions, and Dtype support is now full-fledged. There are also a number of important API changes that long-time pandas users should pay close attention to.

There is a new section in the documentation, 10 Minutes to pandas, primarily geared to new users.

There is a new section in the documentation, Cookbook, a collection of useful recipes in pandas (contributions are welcome!).

There are several libraries that are now Recommended Dependencies.

Selection choices#

Starting in 0.11.0, object selection has had a number of user-requested additions in order to support more explicit location based indexing. pandas now supports three types of multi-axis indexing.
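The three indexers introduced here are .loc (label based), .iloc (integer-position based), and .ix (mixed; it was removed in later pandas versions). The following is a minimal sketch of the first two, assuming a recent pandas; the frame, its labels, and its values are invented for illustration.

```python
import pandas as pd

# Illustrative frame; the labels and values are invented for this sketch.
df = pd.DataFrame(
    {"A": [10, 20, 30], "B": [1.5, 2.5, 3.5]},
    index=["x", "y", "z"],
)

# .loc selects by label; label slices include both endpoints.
by_label = df.loc["x":"y", "A"]

# .iloc selects by integer position; slices exclude the stop, like Python lists.
by_position = df.iloc[0:2, 0]

# Both selections pick the first two rows of column "A".
print(by_label.equals(by_position))
```

Note the differing slice semantics: the label slice "x":"y" and the positional slice 0:2 cover the same two rows.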

Selection deprecations#

Starting in version 0.11.0, the positional methods irow, icol, and iget_value may be deprecated in future versions.

See the section Selection by Position for substitutes.
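As a rough sketch of those substitutes, .iloc covers what the older positional methods irow, icol, and iget_value did. The data below is invented, and the old calls appear only in comments because recent pandas versions no longer provide them.

```python
import pandas as pd

# Invented data for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
s = df["A"]

row = df.iloc[1]      # positional row, in place of df.irow(1)
col = df.iloc[:, 1]   # positional column, in place of df.icol(1)
value = s.iloc[2]     # positional scalar, in place of s.iget_value(2)

print(row.tolist(), col.tolist(), value)
```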

Dtypes#

Numeric dtypes will propagate and can coexist in DataFrames. If a dtype is passed (either directly via the dtype keyword, a passed ndarray, or a passed Series), then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will NOT be combined. The following example will give you a taste.

In [1]: df1 = pd.DataFrame(np.random.randn(8, 1), columns=['A'], dtype='float64')

In [2]: df1
Out[2]:
          A
0  0.469112
1 -0.282863
2 -1.509059
3 -1.135632
4  1.212112
5 -0.173215
6  0.119209
7 -1.044236

In [3]: df1.dtypes
Out[3]:
A    float64
dtype: object

In [4]: df2 = pd.DataFrame({'A': pd.Series(np.random.randn(8), dtype='float32'),
   ...:                     'B': pd.Series(np.random.randn(8)),
   ...:                     'C': pd.Series(range(8), dtype='uint8')})
   ...:

In [5]: df2
Out[5]:
          A         B  C
0 -0.861849 -0.424972  0
1 -2.104569  0.567020  1
2 -0.494929  0.276232  2
3  1.071804 -1.087401  3
4  0.721555 -0.673690  4
5 -0.706771  0.113648  5
6 -1.039575 -1.478427  6
7  0.271860  0.524988  7

In [6]: df2.dtypes
Out[6]:
A    float32
B    float64
C      uint8
dtype: object

Here you get some upcasting:

In [7]: df3 = df1.reindex_like(df2).fillna(value=0.0) + df2

In [8]: df3
Out[8]:
          A         B    C
0 -0.392737 -0.424972  0.0
1 -2.387433  0.567020  1.0
2 -2.003988  0.276232  2.0
3 -0.063829 -1.087401  3.0
4  1.933667 -0.673690  4.0
5 -0.879986  0.113648  5.0
6 -0.920366 -1.478427  6.0
7 -0.772376  0.524988  7.0

In [9]: df3.dtypes
Out[9]:
A    float64
B    float64
C    float64
dtype: object

Dtype conversion#

This is lowest-common-denominator upcasting, meaning you get the dtype which can accommodate all of the types:

In [10]: df3.values.dtype
Out[10]: dtype('float64')
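To see which dtype can accommodate a given mix, NumPy's own promotion rules are a reasonable guide, since pandas broadly follows them for numeric columns. This sketch uses np.result_type on the three column dtypes that df2 carries (float32, float64, uint8); it illustrates the promotion rule and is not a description of pandas internals.

```python
import numpy as np

# The three column dtypes of df2: float32, float64, and uint8.
column_dtypes = [np.float32, np.float64, np.uint8]

# result_type returns the smallest dtype that can hold all of the inputs.
common = np.result_type(*column_dtypes)
print(common)
```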

Conversion

In [11]: df3.astype('float32').dtypes
Out[11]:
A    float32
B    float32
C    float32
dtype: object

Mixed conversion

In [12]: df3['D'] = '1.'

In [13]: df3['E'] = '1'

In [14]: df3.convert_objects(convert_numeric=True).dtypes
Out[14]:
A    float32
B    float64
C    float64
D    float64
E      int64
dtype: object

Same, but with a specific dtype conversion

In [15]: df3['D'] = df3['D'].astype('float16')

In [16]: df3['E'] = df3['E'].astype('int32')

In [17]: df3.dtypes
Out[17]:
A    float32
B    float64
C    float64
D    float16
E      int32
dtype: object

Forcing date coercion (and setting NaT when not datelike)

In [18]: import datetime

In [19]: s = pd.Series([datetime.datetime(2001, 1, 1, 0, 0), 'foo', 1.0, 1,
   ....:                pd.Timestamp('20010104'), '20010105'], dtype='O')
   ....:

In [20]: s.convert_objects(convert_dates='coerce')
Out[20]:
0   2001-01-01
1          NaT
2          NaT
3          NaT
4   2001-01-04
5   2001-01-05
dtype: datetime64[ns]
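convert_objects was deprecated and later removed, so the calls above will not run on a recent pandas. A minimal sketch of the same coercion idea with pd.to_numeric and pd.to_datetime follows; the input values are invented, and this is one possible substitute rather than an exact equivalent of the old method.

```python
import pandas as pd

# Invented string data; "x" and "foo" stand in for unparseable entries.
numbers = pd.to_numeric(pd.Series(["1.", "1", "x"]), errors="coerce")
dates = pd.to_datetime(pd.Series(["2001-01-04", "foo"]), errors="coerce")

# errors="coerce" turns unparseable entries into NaN / NaT instead of raising.
print(numbers.isna().tolist())
print(dates.isna().tolist())
```

Unlike convert_objects, these functions work on one Series at a time, so a DataFrame needs them applied column by column.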

Dtype gotchas#

Platform gotchas

Starting in 0.11.0, construction of DataFrame/Series will use default dtypes of int64 and float64, regardless of platform. This is not an apparent change from earlier versions of pandas. If you specify dtypes, however, they WILL be respected (GH 2837).

The following will all result in int64 dtypes

In [21]: pd.DataFrame([1, 2], columns=['a']).dtypes
Out[21]:
a    int64
dtype: object

In [22]: pd.DataFrame({'a': [1, 2]}).dtypes
Out[22]:
a    int64
dtype: object

In [23]: pd.DataFrame({'a': 1}, index=range(2)).dtypes
Out[23]:
a    int64
dtype: object

Keep in mind that DataFrame(np.array([1,2])) WILL result in int32 on 32-bit platforms!
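One way to sidestep this is to pin the dtype on the array itself, since a dtype that is specified is respected. A small sketch, with an invented column name:

```python
import numpy as np
import pandas as pd

# np.array([1, 2]) takes NumPy's platform default integer, so pin it explicitly.
arr = np.array([1, 2], dtype="int64")
df = pd.DataFrame(arr, columns=["a"])

print(df.dtypes)
```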

Upcasting gotchas

Performing indexing operations on integer type data can easily upcast the data. The dtype of the input data will be preserved in cases where nans are not introduced.

In [24]: dfi = df3.astype('int32')

In [25]: dfi['D'] = dfi['D'].astype('int64')

In [26]: dfi
Out[26]:
   A  B  C  D  E
0  0  0  0  1  1
1 -2  0  1  1  1
2 -2  0  2  1  1
3  0 -1  3  1  1
4  1  0  4  1  1
5  0  0  5  1  1
6  0 -1  6  1  1
7  0  0  7  1  1

In [27]: dfi.dtypes
Out[27]:
A    int32
B    int32
C    int32
D    int64
E    int32
dtype: object

In [28]: casted = dfi[dfi > 0]

In [29]: casted
Out[29]:
     A   B    C  D  E
0  NaN NaN  NaN  1  1
1  NaN NaN  1.0  1  1
2  NaN NaN  2.0  1  1
3  NaN NaN  3.0  1  1
4  1.0 NaN  4.0  1  1
5  NaN NaN  5.0  1  1
6  NaN NaN  6.0  1  1
7  NaN NaN  7.0  1  1

In [30]: casted.dtypes
Out[30]:
A    float64
B    float64
C    float64
D      int64
E      int32
dtype: object

Float dtypes, by contrast, are unchanged.

In [31]: df4 = df3.copy()

In [32]: df4['A'] = df4['A'].astype('float32')

In [33]: df4.dtypes
Out[33]:
A    float32
B    float64
C    float64
D    float16
E      int32
dtype: object

In [34]: casted = df4[df4 > 0]

In [35]: casted
Out[35]:
          A         B    C    D  E
0       NaN       NaN  NaN  1.0  1
1       NaN  0.567020  1.0  1.0  1
2       NaN  0.276232  2.0  1.0  1
3       NaN       NaN  3.0  1.0  1
4  1.933792       NaN  4.0  1.0  1
5       NaN  0.113648  5.0  1.0  1
6       NaN       NaN  6.0  1.0  1
7       NaN  0.524988  7.0  1.0  1

In [36]: casted.dtypes
Out[36]:
A    float32
B    float64
C    float64
D    float16
E      int32
dtype: object
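The contrast between integer and float columns can be reproduced on a small scale. This sketch uses Series.where on invented data: masking introduces NaN, which an integer dtype cannot hold but a float dtype can.

```python
import pandas as pd

# Invented data: the same values stored as int32 and as float32.
ints = pd.Series([1, -2, 3], dtype="int32")
floats = pd.Series([1.0, -2.0, 3.0], dtype="float32")

# Masking out the negative entry introduces NaN, which int32 cannot hold,
# so the integer Series is upcast.
masked_ints = ints.where(ints > 0)

# float32 can represent NaN, so the dtype survives the same operation.
masked_floats = floats.where(floats > 0)

print(masked_ints.dtype, masked_floats.dtype)
```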

Datetimes conversion#

Datetime64[ns] columns in a DataFrame (or a Series) allow the use of np.nan to indicate a nan value, in addition to the traditional NaT, or not-a-time. This allows convenient nan setting in a generic way. Furthermore, datetime64[ns] columns are created by default when passed datetimelike objects (this change was introduced in 0.10.1) (GH 2809, GH 2810).

In [12]: df = pd.DataFrame(np.random.randn(6, 2), pd.date_range('20010102', periods=6),
   ....:                   columns=['A', 'B'])
   ....:

In [13]: df['timestamp'] = pd.Timestamp('20010103')

In [14]: df
Out[14]:
                   A         B  timestamp
2001-01-02  0.404705  0.577046 2001-01-03
2001-01-03 -1.715002 -1.039268 2001-01-03
2001-01-04 -0.370647 -1.157892 2001-01-03
2001-01-05 -1.344312  0.844885 2001-01-03
2001-01-06  1.075770 -0.109050 2001-01-03
2001-01-07  1.643563 -1.469388 2001-01-03

datetime64[ns] out of the box

In [15]: df.dtypes.value_counts()
Out[15]:
float64          2
datetime64[s]    1
Name: count, dtype: int64

use the traditional nan, which is mapped to NaT internally

In [16]: df.loc[df.index[2:4], ['A', 'timestamp']] = np.nan

In [17]: df
Out[17]:
                   A         B  timestamp
2001-01-02  0.404705  0.577046 2001-01-03
2001-01-03 -1.715002 -1.039268 2001-01-03
2001-01-04       NaN -1.157892        NaT
2001-01-05       NaN  0.844885        NaT
2001-01-06  1.075770 -0.109050 2001-01-03
2001-01-07  1.643563 -1.469388 2001-01-03
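The same mapping can be checked on a standalone Series. A minimal sketch with invented dates: assigning np.nan into a datetime column stores NaT and leaves the column datetime-typed.

```python
import numpy as np
import pandas as pd

# Invented dates for illustration.
stamps = pd.Series(pd.to_datetime(["2001-01-02", "2001-01-03", "2001-01-04"]))

# np.nan is stored as NaT; the Series keeps its datetime dtype.
stamps.iloc[1] = np.nan

print(stamps.isna().tolist())
print(stamps.dtype.kind)  # "M" is NumPy's kind code for datetime64
```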

Astype conversion from datetime64[ns] to object implicitly converts NaT to np.nan

In [18]: import datetime

In [19]: s = pd.Series([datetime.datetime(2001, 1, 2, 0, 0) for i in range(3)])

In [20]: s.dtype
Out[20]: dtype('<M8[us]')

In [21]: s[1] = np.nan

In [22]: s
Out[22]:
0   2001-01-02
1          NaT
2   2001-01-02
dtype: datetime64[us]

In [23]: s.dtype
Out[23]: dtype('<M8[us]')

In [24]: s = s.astype('O')

In [25]: s
Out[25]:
0    2001-01-02 00:00:00
1                    NaT
2    2001-01-02 00:00:00
dtype: object

In [26]: s.dtype
Out[26]: dtype('O')

API changes#

Enhancements#

2001-10-31 0.117967
2001-11-30 0.702184
2001-12-31 0.414034

2001-01-02 0.926089 -2.026458 0.501277 -0.204683
2001-01-03 -0.076524 1.081161 1.141361 0.479243
2001-01-04 0.641817 -0.185352 1.824568 0.809152
2001-01-05 0.575237 0.669934 1.398014 -0.399338

p.reindex(items=['ItemA'], minor=['B']).squeeze()
2001-01-02 -2.026458
2001-01-03 1.081161
2001-01-04 -0.185352
2001-01-05 0.669934
Freq: D, Name: B, dtype: float64
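The squeeze call above operates on a Panel, which later pandas versions no longer provide. As a hedged sketch of the same idea on a DataFrame, with invented data: squeeze drops any axis of length 1, so a one-column frame becomes a Series.

```python
import pandas as pd

# Invented data; the index mirrors the daily dates used in the output above.
df = pd.DataFrame(
    {"A": [1.0, 2.0], "B": [3.0, 4.0]},
    index=pd.date_range("2001-01-02", periods=2),
)

one_column = df[["B"]]           # still a DataFrame, shape (2, 1)
squeezed = one_column.squeeze()  # length-1 axis dropped, leaving a Series

print(type(squeezed).__name__, squeezed.name)
```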

See the full release notes or issue tracker on GitHub for a complete list.

Contributors#

A total of 50 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.