Version 0.10.1 (January 22, 2013)

This is a minor release from 0.10.0 and includes new features, enhancements, and bug fixes. In particular, there is substantial new HDFStore functionality contributed by Jeff Reback.

An undesired API breakage with functions taking the inplace option has been reverted and deprecation warnings added.

API changes

New features

HDFStore

You may need to upgrade your existing data files. Please visit the compatibility section in the main docs.

You can designate (and index) certain columns of a table that you want to be able to query by passing a list to data_columns.

In [1]: store = pd.HDFStore("store.h5")

In [2]: df = pd.DataFrame(
   ...:     np.random.randn(8, 3),
   ...:     index=pd.date_range("1/1/2000", periods=8),
   ...:     columns=["A", "B", "C"],
   ...: )
   ...:

In [3]: df["string"] = "foo"

In [4]: df.loc[df.index[4:6], "string"] = np.nan

In [5]: df.loc[df.index[7:9], "string"] = "bar"

In [6]: df["string2"] = "cool"

In [7]: df
Out[7]:
                   A         B         C string string2
2000-01-01  0.469112 -0.282863 -1.509059    foo    cool
2000-01-02 -1.135632  1.212112 -0.173215    foo    cool
2000-01-03  0.119209 -1.044236 -0.861849    foo    cool
2000-01-04 -2.104569 -0.494929  1.071804    foo    cool
2000-01-05  0.721555 -0.706771 -1.039575    NaN    cool
2000-01-06  0.271860 -0.424972  0.567020    NaN    cool
2000-01-07  0.276232 -1.087401 -0.673690    foo    cool
2000-01-08  0.113648 -1.478427  0.524988    bar    cool

on-disk operations

In [8]: store.append("df", df, data_columns=["B", "C", "string", "string2"])

In [9]: store.select("df", "B>0 and string=='foo'")
Out[9]:
                   A         B         C string string2
2000-01-02 -1.135632  1.212112 -0.173215    foo    cool

this is the in-memory version of this type of selection

In [10]: df[(df.B > 0) & (df.string == "foo")]
Out[10]:
                   A         B         C string string2
2000-01-02 -1.135632  1.212112 -0.173215    foo    cool
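The in-memory selection can also be written as a string expression, which reads much like the on-disk `where` clause. The following is a minimal sketch on a small hand-written frame (`df_demo` and its values are illustrative assumptions, not the random data shown above). It uses `DataFrame.query`, which is a later pandas addition and not part of the 0.10.1 release described here.

```python
import numpy as np
import pandas as pd

# Small hand-written frame (illustrative values, not the random data above).
df_demo = pd.DataFrame(
    {
        "A": [0.5, -1.0, 0.1, 2.0],
        "B": [-0.3, 1.2, 0.7, 0.4],
        "string": ["foo", "foo", np.nan, "bar"],
    },
    index=pd.date_range("2000-01-01", periods=4),
)

# Boolean-mask form, mirroring the in-memory selection in the text.
by_mask = df_demo[(df_demo["B"] > 0) & (df_demo["string"] == "foo")]

# String-expression form; engine="python" avoids depending on numexpr.
by_query = df_demo.query("B > 0 and string == 'foo'", engine="python")

print(by_mask)
print(by_mask.equals(by_query))
```

Both forms evaluate entirely in memory, whereas `store.select` with a `where` string filters rows while reading from disk.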

Retrieving unique values in an indexable or data column.

Note that store.unique is deprecated as of 0.14.0; it can be replicated by store.select_column('df', 'index').unique()

store.unique("df", "index")
store.unique("df", "string")
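As a rough sketch of the replacement for the deprecated `unique` call, the snippet below appends a small frame with data columns, runs an on-disk query, and then reads the unique values of one data column via `select_column`. The frame `df_demo`, the file name `demo_store.h5`, and the temporary directory are illustrative assumptions. HDFStore requires the optional PyTables package, so the example skips the store operations when it is not importable.

```python
import os
import tempfile

import pandas as pd

# HDFStore needs the optional PyTables package; skip quietly without it.
try:
    import tables  # noqa: F401
    HAVE_PYTABLES = True
except ImportError:
    HAVE_PYTABLES = False

# Hand-written frame (illustrative values only).
df_demo = pd.DataFrame(
    {
        "A": [0.5, -1.0, 0.1, 2.0],
        "B": [-0.3, 1.2, 0.7, 0.4],
        "string": ["foo", "foo", "bar", "bar"],
    },
    index=pd.date_range("2000-01-01", periods=4),
)

selected = None
unique_strings = None

if HAVE_PYTABLES:
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo_store.h5")  # hypothetical file name
        with pd.HDFStore(path, mode="w") as store:
            # Only columns listed in data_columns can appear in a where clause.
            store.append("df", df_demo, data_columns=["B", "string"])
            selected = store.select("df", "B>0 and string=='foo'")
            # Replacement for the deprecated store.unique("df", "string").
            unique_strings = sorted(store.select_column("df", "string").unique())
    print(selected)
    print(unique_strings)
```

`select_column` reads a single indexable or data column as a Series, so the usual Series methods such as `unique` apply to the result.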

You can now store datetime64 in data columns

In [11]: df_mixed = df.copy()

In [12]: df_mixed["datetime64"] = pd.Timestamp("20010102")

In [13]: df_mixed.loc[df_mixed.index[3:4], ["A", "B"]] = np.nan

In [14]: store.append("df_mixed", df_mixed)

In [15]: df_mixed1 = store.select("df_mixed")

In [16]: df_mixed1
Out[16]:
                   A         B  ...  string2                    datetime64
2000-01-01  0.469112 -0.282863  ...     cool 1970-01-01 00:00:00.978393600
2000-01-02 -1.135632  1.212112  ...     cool 1970-01-01 00:00:00.978393600
2000-01-03  0.119209 -1.044236  ...     cool 1970-01-01 00:00:00.978393600
2000-01-04       NaN       NaN  ...     cool 1970-01-01 00:00:00.978393600
2000-01-05  0.721555 -0.706771  ...     cool 1970-01-01 00:00:00.978393600
2000-01-06  0.271860 -0.424972  ...     cool 1970-01-01 00:00:00.978393600
2000-01-07  0.276232 -1.087401  ...     cool 1970-01-01 00:00:00.978393600
2000-01-08  0.113648 -1.478427  ...     cool 1970-01-01 00:00:00.978393600

[8 rows x 6 columns]

In [17]: df_mixed1.dtypes.value_counts()
Out[17]:
float64           3
object            2
datetime64[ns]    1
Name: count, dtype: int64

You can pass the columns keyword to select to restrict which columns are returned. This is equivalent to passing a Term('columns', list_of_columns_to_filter).

In [18]: store.select("df", columns=["A", "B"])
Out[18]:
                   A         B
2000-01-01  0.469112 -0.282863
2000-01-02 -1.135632  1.212112
2000-01-03  0.119209 -1.044236
2000-01-04 -2.104569 -0.494929
2000-01-05  0.721555 -0.706771
2000-01-06  0.271860 -0.424972
2000-01-07  0.276232 -1.087401
2000-01-08  0.113648 -1.478427

HDFStore now serializes MultiIndex dataframes when appending tables.

In [19]: index = pd.MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
   ....:                               ['one', 'two', 'three']],
   ....:                       labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
   ....:                               [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
   ....:                       names=['foo', 'bar'])
   ....:

In [20]: df = pd.DataFrame(np.random.randn(10, 3), index=index,
   ....:                   columns=['A', 'B', 'C'])
   ....:

In [21]: df
Out[21]:
                  A         B         C
foo bar
foo one   -0.116619  0.295575 -1.047704
    two    1.640556  1.905836  2.772115
    three  0.088787 -1.144197 -0.633372
bar one    0.925372 -0.006438 -0.820408
    two   -0.600874 -1.039266  0.824758
baz two   -0.824095 -0.337730 -0.927764
    three -0.840123  0.248505 -0.109250
qux one    0.431977 -0.460710  0.336505
    two   -3.207595 -1.535854  0.409769
    three -0.673145 -0.741113 -0.110891

In [22]: store.append('mi', df)

In [23]: store.select('mi')
Out[23]:
                  A         B         C
foo bar
foo one   -0.116619  0.295575 -1.047704
    two    1.640556  1.905836  2.772115
    three  0.088787 -1.144197 -0.633372
bar one    0.925372 -0.006438 -0.820408
    two   -0.600874 -1.039266  0.824758
baz two   -0.824095 -0.337730 -0.927764
    three -0.840123  0.248505 -0.109250
qux one    0.431977 -0.460710  0.336505
    two   -3.207595 -1.535854  0.409769
    three -0.673145 -0.741113 -0.110891

the levels are automatically included as data columns

In [24]: store.select('mi', "foo='bar'")
Out[24]:
                A         B         C
foo bar
bar one  0.925372 -0.006438 -0.820408
    two -0.600874 -1.039266  0.824758

Multi-table creation via append_to_multiple and selection via select_as_multiple can create/select from multiple tables and return a combined result, by using where on a selector table.
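The MultiIndex constructor call above uses the `labels` keyword as it existed at the time of this release. In later pandas versions that keyword was renamed to `codes`, so a present-day version of the same index might look like the sketch below. The frame values (`np.arange` instead of random numbers) and the name `df_mi` are illustrative assumptions. The last step shows an in-memory counterpart of selecting on the `foo` level.

```python
import numpy as np
import pandas as pd

# Same levels and integer positions as in the text, using the newer
# `codes` keyword in place of `labels`.
index = pd.MultiIndex(
    levels=[["foo", "bar", "baz", "qux"], ["one", "two", "three"]],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
    names=["foo", "bar"],
)

# Deterministic values so the result is easy to check by eye.
df_mi = pd.DataFrame(
    np.arange(30, dtype="float64").reshape(10, 3),
    index=index,
    columns=["A", "B", "C"],
)

# In-memory counterpart of selecting rows where level "foo" equals "bar".
bar_rows = df_mi[df_mi.index.get_level_values("foo") == "bar"]
print(bar_rows)
```

Because the index levels are stored as data columns when the frame is appended, the on-disk query can refer to `foo` by name in the same way.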

In [19]: df_mt = pd.DataFrame(
   ....:     np.random.randn(8, 6),
   ....:     index=pd.date_range("1/1/2000", periods=8),
   ....:     columns=["A", "B", "C", "D", "E", "F"],
   ....: )
   ....:

In [20]: df_mt["foo"] = "bar"

you can also create the tables individually

In [21]: store.append_to_multiple(
   ....:     {"df1_mt": ["A", "B"], "df2_mt": None}, df_mt, selector="df1_mt"
   ....: )
   ....:

In [22]: store
Out[22]:
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5

individual tables were created

In [23]: store.select("df1_mt")
Out[23]:
                   A         B
2000-01-01  0.404705  0.577046
2000-01-02 -1.344312  0.844885
2000-01-03  0.357021 -0.674600
2000-01-04  0.276662 -0.472035
2000-01-05  0.895717  0.805244
2000-01-06 -1.170299 -0.226169
2000-01-07 -0.076467 -1.187678
2000-01-08  1.024180  0.569605

In [24]: store.select("df2_mt")
Out[24]:
                   C         D         E         F  foo
2000-01-01 -1.715002 -1.039268 -0.370647 -1.157892  bar
2000-01-02  1.075770 -0.109050  1.643563 -1.469388  bar
2000-01-03 -1.776904 -0.968914 -1.294524  0.413738  bar
2000-01-04 -0.013960 -0.362543 -0.006154 -0.923061  bar
2000-01-05 -1.206412  2.565646  1.431256  1.340309  bar
2000-01-06  0.410835  0.813850  0.132003 -0.827317  bar
2000-01-07  1.130127 -1.436737 -1.413681  1.607920  bar
2000-01-08  0.875906 -2.211372  0.974466 -2.006747  bar

as a multiple

In [25]: store.select_as_multiple(
   ....:     ["df1_mt", "df2_mt"], where=["A>0", "B>0"], selector="df1_mt"
   ....: )
   ....:
Out[25]:
                   A         B         C         D         E         F  foo
2000-01-01  0.404705  0.577046 -1.715002 -1.039268 -0.370647 -1.157892  bar
2000-01-05  0.895717  0.805244 -1.206412  2.565646  1.431256  1.340309  bar
2000-01-08  1.024180  0.569605  0.875906 -2.211372  0.974466 -2.006747  bar
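To make the selector idea concrete, here is a purely in-memory analogue of the split-then-recombine pattern. It is a conceptual sketch only and does not reflect how HDFStore implements these methods internally; the frame `df_all`, its values, and the variable names are illustrative assumptions.

```python
import pandas as pd

# Hand-written frame standing in for the wide table (illustrative values).
df_all = pd.DataFrame(
    {
        "A": [0.4, -1.3, 0.9, 1.0],
        "B": [0.6, 0.8, -0.5, 0.5],
        "C": [-1.7, 1.1, -1.2, 0.9],
        "foo": ["bar"] * 4,
    },
    index=pd.date_range("2000-01-01", periods=4),
)

# Split: the "selector" piece holds the columns to query on,
# the other piece holds the remaining columns.
selector_cols = ["A", "B"]
selector_part = df_all[selector_cols]
remainder_part = df_all.drop(columns=selector_cols)

# Query only the selector piece, then use the matching row labels
# to pull the same rows from every piece and glue them side by side.
matching = selector_part[(selector_part["A"] > 0) & (selector_part["B"] > 0)].index
combined = pd.concat(
    [selector_part.loc[matching], remainder_part.loc[matching]], axis=1
)
print(combined)
```

The benefit of the on-disk version is that the query runs against the narrow selector table, and only the matching rows are then read from the other tables.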

Enhancements

Bug Fixes

See the full release notes or issue tracker on GitHub for a complete list.

Contributors

A total of 17 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.