ENH: select column/coordinates/multiple with start/stop/selection by wabu · Pull Request #6177 · pandas-dev/pandas
Here's an example showing how select_as_multiple handled start before this commit, in contrast to select:
In [3]: data = pd.DataFrame({'a': np.random.rand(10), 'b': np.random.rand(10)})
In [4]: store = pd.HDFStore('test.h5')
In [5]: store.append_to_multiple({'a': ['a'], 'b': ['b']}, data, 'a')
In [6]: store.select('a', where='a>.1', start=5)
Out[6]:
          a
5  0.386593
6  0.363150
7  0.247858
8  0.628002
9  0.785359
[5 rows x 1 columns]
In [7]: store.select_as_multiple(['a','b'], where='a>.1', start=5)
Out[7]:
Empty DataFrame
Columns: [a, b]
Index: []
[0 rows x 2 columns]
The result is empty because the where clause is applied first, and start is then applied to the already-filtered result.
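To make the ordering concrete, here is a minimal pure-Python sketch (no pandas or PyTables involved). It models a stored column as a list and compares the two orders of applying a where-style predicate and start. The helper names and data are hypothetical; they only mirror the behaviour described above, not pandas' internal code.

```python
# Simplified, hypothetical model (not pandas code): a stored column is a list,
# and a "selection" is the list of absolute row numbers that survive.


def filter_then_slice(values, predicate, start=None, stop=None):
    """Order described for the old select_as_multiple: filter first,
    then apply start/stop to positions within the filtered result."""
    matches = [row for row, v in enumerate(values) if predicate(v)]
    return matches[start:stop]


def slice_then_filter(values, predicate, start=None, stop=None):
    """Order matching select: start/stop bound the stored rows, and the
    predicate is evaluated only inside that range."""
    rows = range(len(values))[start:stop]
    return [row for row in rows if predicate(values[row])]


def greater_than_point_one(v):
    return v > 0.1


# Made-up data: exactly five of the ten rows pass the predicate.
values = [0.05, 0.5, 0.07, 0.9, 0.02, 0.4, 0.03, 0.8, 0.6, 0.01]

print(filter_then_slice(values, greater_than_point_one, start=5))  # []
print(slice_then_filter(values, greater_than_point_one, start=5))  # [5, 7, 8]
```

With filter-then-slice, start=5 skips the first five *matches*, which here are all of them, so nothing is left; with slice-then-filter, start=5 skips the first five *stored rows*, which is what select does.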
select_as_coordinates also misbehaves when the where clause results in a filter expression:
In [10]: sel = np.arange(5,1000)
In [11]: store.select_as_coordinates('a', where='index = sel')
Out[11]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')
Here the filter expression is ignored, so all rows are returned.
After the fix, we have:
In [4]: store.select('a', where='a>.1', start=5)
Out[4]:
          a
5  0.386593
6  0.363150
7  0.247858
8  0.628002
9  0.785359
[5 rows x 1 columns]
In [5]: store.select_as_multiple(['a','b'], where='a>.1', start=5)
Out[5]:
          a         b
5  0.386593  0.258102
6  0.363150  0.345453
7  0.247858  0.841031
8  0.628002  0.437058
9  0.785359  0.520087
[5 rows x 2 columns]
In [6]: sel = np.arange(5,1000)
In [7]: store.select_as_coordinates('a', where='index = sel')
Out[7]: Int64Index([5, 6, 7, 8, 9], dtype='int64')
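select_as_multiple works with a selector table (here 'a', the selector passed to append_to_multiple): the where clause is evaluated against that table to obtain row coordinates, and those rows are then read from every listed table and joined column-wise. The pure-Python sketch below models this under the assumption that start/stop bound the selector's rows exactly once, consistent with the fixed output above. The function name and data are hypothetical, and this is not how pandas implements it internally.

```python
# Hypothetical, pure-Python model of the selector-table pattern behind
# select_as_multiple (not pandas code). Each "table" holds one column, and all
# tables share the same row numbering, as after append_to_multiple.


def select_rows_model(tables, selector, predicate, start=None, stop=None):
    # 1. Compute coordinates on the selector table only; start/stop bound
    #    the row range before the predicate is evaluated.
    column = tables[selector]
    rows = range(len(column))[start:stop]
    coords = [row for row in rows if predicate(column[row])]
    # 2. Fetch exactly those rows from every table (no second start/stop)
    #    and join the columns side by side, in sorted table-name order.
    names = sorted(tables)
    return [(row,) + tuple(tables[name][row] for name in names) for row in coords]


tables = {
    "a": [0.05, 0.5, 0.07, 0.9, 0.02, 0.4, 0.03, 0.8, 0.6, 0.01],
    "b": [float(i) for i in range(10)],
}

for record in select_rows_model(tables, "a", lambda v: v > 0.1, start=5):
    print(record)
# (5, 0.4, 5.0)
# (7, 0.8, 7.0)
# (8, 0.6, 8.0)
```

The design point is that start is consumed while computing coordinates on the selector; the other tables are read by coordinate only, so start is never applied a second time to an already-filtered set.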