ENH: select column/coordinates/multiple with start/stop/selection by wabu · Pull Request #6177 · pandas-dev/pandas

Here's an example showing how select_as_multiple handled start before this commit, in contrast to select:

In [3]: data = pd.DataFrame({'a': np.random.rand(10), 'b': np.random.rand(10)})
In [4]: store = pd.HDFStore('test.h5')
In [5]: store.append_to_multiple({'a': ['a'], 'b': ['b']}, data, 'a')
In [6]: store.select('a', where='a>.1', start=5)
Out[6]: 
          a
5  0.386593
6  0.363150
7  0.247858
8  0.628002
9  0.785359
[5 rows x 1 columns]

In [7]: store.select_as_multiple(['a','b'], where='a>.1', start=5)
Out[7]: 
Empty DataFrame
Columns: [a, b]
Index: []

[0 rows x 2 columns]

The result is empty because the where is applied first, and the start is then applied to the already filtered result.
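To make the ordering issue concrete, here is a minimal in-memory sketch in plain Python. It is not the HDFStore implementation; the helper names are hypothetical, and the values for rows 0 to 4 are made up (assumed to fail the condition), since the session above does not show them.

```python
def start_then_where(values, predicate, start):
    """Restrict to rows >= start first, then apply the condition.

    Returns the row numbers that survive (what select does above).
    """
    return [i for i in range(start, len(values)) if predicate(values[i])]


def where_then_start(values, predicate, start):
    """Apply the condition to all rows first, then skip the first `start` matches.

    Returns the row numbers that survive (the old select_as_multiple ordering,
    as described above).
    """
    matches = [i for i, v in enumerate(values) if predicate(v)]
    return matches[start:]


# Rows 5..9 mirror the output shown above; rows 0..4 are illustrative
# placeholders (assumption) chosen to fail the condition a > .1.
values = [0.05, 0.01, 0.02, 0.07, 0.03,
          0.386593, 0.363150, 0.247858, 0.628002, 0.785359]


def greater_than_point_one(v):
    return v > 0.1


rows_select = start_then_where(values, greater_than_point_one, 5)
rows_multiple = where_then_start(values, greater_than_point_one, 5)

print(rows_select)    # row numbers kept when start is applied to the table
print(rows_multiple)  # row numbers kept when start is applied to the matches
```

With these assumed values only five rows match the condition, so skipping the first five matches leaves nothing, while restricting the table to rows 5 and later keeps all five.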

select_as_coordinates misbehaves when the where clause results in a filter expression:

In [10]: sel = np.arange(5,1000)
In [11]: store.select_as_coordinates('a', where='index = sel')
Out[11]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')

Here, the filter expression is ignored, so every row is returned.

After the fix, we have:

In [4]: store.select('a', where='a>.1', start=5)
Out[4]: 
          a
5  0.386593
6  0.363150
7  0.247858
8  0.628002
9  0.785359
[5 rows x 1 columns]

In [5]: store.select_as_multiple(['a','b'], where='a>.1', start=5)
Out[5]: 
          a         b
5  0.386593  0.258102
6  0.363150  0.345453
7  0.247858  0.841031
8  0.628002  0.437058
9  0.785359  0.520087
[5 rows x 2 columns]

In [6]: sel = np.arange(5,1000)
In [7]: store.select_as_coordinates('a', where='index = sel')
Out[7]: Int64Index([5, 6, 7, 8, 9], dtype='int64')
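
The expected select_as_coordinates result can be cross-checked without a store. The sketch below is plain Python, not the PyTables code path; the helper name coordinates_matching is hypothetical. It returns the positions of the rows whose index label appears in sel, which is what a filter such as 'index = sel' is meant to express.

```python
def coordinates_matching(index_labels, allowed_labels):
    """Return the positions of rows whose index label is in allowed_labels."""
    allowed = set(allowed_labels)
    return [pos for pos, label in enumerate(index_labels) if label in allowed]


# The example table has a default integer index 0..9.
index_labels = list(range(10))

# Same selection as in the session above: labels 5..999.
sel = range(5, 1000)

coords = coordinates_matching(index_labels, sel)
print(coords)
```

Only labels 5 through 9 exist in the table, so those are the only coordinates returned, matching Out[7] above.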