PyTables enhancements for selection · Issue #1996 · pandas-dev/pandas
now
changes to pandas.io.pytables to support more natural selection (from tables):
- rename column -> major, index -> minor (to be more consistent with panel nomenclature)
- provide a parsable string selection syntax - pretty easy to do - and it can be backwards compatible
store.select('mypanel', where = [ 'major>=20120103', 'major<=20120401', dict(minor = ['A','B','C']) ])
rather than existing
store.select('mypanel', where = [
dict(field = 'column', op = '>=', value = datetime.datetime(2012,1,3)),
dict(field = 'column', op = '<=', value = datetime.datetime(2012,4,1)),
dict(field = 'index', value = ['A','B','C']) ])
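A minimal sketch of how the string terms could be translated into the existing dict terms, which is one way to keep backwards compatibility. `parse_term`, `FIELD_ALIASES` and `TERM_RE` are hypothetical names, not part of pandas, and the sketch assumes every string value is a date written as YYYYMMDD.

```python
import datetime
import re

# Hypothetical mapping from the proposed names to the existing field names.
FIELD_ALIASES = {"major": "column", "minor": "index"}

# Two-character operators come first so '>=' is not read as '>'.
TERM_RE = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<|=)\s*(\S+)\s*$")


def parse_term(term):
    """Translate one selection term into the existing dict(field, op, value) form."""
    if isinstance(term, dict):
        if "field" in term:
            # Already an old-style term: pass it through unchanged.
            return term
        if len(term) != 1:
            raise ValueError("expected a single name=value pair: %r" % (term,))
        # New-style dict(minor=[...]) becomes dict(field='index', value=[...]).
        ((name, value),) = term.items()
        return dict(field=FIELD_ALIASES.get(name, name), value=value)

    match = TERM_RE.match(term)
    if match is None:
        raise ValueError("cannot parse selection term: %r" % (term,))
    name, op, raw = match.groups()
    # Assumption: values are dates in YYYYMMDD form, as in the examples above.
    value = datetime.datetime.strptime(raw, "%Y%m%d")
    return dict(field=FIELD_ALIASES.get(name, name), op=op, value=value)


where = ["major>=20120103", "major<=20120401", dict(minor=["A", "B", "C"])]
terms = [parse_term(t) for t in where]
```

With this shape, old-style terms pass through untouched, so both spellings could be mixed in one `where` list.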
future
not sure that pandas should get really fancy just yet with operations - (e.g. 'or' operations, and actual value selection)
where = [ ( 'major>20120901' & dict(minor = ['A','B','C']) ) | dict(minor = ['D']) ]
where = [ item['foo']>2.0 ]
but probably necessary once pandas supports 'chunking' type operations on pytables
need to build a full-fledged selection parser to translate to the numexpr type operations (maybe with a patsy backend????)
BUT this may actually be useful to support generic operations in this way on in-memory panels/frames
not sure of use cases here though - I usually just read in 'about' what data I need and sub-select from there
unless you have hundreds of millions of rows I don't know if it's necessary to optimize more (in which case it is!)
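As a rough illustration of the "translate to numexpr type operations" idea, the sketch below turns a list of dict terms into a single condition string using `&` and `|`. `to_condition` is a hypothetical helper, not a pandas or PyTables function; it assumes values are plain numbers or strings, and whether the resulting string can be passed unchanged to PyTables depends on the column types, which is not checked here.

```python
def to_condition(terms):
    """Join dict terms into one numexpr-style condition string."""
    clauses = []
    for term in terms:
        field, value = term["field"], term["value"]
        if isinstance(value, (list, tuple)):
            # A list of values means "any of these": OR the equality tests.
            alternatives = " | ".join(
                "({} == {!r})".format(field, v) for v in value
            )
            clauses.append("({})".format(alternatives))
        else:
            # A term without an explicit op is treated as equality.
            op = term.get("op", "==")
            clauses.append("({} {} {!r})".format(field, op, value))
    # Terms in the list are implicitly ANDed together.
    return " & ".join(clauses)


condition = to_condition([
    dict(field="major", op=">", value=20120901),
    dict(field="minor", value=["A", "B"]),
])
```

This only covers the implicit "and" between list items; a real 'or' between arbitrary terms would need the fuller parser discussed above.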