PyTables enhancements for selection · Issue #1996 · pandas-dev/pandas

now

changes to pandas.io.pytables to support more natural selection (from tables):

rename column -> major, index -> minor (to be more consistent with panel nomenclature)
provide parsable string selection methodology - pretty easy to do - and can be backwards compatible

store.select('mypanel', where = [ 'major>=20120103', 'major<=20120401', dict(minor = ['A','B','C']) ])

rather than existing

store.select('mypanel', where = [ 
dict(field = 'column', op = '>=', value = datetime.datetime(2012,1,3)), 
dict(field = 'column', op = '<=', value = datetime.datetime(2012,4,1)), 
dict(field = 'index', value = ['A','B','C'])  ])
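
To illustrate how the string form could stay backwards compatible, here is a rough sketch that converts each string term into the existing dict form. `parse_term` is a hypothetical helper, not something in pandas, and treating an 8-digit value as a YYYYMMDD date is an assumption made for this example only.

```python
import re
from datetime import datetime

# Hypothetical helper (not part of pandas): field, operator, value.
_TERM = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+?)\s*$")


def parse_term(term):
    """Return a dict(field=..., op=..., value=...) for one where-term."""
    if isinstance(term, dict):
        if "field" in term:
            # Already in the existing dict form: pass through unchanged.
            return dict(term)
        # Shorthand such as dict(minor=['A', 'B', 'C']): exactly one key.
        (field, value), = term.items()
        return dict(field=field, value=value)

    m = _TERM.match(term)
    if m is None:
        raise ValueError("cannot parse selection term: %r" % (term,))
    field, op, raw = m.groups()

    # Assumption: an 8-digit value is a YYYYMMDD date; anything else
    # is left as the raw string.
    if re.fullmatch(r"\d{8}", raw):
        value = datetime.strptime(raw, "%Y%m%d")
    else:
        value = raw
    return dict(field=field, op=op, value=value)


where = [parse_term(t) for t in
         ['major>=20120103', 'major<=20120401', dict(minor=['A', 'B', 'C'])]]
```

Because dict terms pass through untouched, old-style calls would keep working while new-style strings get translated up front.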

future

not sure that pandas should get really fancy just yet with operations - (e.g. 'or' operations, and actual value selection)

where = [ ( 'major>20120901' & dict(minor = ['A','B','C']) ) | dict(minor = ['D']) ]
where = [ item['foo']>2.0 ]

but probably necessary once pandas supports 'chunking' type operations on pytables

need to build a full-fledged selection parser to translate to the numexpr type operations (maybe with a patsy backend????)
BUT this may actually be useful to support generic operations in this way on in-memory panels/frames
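
As a very rough idea of what the translation step might look like (nowhere near a full parser), the sketch below joins already-parsed terms into a single numexpr-style condition string. `to_condition` is a hypothetical name, terms are and-ed together, and list values are expanded into or-ed equality tests. Whether PyTables would accept the resulting string as-is depends on the column types, so treat the output format as an assumption.

```python
def to_condition(terms):
    """Join dict terms into one numexpr-style condition string (sketch)."""
    parts = []
    for t in terms:
        field = t["field"]
        op = t.get("op", "==")   # no op given means equality
        value = t["value"]
        if op == "=":
            op = "=="
        if isinstance(value, (list, tuple)):
            # A list of values becomes or-ed equality tests.
            alts = " | ".join("(%s == %r)" % (field, v) for v in value)
            parts.append("(%s)" % alts)
        else:
            parts.append("(%s %s %r)" % (field, op, value))
    # Separate terms are and-ed, matching the list-of-terms semantics.
    return " & ".join(parts)


cond = to_condition([
    dict(field="major", op=">=", value=20120103),
    dict(field="minor", value=["A", "B"]),
])
# cond == "(major >= 20120103) & ((minor == 'A') | (minor == 'B'))"
```

The same string could in principle be evaluated against in-memory data too, which is the generic use mentioned above.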

not sure of use cases here though - I usually just read in 'about' what data I need and sub-select from there
unless you have hundreds of millions of rows I don't know if it's necessary to optimize more (in which case it is!)