ENH/API: Change query/eval local variable API · Issue #5987 · pandas-dev/pandas

Currently, query and eval let you reference local variables with the @ symbol. It's a bit confusing: you're not allowed to have a local variable and a column with the same name, yet a bare name that matches no column will still be resolved to a local if one exists.

Current API:

Fails with a NameError:

    from numpy.random import randn
    from pandas import DataFrame

    a = 1
    df = DataFrame({'a': randn(10), 'b': randn(10)})
    df.query('a > b')

But this works:

And so does this, which is confusing:

    a = 1
    df = DataFrame({'b': randn(10), 'c': randn(10)})
    df.query('a < b < c')

As suggested by @y-p and @jreback, the following API is less confusing IMO.

From now on, every local variable will need an explicit @ reference, and if a column and a local share a name, the bare name will refer to the column. A bare name therefore always refers to a column, and you get an error if that column doesn't exist. Likewise, a name prefixed with @ always refers to a local, and you get an error if that local doesn't exist. As a bonus (🐺 in 🐑's clothing), this allows you to use a local and a column with the same name in one expression.
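The lookup rule above can be sketched in plain Python. This is only an illustration of the proposed semantics, not how pandas resolves names internally; `resolve_name`, `columns`, and `local_vars` are hypothetical names made up for this sketch.

```python
def resolve_name(token, columns, local_vars):
    """Resolve one identifier from a query string under the proposed rule.

    Hypothetical helper for illustration only; not part of pandas.
    """
    if token.startswith('@'):
        # Explicit local reference: never falls back to a column.
        name = token[1:]
        if name not in local_vars:
            raise NameError(f"local variable {name!r} is not defined")
        return local_vars[name]
    # Bare name: always a column, never a local.
    if token not in columns:
        raise NameError(f"column {token!r} is not defined")
    return columns[token]


columns = {'a': [0.5, 2.0], 'b': [1.0, 1.5]}
local_vars = {'a': 1, 'c': 1}

# The bare name picks the column even though a local 'a' exists.
col_a = resolve_name('a', columns, local_vars)
# The @ prefix picks the local.
local_a = resolve_name('@a', columns, local_vars)

# 'c' exists only as a local, so a bare reference fails.
try:
    resolve_name('c', columns, local_vars)
    bare_c_failed = False
except NameError:
    bare_c_failed = True
```

The point of the design is that neither branch falls back to the other namespace, so a typo in a column name can never silently pick up a local.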

Examples:

    a = 1
    df = DataFrame({'a': randn(10), 'b': randn(10)})

    # uses the column 'a'
    df.query('a > b')

    # uses the local
    df.query('@a > b')

    # fails because I didn't reference the local and there's no 'c' column
    c = 1
    df.query('a > c')

    # local and a column name
    df.query('b < @a < a')
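For a deterministic check of the examples above, here is a sketch with fixed values in place of randn. It assumes a pandas release in which this proposal is implemented, and that the undefined-name error is a NameError subclass. The data values are made up for illustration.

```python
import pandas as pd

a = 1
c = 1
df = pd.DataFrame({'a': [0.5, 2.0, 3.0], 'b': [1.0, 1.5, 0.0]})

# Bare 'a' is the column: rows where column a > column b.
by_column = df.query('a > b')

# '@a' is the local: rows where 1 > column b.
by_local = df.query('@a > b')

# Local and column named 'a' in one expression: b < 1 < column a.
both = df.query('b < @a < a')

# 'c' exists only as a local, so a bare reference should raise.
try:
    df.query('a > c')
    bare_local_failed = False
except NameError:
    bare_local_failed = True
```

With these values, the column comparison keeps rows 1 and 2, while the two expressions that use the local keep only row 2.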