ENH: fix eval scoping issues by cpcloud · Pull Request #6366 · pandas-dev/pandas

closes #5987
closes #5087

Relevant user-facing changes:

In [9]: df = DataFrame(randn(5, 2), columns=list('ab'))

In [10]: a, b = 1, 2

In [11]: # column names take precedence

In [12]: df.query('a < b')
Out[12]:
          a         b
0 -0.805549 -0.090572
1 -1.782325 -1.594079
2 -0.984364  0.934457
3 -1.963798  1.122112

[4 rows x 2 columns]

In [13]: # we must use @ whenever we want a local variable

In [14]: df.query('@a < b')
Out[14]:
          a         b
3 -1.963798  1.122112

[1 rows x 2 columns]
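The precedence rule in the session above can be modeled in a few lines of plain Python. This is a toy sketch, not the pandas implementation. `resolve_name` and `query_less_than` are hypothetical helpers, columns are plain lists, locals are scalars, and only a single `<` comparison is parsed. The data are the four rows shown in Out[12].

```python
from typing import Any, List, Mapping


def resolve_name(token: str, columns: Mapping[str, Any],
                 local_vars: Mapping[str, Any]) -> Any:
    """Resolve one name token under the scoping rule (hypothetical helper)."""
    if token.startswith("@"):
        # '@name' always means the local variable, even if a column shares the name.
        name = token[1:]
        if name not in local_vars:
            raise NameError(f"local variable {name!r} is not defined")
        return local_vars[name]
    # A bare name: column names take precedence over local variables.
    if token in columns:
        return columns[token]
    if token in local_vars:
        return local_vars[token]
    raise NameError(f"name {token!r} is not defined")


def query_less_than(expr: str, columns: Mapping[str, List[float]],
                    local_vars: Mapping[str, Any]) -> List[int]:
    """Return the row positions where 'lhs < rhs' holds (toy, one comparison only)."""
    lhs_token, rhs_token = (part.strip() for part in expr.split("<"))
    lhs = resolve_name(lhs_token, columns, local_vars)
    rhs = resolve_name(rhs_token, columns, local_vars)
    n_rows = len(next(iter(columns.values())))

    def at(value: Any, i: int) -> Any:
        # Columns are lists indexed by row; scalars broadcast to every row.
        return value[i] if isinstance(value, list) else value

    return [i for i in range(n_rows) if at(lhs, i) < at(rhs, i)]


columns = {
    "a": [-0.805549, -1.782325, -0.984364, -1.963798],
    "b": [-0.090572, -1.594079, 0.934457, 1.122112],
}
local_vars = {"a": 1, "b": 2}

print(query_less_than("a < b", columns, local_vars))   # [0, 1, 2, 3]
print(query_less_than("@a < b", columns, local_vars))  # [3]
```

Without `@`, both sides resolve to columns even though locals named `a` and `b` exist; with `@a`, the scalar `1` is compared against column `b`, which keeps only row 3, as in Out[14].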

In [15]: # we cannot use @ in eval calls

In [16]: pd.eval('@a + b')
  File "<string>", line unknown
SyntaxError: The '@' prefix is not allowed in top-level eval calls, please refer to your variables by name without the '@' prefix

In [17]: pd.eval('@a + b', parser='python')
  File "<string>", line unknown
SyntaxError: The '@' prefix is only supported by the pandas parser
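The two error messages above suggest a check that runs before the expression is parsed: any `@` token is rejected when the parser is not `pandas`, or when the call is a top-level `pd.eval`. Below is a minimal stdlib sketch of such a check. The function `check_for_locals(expr, top_level, parser)` is hypothetical, and the standard `tokenize` module is used here to find the `@` operator token.

```python
import io
import tokenize


def check_for_locals(expr: str, top_level: bool, parser: str = "pandas") -> None:
    """Reject '@' where it is not supported (hypothetical sketch)."""
    if parser != "pandas":
        msg = "The '@' prefix is only supported by the pandas parser"
    elif top_level:
        msg = ("The '@' prefix is not allowed in top-level eval calls, please "
               "refer to your variables by name without the '@' prefix")
    else:
        # e.g. DataFrame.query with the pandas parser: '@' is allowed.
        return
    # Scan the token stream so an '@' inside a string literal is not flagged.
    for tok in tokenize.generate_tokens(io.StringIO(expr).readline):
        if tok.type == tokenize.OP and tok.string == "@":
            raise SyntaxError(msg)


check_for_locals("@a < b", top_level=False)  # query-style call: no error
try:
    check_for_locals("@a + b", top_level=True)
except SyntaxError as exc:
    print(exc)
```

Checking the parser first means `parser='python'` reports the "only supported by the pandas parser" message even for a top-level call, matching In [17].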

- Update the query/eval docstrings and the indexing/perfenhancing docs
- Make sure the docs build
- Add release notes
- PyTables
- Make the repr of Scope objects work, or revert to the previous version
- Add more tests for the new local-variable scoping API
- Disallow (and provide a useful error message for) locals in top-level expressions such as pd.eval('@a + b')
- Raise when your variables have the same name as one of the builtin math functions that numexpr supports, since numexpr.evaluate does not let you override them, even when you pass them explicitly. For example:

      import numexpr as ne
      from numpy import array
      from numpy.random import randn

      sin = randn(10)
      d = {'sin': sin}
      # 'sin' is not taken from d: the result is array(True), not a 10-element boolean array
      result = ne.evaluate('sin > 1', local_dict=d, global_dict=d)
      result == array(True)
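Below is a minimal sketch of the clash check described in the last item. It assumes a hypothetical `check_builtin_clash` helper and a hand-picked subset of numexpr's math functions (not the full list). The actual change may raise a different exception type than the `ValueError` used here.

```python
from typing import Iterable

# Illustrative subset of the math functions numexpr provides (assumption: not the full list).
NUMEXPR_MATH_FUNCS = frozenset({
    "sin", "cos", "tan", "arcsin", "arccos", "arctan", "arctan2",
    "sinh", "cosh", "tanh", "arcsinh", "arccosh", "arctanh",
    "exp", "expm1", "log", "log1p", "log10", "sqrt", "abs",
})


def check_builtin_clash(names: Iterable[str]) -> None:
    """Raise if any variable name would be shadowed by a numexpr math function."""
    overlap = sorted(set(names) & NUMEXPR_MATH_FUNCS)
    if overlap:
        raise ValueError(
            "Variables overlap with numexpr builtins: " + ", ".join(map(repr, overlap))
        )


check_builtin_clash(["a", "b"])  # no overlap: passes silently
try:
    check_builtin_clash(["sin", "a"])
except ValueError as exc:
    print(exc)  # Variables overlap with numexpr builtins: 'sin'
```

Failing loudly up front is preferable to the silent behavior shown above, where numexpr quietly uses its builtin instead of the user's array.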

For reference, after this PR local variables are given lower precedence than column names. For example,

    from numpy.random import randn
    from pandas import DataFrame

    a, b = 1, 2
    df = DataFrame(randn(10, 2), columns=list('ab'))
    res = df.query('a > b')

will no longer raise an exception about overlapping variable names. If you want the local a (as opposed to the column a) you must do