ENH: fix eval scoping issues by cpcloud · Pull Request #6366 · pandas-dev/pandas
Relevant user-facing changes:
In [9]: df = DataFrame(randn(5, 2), columns=list('ab'))

In [10]: a, b = 1, 2

In [11]: # column names take precedence

In [12]: df.query('a < b')
Out[12]:
          a         b
0 -0.805549 -0.090572
1 -1.782325 -1.594079
2 -0.984364  0.934457
3 -1.963798  1.122112

[4 rows x 2 columns]

In [13]: # we must use @ whenever we want a local variable

In [14]: df.query('@a < b')
Out[14]:
          a         b
3 -1.963798  1.122112

[1 rows x 2 columns]

In [15]: # we cannot use @ in eval calls

In [16]: pd.eval('@a + b')
  File "<string>", line unknown
SyntaxError: The '@' prefix is not allowed in top-level eval calls, please refer to your variables by name without the '@' prefix

In [17]: pd.eval('@a + b', parser='python')
  File "<string>", line unknown
SyntaxError: The '@' prefix is only supported by the pandas parser
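The session above uses random data, so its output is not reproducible. Below is a small sketch of the same three behaviours with fixed values, assuming a pandas release that includes this change; the frame contents and the names by_column, by_local and eval_error are made up for illustration.

```python
import pandas as pd

# Fixed data so the result of each query is easy to check by hand.
df = pd.DataFrame({"a": [1.0, 5.0, 3.0], "b": [2.0, 4.0, 6.0]})
a, b = 4, 2  # locals that deliberately share names with the columns

# Bare names resolve to columns: keep rows where column a < column b.
by_column = df.query("a < b")

# '@a' refers to the local variable a (4): keep rows where 4 < column b.
by_local = df.query("@a < b")

# A top-level pd.eval call rejects the '@' prefix with a SyntaxError.
try:
    pd.eval("@a + b")
    eval_error = None
except SyntaxError as exc:
    eval_error = str(exc)

print(list(by_column.index))  # rows 0 and 2
print(list(by_local.index))   # row 2 only
print(eval_error)
```

In a top-level pd.eval call there are no columns to shadow, so the variables are simply referred to by name, as the error message suggests.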
- update query/eval docstrings/indexing/perfenhancing
- make sure docs build
- release notes
- pytables
- make the repr of Scope objects work or revert to previous version
- more tests for new local variable scoping API
- disallow (and provide a useful error message for) locals in expressions like pd.eval('@a + b')
- Raise when your variables have the same name as the builtin math functions that numexpr supports, since you cannot override them in numexpr.evaluate, even when explicitly passing them. For example

    import numexpr as ne
    from numpy import array
    from numpy.random import randn

    sin = randn(10)
    d = {'sin': sin}
    # the passed-in array cannot override numexpr's builtin sin
    result = ne.evaluate('sin > 1', local_dict=d, global_dict=d)
    result == array(True)
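The last checklist item can be sketched as a pre-flight check on variable names. This is only an illustration of the idea, not the code added by this PR: the set NUMEXPR_MATH_NAMES is a partial, hand-written list of numexpr function names, and check_no_builtin_clash and the use of NameError are hypothetical choices.

```python
# Partial, illustrative list of math function names that numexpr provides.
NUMEXPR_MATH_NAMES = frozenset(
    {"sin", "cos", "tan", "exp", "log", "sqrt", "abs", "arctan2"}
)


def check_no_builtin_clash(variables):
    """Raise if any variable name shadows a numexpr math function.

    `variables` is a mapping of names to values, like the dict that
    would be passed to numexpr as local_dict.
    """
    clashes = sorted(NUMEXPR_MATH_NAMES & set(variables))
    if clashes:
        raise NameError(
            "Variables overlap with numexpr builtins: " + ", ".join(clashes)
        )


# A harmless name passes silently.
check_no_builtin_clash({"x": 1.0})

# A variable called 'sin' is rejected before anything is evaluated.
try:
    check_no_builtin_clash({"sin": [0.1, 0.2]})
    message = None
except NameError as exc:
    message = str(exc)

print(message)
```

Failing early like this avoids the silent surprise in the numexpr example, where the user's array is ignored in favour of the builtin function.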
For reference, after this PR local variables are given lower precedence than column names. For example
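    from numpy.random import randn
    from pandas import DataFrame

    a, b = 1, 2
    df = DataFrame(randn(10, 2), columns=list('ab'))
    res = df.query('a > b')  # a and b resolve to the columns, not the locals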
will no longer raise an exception about overlapping variable names. If you want the local a (as opposed to the column a) you must do