DataFrame query method - numexpr safety check fails · Issue #22435 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame({'a': ['1','2','3'], 'b': [4,5,6]})
df.query("a.astype('int') < 2")
# raises TypeError: unhashable type: 'numpy.ndarray'
Problem description
Background
When using numexpr, pandas has an internal function, _check_ne_builtin_clash, for detecting when a variable used in a method like query clashes with a numexpr built-in.
Here's an example of the function raising an error as intended:
df = pd.DataFrame({'abs': [1,2,3]})
df.query("abs > 2")
# Raises NumExprClobberingError: Variables ... overlap with builtins: ('abs')
Mostly, the names it protects against are math functions like sin, cos, sum, etc.
Why my original example fails
The trouble with my original code is that _check_ne_builtin_clash checks the names of both sides of the BinaryExpr AST node corresponding to "a.astype('int') < 2".
It does this by putting them into a frozenset.
However, the LHS ends up being a Constant node with the name array([1,2,3]), which is an ndarray and therefore not hashable.
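The same failure can be reproduced outside pandas. The following is a minimal sketch, assuming only NumPy; the list `names` is a hypothetical stand-in for the operand names pandas collects, not the actual internal data structure. Building a frozenset requires every member to be hashable, so an ndarray member raises the TypeError reported above.

```python
import numpy as np

# Hypothetical stand-ins for the two operand "names": an ndarray (the
# already-evaluated a.astype('int')) and a plain scalar (the constant 2).
names = [np.array([1, 2, 3]), 2]

try:
    frozenset(names)  # every member of a set must be hashable
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```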
Solution
It seems like the helper function _check_ne_builtin_clash should consider any unhashable name safe, since it can't conflict with the function names being searched for. If this seems like reasonable behavior, let me know and I will submit a PR!
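One way the proposal might look, as a rough sketch rather than the actual pandas change: drop unhashable names before building the set that is intersected with the built-ins. The helpers `hashable_names` and `find_builtin_clashes` and the constant `NE_BUILTINS_SKETCH` are hypothetical names invented for illustration, and the built-in list here is only a small illustrative subset.

```python
import numpy as np

# Illustrative subset only; the real list of numexpr built-ins lives in pandas.
NE_BUILTINS_SKETCH = frozenset({"sin", "cos", "abs", "sum"})


def hashable_names(names):
    """Return a frozenset containing only the hashable entries of ``names``."""
    safe = []
    for name in names:
        try:
            hash(name)
        except TypeError:
            # Unhashable (e.g. an ndarray), so it cannot equal a built-in name.
            continue
        safe.append(name)
    return frozenset(safe)


def find_builtin_clashes(names):
    """Return the names that overlap with the (illustrative) built-ins."""
    return hashable_names(names) & NE_BUILTINS_SKETCH


# An ndarray "name" is skipped instead of raising TypeError.
clashes_ok = find_builtin_clashes([np.array([1, 2, 3]), 2])
# A genuine clash is still detected.
clashes_abs = find_builtin_clashes(["abs", 2])
print(clashes_ok, clashes_abs)
```

Calling `hash()` inside a `try` block, rather than an `isinstance` check, also catches containers such as tuples that hold unhashable members.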
code for function:
def _check_ne_builtin_clash(expr):
    """Attempt to prevent foot-shooting in a helpful way.

    Parameters
    ----------
    terms : Term
        Terms can contain
    """
    names = expr.names
    overlap = names & _ne_builtins

    if overlap:
        s = ', '.join(map(repr, overlap))
        raise NumExprClobberingError('Variables in expression "{expr}" '
                                     'overlap with builtins: ({s})'
                                     .format(expr=expr, s=s))
code for var names it looks for:
https://github.com/pandas-dev/pandas/blob/master/pandas/core/computation/ops.py#L20-L26
Expected Output
> df.query("a.astype('int') < 2")
   a  b
0  1  4
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 3.2.1
pip: 9.0.1
setuptools: 40.0.0
Cython: 0.24
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.4.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 4.2.2
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None