DataFrame query method - numexpr safety check fails · Issue #22435 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.DataFrame({'a': ['1','2','3'], 'b': [4,5,6]})
df.query("a.astype('int') < 2")

raises TypeError: unhashable type: 'numpy.ndarray'

Problem description

Background
When using numexpr, Pandas has an internal function, _check_ne_builtin_clash, for detecting when a variable used in a method like query clashes with a numexpr built-in.

Here's an example of the function raising an error as intended:

df = pd.DataFrame({'abs': [1,2,3]})
df.query("abs > 2")
# Raises NumExprClobberingError: Variables ... overlap with builtins: ('abs')

Mostly, the names it protects against are math functions like sin, cos, sum, etc.

Why my original example fails

The trouble with my original code is that _check_ne_builtin_clash checks the names of both sides of the BinaryExpr AST node corresponding to "a.astype('int') < 2".
It does this by putting them into a frozenset.
However, the LHS ends up being a Constant node whose name is array([1,2,3]). That is an ndarray, which is not hashable, so building the frozenset raises the TypeError.
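
The failure mode described above can be reproduced without pandas. This is only a minimal sketch: NE_BUILTINS, FakeArray and collect_names are hypothetical names made up for illustration, and FakeArray is a stand-in for an unhashable value such as a numpy.ndarray so that the snippet runs with the standard library alone.

```python
# Hypothetical subset of the numexpr builtin names that pandas guards against.
NE_BUILTINS = frozenset({"sin", "cos", "abs", "sum"})


class FakeArray:
    """Stand-in for an unhashable operand name such as a numpy.ndarray."""

    __hash__ = None  # instances cannot be placed in a set or frozenset


def collect_names(operand_names):
    # Illustrates the idea of gathering operand names into a frozenset.
    return frozenset(operand_names)


# String and scalar names hash fine, so the overlap check works as intended.
overlap = collect_names(["abs", 2]) & NE_BUILTINS

# An unhashable name breaks the frozenset construction itself,
# before any overlap with the builtins can be computed.
try:
    collect_names([FakeArray(), 2])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(overlap)
print(error_message)
```

The point is that the error comes from building the set of names, not from the comparison against the builtin names.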

Solution

It seems like the helper function _check_ne_builtin_clash should consider any name that is unhashable safe, since it can't conflict with the function names being searched for. If this seems like a reasonable behavior, let me know and I will submit a PR!
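
A rough sketch of what that behavior could look like, outside of pandas. Everything here is an assumption for illustration: NE_BUILTINS, FakeArray and find_builtin_clashes are hypothetical names, and the real helper works on expr.names rather than on a plain list.

```python
from collections.abc import Hashable

# Hypothetical subset of the numexpr builtin names that pandas guards against.
NE_BUILTINS = frozenset({"sin", "cos", "abs", "sum"})


class FakeArray:
    """Stand-in for an unhashable operand name such as a numpy.ndarray."""

    __hash__ = None


def find_builtin_clashes(names):
    """Return the names that clash with NE_BUILTINS.

    Unhashable names are skipped: they cannot be equal to any of the
    builtin name strings, so they are treated as safe.
    """
    hashable_names = frozenset(n for n in names if isinstance(n, Hashable))
    return hashable_names & NE_BUILTINS


no_clash = find_builtin_clashes([FakeArray(), 2])
clash = find_builtin_clashes(["abs", 2])

print(no_clash)  # frozenset()
print(clash)     # frozenset({'abs'})
```

An alternative would be to catch the TypeError around the frozenset construction, but filtering first keeps the check for the remaining names intact.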

code for function:

def _check_ne_builtin_clash(expr):
    """Attempt to prevent foot-shooting in a helpful way.

    Parameters
    ----------
    terms : Term
        Terms can contain
    """
    names = expr.names
    overlap = names & _ne_builtins

    if overlap:
        s = ', '.join(map(repr, overlap))
        raise NumExprClobberingError('Variables in expression "{expr}" '
                                     'overlap with builtins: ({s})'
                                     .format(expr=expr, s=s))

code for var names it looks for:

https://github.com/pandas-dev/pandas/blob/master/pandas/core/computation/ops.py#L20-L26

Expected Output

> df.query("a.astype('int') < 2")
   a  b
0  1  4

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.2.1
pip: 9.0.1
setuptools: 40.0.0
Cython: 0.24
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.4.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 4.2.2
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None