BUG: Error on query function when the column name has # symbol · Issue #59285 · pandas-dev/pandas

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame((1,2,3), columns=['a#'])
df.query('a# > 2')

KeyError                                  Traceback (most recent call last)
File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\computation\scope.py:231, in Scope.resolve(self, key, is_local)
    230 if self.has_resolvers:
--> 231     return self.resolvers[key]
    233 # if we're here that means that we have no locals and we also have
    234 # no resolvers

File d:\Applications\Python\Python311\Lib\collections\__init__.py:1006, in ChainMap.__getitem__(self, key)
   1005         pass
-> 1006 return self.__missing__(key)

File d:\Applications\Python\Python311\Lib\collections\__init__.py:998, in ChainMap.__missing__(self, key)
    997 def __missing__(self, key):
--> 998     raise KeyError(key)

KeyError: 'a'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\computation\scope.py:242, in Scope.resolve(self, key, is_local)
    238 try:
    239     # last ditch effort we look in temporaries
    240     # these are created when parsing indexing expressions
    ...
    242     return self.temps[key]
    243 except KeyError as err:
--> 244     raise UndefinedVariableError(key, is_local) from err

UndefinedVariableError: name 'a' is not defined

Issue Description

The query function seems to treat the # symbol as the start of a comment, so the column name is read as a and the query fails instead of filtering on column a#.
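A minimal sketch of the suspected cause, assuming (as the traceback through parsing.py suggests) that the query string is split using Python's own tokenizer. It uses only the standard-library tokenize module, not pandas internals, and the helper name token_pairs is made up for illustration:

```python
import io
import tokenize


def token_pairs(expr):
    """Return (token type name, token text) pairs for an expression string."""
    readline = io.StringIO(expr).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# Keep only the tokens that carry content; drop NEWLINE/ENDMARKER bookkeeping.
KEEP = ("NAME", "OP", "NUMBER", "COMMENT")

meaningful = [pair for pair in token_pairs("a# > 2") if pair[0] in KEEP]
print(meaningful)
# [('NAME', 'a'), ('COMMENT', '# > 2')]

# For comparison, a name without '#' produces the full comparison.
plain = [pair for pair in token_pairs("a > 2") if pair[0] in KEEP]
print(plain)
# [('NAME', 'a'), ('OP', '>'), ('NUMBER', '2')]
```

Everything after # ends up in a single COMMENT token, which would explain why only the name a is looked up.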

I also tried quoting the column name with backticks:

df.query('`a#` > 2')

but it still throws an exception:

Traceback (most recent call last):

  File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\computation\parsing.py:192 in tokenize_string
    yield tokenize_backtick_quoted_string(

  File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\computation\parsing.py:167 in tokenize_backtick_quoted_string
    return BACKTICK_QUOTED_STRING, source[string_start:string_end]

UnboundLocalError: cannot access local variable 'string_end' where it is not associated with a value

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File d:\Applications\Python\Python311\Lib\site-packages\IPython\core\interactiveshell.py:3553 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Cell In[59], line 1
    df.query('`a#` > 2')

  File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\frame.py:4823 in query
    res = self.eval(expr, **kwargs)

  File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\frame.py:4949 in eval
    ...
    raise SyntaxError(f"Failed to parse backticks in '{source}'.") from err

  File <string>

SyntaxError: Failed to parse backticks in '`a#` > 2'.

Expected Behavior

The query should return the same rows as df[df['a#'] > 2].
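Until query handles this column name, a possible workaround sketch is shown below. It assumes that temporarily renaming the column is acceptable; the replacement name a_hash is hypothetical and only needs to be a valid Python identifier that does not clash with another column. The DataFrame is built from a dict here for clarity, which should be equivalent to the example above.

```python
import pandas as pd

df = pd.DataFrame({"a#": [1, 2, 3]})

# Expected result, written with plain boolean indexing.
expected = df[df["a#"] > 2]

# Workaround: rename to an identifier-safe name, query, then rename back.
safe_name = "a_hash"  # hypothetical placeholder name
via_query = (
    df.rename(columns={"a#": safe_name})
    .query(f"{safe_name} > 2")
    .rename(columns={safe_name: "a#"})
)

print(via_query.equals(expected))
```

Boolean indexing is the simpler option when the filter is a single comparison; the rename round trip is only useful when the rest of the expression benefits from query syntax.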

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.11.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Chinese (Simplified)_China.936

pandas : 2.2.2
numpy : 1.26.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 24.0
Cython : 3.0.8
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.20.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None