BUG: Error on query function when the column name has # symbol · Issue #59285 · pandas-dev/pandas
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame((1,2,3), columns=['a#'])
df.query('a# > 2')
KeyError                                  Traceback (most recent call last)
File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\computation\scope.py:231, in Scope.resolve(self, key, is_local)
    230 if self.has_resolvers:
--> 231     return self.resolvers[key]
    233 # if we're here that means that we have no locals and we also have
    234 # no resolvers
File d:\Applications\Python\Python311\Lib\collections\__init__.py:1006, in ChainMap.__getitem__(self, key)
   1005         pass
-> 1006 return self.__missing__(key)
File d:\Applications\Python\Python311\Lib\collections\__init__.py:998, in ChainMap.__missing__(self, key)
    997 def __missing__(self, key):
--> 998     raise KeyError(key)
KeyError: 'a'
During handling of the above exception, another exception occurred:
KeyError                                  Traceback (most recent call last)
File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\computation\scope.py:242, in Scope.resolve(self, key, is_local)
    238 try:
    239     # last ditch effort we look in temporaries
    240     # these are created when parsing indexing expressions
    ...
    242     return self.temps[key]
    243 except KeyError as err:
--> 244     raise UndefinedVariableError(key, is_local) from err
UndefinedVariableError: name 'a' is not defined
Issue Description
The query function seems to treat the # symbol as the start of a comment, so the expression is not evaluated as expected.
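The comment-like behaviour can be reproduced without pandas. This is a minimal sketch that assumes the query string is tokenized as ordinary Python source, which the `tokenize_string` frame in the traceback suggests. The helper name `token_summary` is made up for illustration.

```python
import io
import tokenize


def token_summary(source: str) -> list[tuple[str, str]]:
    """Return (token type name, token text) pairs for a source string."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# Everything from '#' to the end of the line becomes a single COMMENT token,
# so only the name 'a' is left to be resolved as a variable.
tokens = token_summary("a# > 2")
for name, text in tokens:
    print(name, repr(text))
```

Under that assumption, the lookup of `a` in the first traceback is what one would expect: the comparison `> 2` never reaches the expression evaluator.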
I also tried executing df.query('`a#` > 2') with the column name quoted in backticks, but it still throws an exception:
Traceback (most recent call last):
  File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\computation\parsing.py:192 in tokenize_string
    yield tokenize_backtick_quoted_string(
  File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\computation\parsing.py:167 in tokenize_backtick_quoted_string
    return BACKTICK_QUOTED_STRING, source[string_start:string_end]
UnboundLocalError: cannot access local variable 'string_end' where it is not associated with a value
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File d:\Applications\Python\Python311\Lib\site-packages\IPython\core\interactiveshell.py:3553 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  Cell In[59], line 1
    df.query('`a#` > 2')
  File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\frame.py:4823 in query
    res = self.eval(expr, **kwargs)
  File d:\Applications\Python\Python311\Lib\site-packages\pandas\core\frame.py:4949 in eval
    ...
    raise SyntaxError(f"Failed to parse backticks in '{source}'.") from err
  File
SyntaxError: Failed to parse backticks in '`a#` > 2'.
Expected Behavior
The query should return the same rows as df[df['a#'] > 2].
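Until the query call handles this column name, two possible workarounds are sketched below: plain boolean indexing, and temporarily renaming the column to a valid Python identifier before calling query. The temporary name `a_hash` is an arbitrary choice for illustration, not something pandas requires.

```python
import pandas as pd

df = pd.DataFrame({"a#": [1, 2, 3]})

# Workaround 1: boolean indexing avoids the expression parser entirely.
by_mask = df[df["a#"] > 2]

# Workaround 2: rename to an identifier-safe name, query, then rename back.
# "a_hash" is a hypothetical temporary name chosen for this example.
by_query = (
    df.rename(columns={"a#": "a_hash"})
    .query("a_hash > 2")
    .rename(columns={"a_hash": "a#"})
)

print(by_mask)
print(by_query)
```

Both approaches select the single row where the value is 3. The rename route only helps when the rest of the expression is worth keeping in query form.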
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2e
python : 3.11.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Chinese (Simplified)_China.936
pandas : 2.2.2
numpy : 1.26.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 24.0
Cython : 3.0.8
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.20.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None