.ne fails with ambiguous message if comparing a list of columns containing column name 'dtype' · Issue #22383 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'], [0, 1, 5, 'bb'], [0, 1, 5, 'bb'], [0, 1, 5, 'bb'], ['cc', 4, 4, 4]], columns=['a', 'b', 'c', 'dtype'])
df.loc[:, ['a', 'dtype']].ne(df.loc[:, ['a', 'dtype']])

In [10]: df
Out[10]:
    a  b  c dtype
0   0  1  2    aa
1   0  1  2    aa
2   0  1  5    bb
3   0  1  5    bb
4   0  1  5    bb
5  cc  4  4     4
Problem description
Instead of the expected output, I receive:
In [8]: df.loc[:, ['a','dtype']].ne(df.loc[:, ['a', 'dtype']])
ValueError                                Traceback (most recent call last)
in ()
----> 1 df.loc[:, ['a','dtype']].ne(df.loc[:, ['a', 'dtype']])

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/ops.pyc in f(self, other, axis, level)
   1588             self, other = self.align(other, 'outer',
   1589                                      level=level, copy=False)
-> 1590             return self._compare_frame(other, na_op, str_rep)
   1591
   1592         elif isinstance(other, ABCSeries):

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _compare_frame(self, other, func, str_rep)
   4790             return {col: func(a[col], b[col]) for col in a.columns}
   4791
-> 4792         new_data = expressions.evaluate(_compare, str_rep, self, other)
   4793         return self._constructor(data=new_data, index=self.index,
   4794                                  columns=self.columns, copy=False)

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/computation/expressions.pyc in evaluate(op, op_str, a, b, use_numexpr, **eval_kwargs)
    201     """
    202
--> 203     use_numexpr = use_numexpr and _bool_arith_check(op_str, a, b)
    204     if use_numexpr:
    205         return _evaluate(op, op_str, a, b, **eval_kwargs)

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/computation/expressions.pyc in _bool_arith_check(op_str, a, b, not_allowed, unsupported)
    173         unsupported = {'+': '|', '*': '&', '-': '^'}
    174
--> 175     if _has_bool_dtype(a) and _has_bool_dtype(b):
    176         if op_str in unsupported:
    177             warnings.warn("evaluating in Python space because the {op!r} "

/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
   1574         raise ValueError("The truth value of a {0} is ambiguous. "
   1575                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576                          .format(self.__class__.__name__))
   1577
   1578     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
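The traceback stops at the truth test of `_has_bool_dtype(a)`, which suggests that helper returned a Series rather than a plain boolean. A plausible explanation, assuming the helper checks something like `x.dtype == bool` (an assumption, not confirmed from the source shown here), is that attribute access on a DataFrame falls back to column lookup, so `.dtype` returns the column named 'dtype'. A minimal sketch of that mechanism, using a shortened copy of the frame above:

```python
import pandas as pd

# Shortened copy of the frame from the code sample above.
df = pd.DataFrame(
    [[0, 1, 2, "aa"], [0, 1, 5, "bb"], ["cc", 4, 4, 4]],
    columns=["a", "b", "c", "dtype"],
)

# A DataFrame has `dtypes` but no `dtype` attribute of its own, so
# attribute access falls through to column lookup and returns the column.
col = df.dtype
is_series = isinstance(col, pd.Series)

# Assumed shape of the internal check: comparing the column to `bool`
# is element-wise and gives back a Series, not a single True/False.
comparison = col == bool

# Using that Series in a boolean context raises the error from the report.
try:
    bool(comparison)
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)
```

With a column named anything other than 'dtype', `df.dtype` would raise `AttributeError` instead, which is presumably the case the internal check was written for.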
Expected Output
       a  dtype
0  False  False
1  False  False
2  False  False
3  False  False
4  False  False
5  False  False
Alternative: a descriptive error message telling me I can't use 'dtype' as a column name.
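Until one of those is in place, a possible workaround is to rename the column for the duration of the comparison and restore the name afterwards. This is only a sketch: the placeholder name `dtype_col` is hypothetical, and I have not verified it against every affected pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, "aa"], [0, 1, 2, "aa"], [0, 1, 5, "bb"],
     [0, 1, 5, "bb"], [0, 1, 5, "bb"], ["cc", 4, 4, 4]],
    columns=["a", "b", "c", "dtype"],
)

left = df.loc[:, ["a", "dtype"]]
right = df.loc[:, ["a", "dtype"]]

# Hypothetical placeholder name; any label that does not shadow a
# DataFrame attribute should do.
safe = {"dtype": "dtype_col"}
restore = {"dtype_col": "dtype"}

# Compare under the placeholder name, then put the original name back.
result = (
    left.rename(columns=safe)
    .ne(right.rename(columns=safe))
    .rename(columns=restore)
)
```

Here `result` should have the columns 'a' and 'dtype' and contain only False, since both operands hold the same data.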
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-29-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.utf8
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.23.4
pytest: 2.8.7
pip: 18.0
setuptools: 40.0.0
Cython: None
numpy: 1.11.0
scipy: 0.17.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: None
patsy: 0.4.1
dateutil: 2.7.3
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
feather: None
matplotlib: 1.5.1
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.5.0
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: 0.7.2.None
psycopg2: 2.6.1 (dt dec mx pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None