Categorical equality check raises ValueError in DataFrame · Issue #12564 · pandas-dev/pandas
There appears to be an issue when comparing a scalar value for equality against a categorical column that is part of a DataFrame. In the example below I compare against -np.inf, but comparing against a string or an integer gives the same result.
The comparison raises ValueError: Wrong number of dimensions.
Code Sample
from sys import version
import pandas as pd  # Version 0.17.1 on Linux and Windows
import numpy as np
print(version)
print(pd.__version__)

# Arbitrary data set
columns = ['Name', 'Type', 'Age', 'Weight (kg)', 'Cuteness']
dataset = [['Snuggles', 'Cat', 5.2, 4.2, 9.7],
           ['Rex', 'Dog', 2.1, 12, 2.1],
           ['Mrs. Quiggleworth', 'Cat', 7.4, 3, 7],
           ['Squirmy', 'Snake', 1.1, 0.2, 0.1],
           ['Tarantula', 'Legs', 0.2, 0.01, -np.inf],
           ['Groucho', 'Dog', 6.9, 8, 5.1]]

df = pd.DataFrame(dataset, columns=columns).set_index(['Name'])

# Works fine
print("String type - are any of the columns negative infinity?")
neg_inf = (df == -np.inf)
print(neg_inf.any(axis=1))

# Convert 'Type' to a categorical
df['Type'] = df['Type'].astype('category')

print("Categorical type - is the Type column negative infinity?")
print(df['Type'] == -np.inf)  # Works fine

print("Categorical type in dataframe - are any of them negative infinity?")
print(df[['Type']] == -np.inf)  # Danger, Will Robinson!
Expected Output
3.5.1 |Anaconda 2.5.0 (64-bit)| (default, Dec 7 2015, 11:16:01)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
0.17.1
String type - are any of the columns negative infinity?
Name
Snuggles             False
Rex                  False
Mrs. Quiggleworth    False
Squirmy              False
Tarantula             True
Groucho              False
dtype: bool
Categorical type - is the Type column negative infinity?
Name
Snuggles             False
Rex                  False
Mrs. Quiggleworth    False
Squirmy              False
Tarantula            False
Groucho              False
Name: Type, dtype: bool
Categorical type in dataframe - are any of them negative infinity?
Traceback (most recent call last):
  File "pandas_demo.py", line 30, in <module>
    print(df[['Type']] == -np.inf) # Danger, Will Robinson!
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/ops.py", line 1115, in f
    res = self._combine_const(other, func, raise_on_error=False)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 3482, in _combine_const
    new_data = self._data.eval(func=func, other=other, raise_on_error=raise_on_error)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 2840, in eval
    return self.apply('eval', **kwargs)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 2823, in apply
    applied = getattr(b, f)(**kwargs)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 1155, in eval
    fastpath=True,)]
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 169, in make_block
    return make_block(values, placement=placement, ndim=ndim, **kwargs)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 2454, in make_block
    placement=placement)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 78, in __init__
    raise ValueError('Wrong number of dimensions')
ValueError: Wrong number of dimensions
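Since the Series-level comparison above succeeds while the DataFrame-level one fails, one possible workaround is to compare column by column, so that each column goes through the Series code path. This is only a sketch, not a fix for the underlying problem. The smaller data set and the names neg_inf and any_neg_inf are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Reduced, illustrative version of the data set in the report.
df = pd.DataFrame(
    {"Type": ["Cat", "Dog", "Legs"], "Cuteness": [9.7, 2.1, -np.inf]},
    index=pd.Index(["Snuggles", "Rex", "Tarantula"], name="Name"),
)
df["Type"] = df["Type"].astype("category")

# Compare each column as a Series instead of comparing the whole frame.
# The categorical column yields all False because -np.inf is not a category.
neg_inf = df.apply(lambda col: col == -np.inf)

# Row-wise reduction, as in the original example.
any_neg_inf = neg_inf.any(axis=1)
print(any_neg_inf)
```

The result has the same shape as df == -np.inf would have, so the later .any(axis=1) call is unchanged.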
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-51-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.17.1
nose: 1.3.7
pip: 8.0.3
setuptools: 20.2.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.0
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
Jinja2: None
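
To check whether another environment reproduces the report, a small probe along these lines might help. It is a sketch under assumptions: the helper name frame_comparison_fails and the three-row frame are hypothetical, and it only detects the ValueError described above, not other failure modes.

```python
import numpy as np
import pandas as pd


def frame_comparison_fails(scalar=-np.inf):
    """Return True if comparing a one-column categorical DataFrame
    to `scalar` raises ValueError (hypothetical helper)."""
    frame = pd.DataFrame(
        {"Type": pd.Series(["Cat", "Dog", "Snake"], dtype="category")}
    )
    try:
        frame == scalar
    except ValueError:
        return True
    return False


# Report the installed version next to the probe result.
print(pd.__version__, "affected:", frame_comparison_fails())
```

Passing a string or an integer as scalar exercises the other cases mentioned in the description.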