Categorical equality check raises ValueError in DataFrame · Issue #12564 · pandas-dev/pandas

Comparing a scalar for equality against a DataFrame that contains a categorical column raises an error, even though the same comparison against the categorical column as a Series works. In the example below, I'm checking against -np.inf, but comparing to a string or integer gives the same result.

This raises ValueError: Wrong number of dimensions.

Code Sample

from sys import version
import pandas as pd  # Version 0.17.1 on Linux and Windows
import numpy as np

print(version)
print(pd.__version__)

# Arbitrary data set
columns = ['Name', 'Type', 'Age', 'Weight (kg)', 'Cuteness']
dataset = [['Snuggles', 'Cat', 5.2, 4.2, 9.7],
           ['Rex', 'Dog', 2.1, 12, 2.1],
           ['Mrs. Quiggleworth', 'Cat', 7.4, 3, 7],
           ['Squirmy', 'Snake', 1.1, 0.2, 0.1],
           ['Tarantula', 'Legs', 0.2, 0.01, -np.inf],
           ['Groucho', 'Dog', 6.9, 8, 5.1]]

df = pd.DataFrame(dataset, columns=columns).set_index(['Name'])

# Works fine
print("String type - are any of the columns negative infinity?")
neg_inf = (df == -np.inf)
print(neg_inf.any(axis=1))

# Convert 'Type' to a categorical

df['Type'] = df['Type'].astype('category')

print("Categorical type - is the Type column negative infinity?")
print(df['Type'] == -np.inf)  # Works fine

print("Categorical type in dataframe - are any of them negative infinity?")
print(df[['Type']] == -np.inf)  # Danger, Will Robinson!
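The failing step can probably be isolated further. The following is a reduced sketch, assuming that a single categorical column is enough to trigger the problem; the names `frame`, `series_result`, `frame_result` and `outcome` are made up for illustration. It runs the Series comparison and the DataFrame comparison side by side and records whether the DataFrame path raises.

```python
import numpy as np
import pandas as pd

# One categorical column; assumed to be enough to exercise the DataFrame path.
frame = pd.DataFrame({"Type": pd.Categorical(["Cat", "Dog", "Snake"])})

# Series path: reported as working in this issue.
series_result = frame["Type"] == -np.inf

# DataFrame path: reported as raising ValueError on pandas 0.17.1.
try:
    frame_result = frame == -np.inf
    outcome = "ok"
except ValueError as exc:
    frame_result = None
    outcome = "ValueError: {}".format(exc)

print(pd.__version__, outcome)
```

Printing the pandas version next to the outcome makes it easy to compare behaviour across installations.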

Actual Output

3.5.1 |Anaconda 2.5.0 (64-bit)| (default, Dec  7 2015, 11:16:01) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
0.17.1
String type - are any of the columns negative infinity?
Name
Snuggles             False
Rex                  False
Mrs. Quiggleworth    False
Squirmy              False
Tarantula             True
Groucho              False
dtype: bool
Categorical type - is the Type column negative infinity?
Name
Snuggles             False
Rex                  False
Mrs. Quiggleworth    False
Squirmy              False
Tarantula            False
Groucho              False
Name: Type, dtype: bool
Categorical type in dataframe - are any of them negative infinity?
Traceback (most recent call last):
  File "pandas_demo.py", line 30, in <module>
    print(df[['Type']] == -np.inf)  # Danger, Will Robinson!
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/ops.py", line 1115, in f
    res = self._combine_const(other, func, raise_on_error=False)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 3482, in _combine_const
    new_data = self._data.eval(func=func, other=other, raise_on_error=raise_on_error)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 2840, in eval
    return self.apply('eval', **kwargs)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 2823, in apply
    applied = getattr(b, f)(**kwargs)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 1155, in eval
    fastpath=True,)]
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 169, in make_block
    return make_block(values, placement=placement, ndim=ndim, **kwargs)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 2454, in make_block
    placement=placement)
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 78, in __init__
    raise ValueError('Wrong number of dimensions')
ValueError: Wrong number of dimensions
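Since the comparison works on the categorical column as a Series, one possible workaround is to compare column by column and reassemble the result. This is only a sketch under that assumption, not a fix for the underlying problem; the small data set and the names `neg_inf` and `any_neg_inf` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Type": pd.Categorical(["Cat", "Dog", "Legs"]),
        "Cuteness": [9.7, 2.1, -np.inf],
    },
    index=pd.Index(["Snuggles", "Rex", "Tarantula"], name="Name"),
)

# Compare each column as a Series (the path reported as working),
# then collect the boolean Series back into a DataFrame.
neg_inf = pd.DataFrame({col: df[col] == -np.inf for col in df.columns})

# Row-wise: does any column equal negative infinity?
any_neg_inf = neg_inf.any(axis=1)
print(any_neg_inf)
```

This avoids the DataFrame-level comparison entirely, at the cost of one comparison per column.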

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-51-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.0.3
setuptools: 20.2.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.0
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
Jinja2: None