Regression in DataFrame.set_index with class instance column keys · Issue #24969 · pandas-dev/pandas

The following code worked in Pandas 0.23.4 but not in Pandas 0.24.0 (I'm on Python 3.7.2).

import pandas as pd

class Thing:
    # (Production code would also ensure a Thing instance's hash
    # and equality testing depended on name and color)

    def __init__(self, name, color):
        self.name = name
        self.color = color

    def __str__(self):
        return "<Thing %r>" % (self.name,)

thing1 = Thing('One', 'red')
thing2 = Thing('Two', 'blue')
df = pd.DataFrame({thing1: [0, 1], thing2: [2, 3]})
df.set_index([thing2])
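As an aside to the reproduction (it is not needed to trigger the error): the comment in `Thing` says production code would make hashing and equality depend on `name` and `color`. A minimal sketch of that, assuming a frozen dataclass is acceptable (the issue does not say how the production class does it). With `frozen=True` and the default `eq=True`, `dataclasses` generates `__eq__` and `__hash__` from the fields. This alone would presumably not avoid the 0.24.0 error, because the class is still not one of the types `is_scalar` recognizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    # frozen=True plus the default eq=True makes dataclass generate
    # __eq__ and __hash__ from (name, color): two Things with the same
    # fields compare equal and hash identically.
    name: str
    color: str

    def __str__(self):
        return "<Thing %r>" % (self.name,)


thing1 = Thing('One', 'red')
thing2 = Thing('Two', 'blue')

# Value-based equality: a fresh but equal instance finds the same entry,
# which is what looking a column up by an equal-but-distinct key relies on.
lookup = {thing1: [0, 1], thing2: [2, 3]}
print(lookup[Thing('Two', 'blue')])  # [2, 3]
```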

In Pandas 0.23.4, I get the following correct result:

               <Thing 'One'>
<Thing 'Two'>               
2                          0
3                          1

In Pandas 0.24.0, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../venv/lib/python3.7/site-packages/pandas/core/frame.py", line 4153, in set_index
    raise ValueError(err_msg)
ValueError: The parameter "keys" may be a column key, one-dimensional array, or a list containing only valid column keys and one-dimensional arrays.

Here is the key-validation loop in Pandas 0.24.0's implementation of DataFrame.set_index:

for col in keys:
    if (is_scalar(col) or isinstance(col, tuple)):
        # if col is a valid column key, everything is fine
        # tuples are always considered keys, never as list-likes
        if col not in self:
            missing.append(col)
    elif (not isinstance(col, (ABCIndexClass, ABCSeries,
                               np.ndarray, list))
          or getattr(col, 'ndim', 1) > 1):
        raise ValueError(err_msg)
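To see why a `Thing` key ends up at `raise ValueError(err_msg)`, here is a pure-Python sketch of the same branch order. It is an approximation under stated assumptions: `approx_is_scalar` and `check_keys_024` are hypothetical names, `approx_is_scalar` only accepts a few built-in types (the real `is_scalar` recognizes more, such as NumPy scalars and datetimes), and a plain `list` stands in for the `Index`/`Series`/`ndarray`/`list` check.

```python
import numbers


class Thing:
    # Plain class from the report: hashable by identity, but not a "scalar".
    def __init__(self, name, color):
        self.name = name
        self.color = color


def approx_is_scalar(obj):
    # Hypothetical stand-in for is_scalar: accepts None, strings, bytes
    # and numbers, and rejects arbitrary user-defined objects.
    return obj is None or isinstance(obj, (str, bytes, numbers.Number))


def check_keys_024(keys, columns):
    # Mirrors the branch order of the 0.24.0 loop quoted above.
    missing = []
    for col in keys:
        if approx_is_scalar(col) or isinstance(col, tuple):
            # Treated as a column key.
            if col not in columns:
                missing.append(col)
        elif not isinstance(col, list) or getattr(col, 'ndim', 1) > 1:
            # Neither a recognized key nor a 1-D array-like: rejected.
            raise ValueError('invalid key: %r' % (col,))
    return missing


thing1, thing2 = Thing('One', 'red'), Thing('Two', 'blue')
columns = [thing1, thing2]

try:
    check_keys_024([thing2], columns)
except ValueError:
    print("thing2 rejected even though it is a column")
```

A `Thing` fails the scalar test, is not a tuple, and is not array-like, so it falls into the `elif` branch and raises.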

I noticed that is_scalar returns False for thing1 in Pandas 0.24.0:

>>> from pandas.core.dtypes.common import is_scalar
>>> is_scalar(thing1)
False

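I suspect that it is incorrect to test DataFrame column keys using is_scalar. A column key only needs to be hashable, but is_scalar recognizes only particular scalar types, so an instance of an ordinary class like Thing is rejected even though it is a valid column key.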
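One possible direction, sketched under the assumption that a column key only has to be hashable: check for array-likes first and treat any other hashable object as a key. `is_hashable` and `check_keys_hashable` below are hypothetical plain-Python helpers, not pandas' actual fix (pandas does expose a similar `pandas.api.types.is_hashable`), and a `list` again stands in for the full array-like check.

```python
class Thing:
    # Plain class from the report: inherits object.__hash__, so instances
    # are hashable (by identity) even though they are not "scalars".
    def __init__(self, name, color):
        self.name = name
        self.color = color


def is_hashable(obj):
    # hash() is the reliable test; isinstance(obj, collections.abc.Hashable)
    # would accept a tuple containing a list, which still fails to hash.
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def check_keys_hashable(keys, columns):
    # Hypothetical alternative to the 0.24.0 loop: recognize array-likes
    # first (a plain list stands in for Index/Series/ndarray/list), then
    # accept any hashable object as a column key.
    missing = []
    for col in keys:
        if isinstance(col, list):
            continue  # 1-D array-like, used directly as index values
        if is_hashable(col):
            if col not in columns:
                missing.append(col)
        else:
            raise ValueError('invalid key: %r' % (col,))
    return missing


thing1, thing2 = Thing('One', 'red'), Thing('Two', 'blue')
# A dict keyed by column label stands in for the DataFrame's columns;
# `in` on a dict is a hash-based lookup.
data = {thing1: [0, 1], thing2: [2, 3]}

print(check_keys_hashable([thing2, [7, 8]], data))  # []
```

Ordering the array-like check before the hashability check matters: it keeps unhashable lists usable as index values while any hashable object, including a `Thing`, is looked up as a column.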

Output of pd.show_versions()

pd.show_versions() from Pandas 0.23.4

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.4.3
Cython: None
numpy: 1.16.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.1.2
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

pd.show_versions() from Pandas 0.24.0

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0
pytest: None
pip: 18.1
setuptools: 40.4.3
Cython: None
numpy: 1.16.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.1.2
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None