Regression in DataFrame.set_index with class instance column keys · Issue #24969 · pandas-dev/pandas
The following code worked in Pandas 0.23.4 but not in Pandas 0.24.0 (I'm on Python 3.7.2).
    import pandas as pd

    class Thing:
        # (Production code would also ensure a Thing instance's hash
        # and equality testing depended on name and color)

        def __init__(self, name, color):
            self.name = name
            self.color = color

        def __str__(self):
            return "<Thing %r>" % (self.name,)

    thing1 = Thing('One', 'red')
    thing2 = Thing('Two', 'blue')
    df = pd.DataFrame({thing1: [0, 1], thing2: [2, 3]})
    df.set_index([thing2])
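The comment in `Thing` mentions that production code would make hashing and equality depend on `name` and `color`. A minimal sketch of what that could look like follows; the `_key` helper is a hypothetical name introduced only for illustration and is not part of the original report.

```python
class Thing:
    """Column key whose hash and equality depend on name and color."""

    def __init__(self, name, color):
        self.name = name
        self.color = color

    def _key(self):
        # Hypothetical helper: the tuple that defines a Thing's identity.
        return (self.name, self.color)

    def __eq__(self, other):
        if not isinstance(other, Thing):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__: equal Things hash equally.
        return hash(self._key())

    def __str__(self):
        return "<Thing %r>" % (self.name,)


thing1 = Thing('One', 'red')
same_as_thing1 = Thing('One', 'red')
thing2 = Thing('Two', 'blue')

# Two Things with the same name and color are interchangeable as dict keys.
lookup = {thing1: 'first'}
print(lookup[same_as_thing1])
```

With hashing and equality defined this way, a `Thing` instance behaves like any other hashable dictionary key, which is what lets it serve as a column label.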
In Pandas 0.23.4, I get the following correct result:
                   <Thing 'One'>
    <Thing 'Two'>
    2                          0
    3                          1
In Pandas 0.24.0, I get the following error:
    Traceback (most recent call last):
      File "", line 1, in
      File ".../venv/lib/python3.7/site-packages/pandas/core/frame.py", line 4153, in set_index
        raise ValueError(err_msg)
    ValueError: The parameter "keys" may be a column key, one-dimensional array, or a list containing only valid column keys and one-dimensional arrays.
After looking at Pandas 0.24.0's implementation of DataFrame.set_index:

    for col in keys:
        if (is_scalar(col) or isinstance(col, tuple)):
            # if col is a valid column key, everything is fine
            # tuples are always considered keys, never as list-likes
            if col not in self:
                missing.append(col)
        elif (not isinstance(col, (ABCIndexClass, ABCSeries,
                                   np.ndarray, list))
              or getattr(col, 'ndim', 1) > 1):
            raise ValueError(err_msg)

I noticed that is_scalar returns False for thing1 in Pandas 0.24.0:

    from pandas.core.dtypes.common import is_scalar
    is_scalar(thing1)
    False

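To make the contrast concrete, the sketch below compares `is_scalar` with `is_hashable` (both imported here from the public `pandas.api.types` namespace) on a built-in value and on a class instance. It redefines a stripped-down `Thing` so that it runs on its own, and it assumes a pandas version in which dict keys of arbitrary hashable type are accepted as column labels.

```python
import pandas as pd
from pandas.api.types import is_hashable, is_scalar


class Thing:
    def __init__(self, name, color):
        self.name = name
        self.color = color


thing1 = Thing('One', 'red')

# Built-in values such as strings and integers count as scalars.
print(is_scalar('a'), is_scalar(1))

# An arbitrary class instance does not count as a scalar,
# even though it is hashable and therefore usable as a column label.
print(is_scalar(thing1), is_hashable(thing1))

# Membership in the frame's columns does not depend on is_scalar.
df = pd.DataFrame({thing1: [0, 1]})
print(thing1 in df)
```

The point of the comparison is that "is a scalar" and "is a valid column key" are different properties: the instance fails the first test while satisfying the second.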
I suspect that it is incorrect to test DataFrame column keys using is_scalar.
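One possible alternative, sketched here without pandas, is to decide whether something is a column key by hashability and membership rather than by scalar-ness. The `classify_key` function is hypothetical and is not how pandas implements or later fixed `set_index`; it only illustrates the idea under simplified assumptions (a plain list stands in for the frame's columns).

```python
from collections.abc import Hashable


class Thing:
    def __init__(self, name, color):
        self.name = name
        self.color = color


def classify_key(key, columns):
    """Hypothetical helper: decide how a set_index-style key would be treated."""
    # A hashable object that is present among the columns is a column key,
    # whatever its type.
    if isinstance(key, Hashable) and key in columns:
        return "column"
    # Unhashable lists are treated as one-dimensional arrays of values.
    if isinstance(key, list):
        return "array"
    # Anything else is neither a known column nor array data.
    raise KeyError(key)


thing1 = Thing('One', 'red')
thing2 = Thing('Two', 'blue')
columns = [thing1, thing2]

print(classify_key(thing2, columns))    # a Thing instance is accepted as a column key
print(classify_key([10, 20], columns))  # a plain list is treated as array data
```

Under this scheme a class instance is handled the same way as a string label, and only keys that are neither present columns nor list-like data are rejected.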
Output of pd.show_versions()
pd.show_versions() from Pandas 0.23.4
INSTALLED VERSIONS
commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.4.3
Cython: None
numpy: 1.16.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.1.2
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
pd.show_versions() from Pandas 0.24.0
INSTALLED VERSIONS
commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.0
pytest: None
pip: 18.1
setuptools: 40.4.3
Cython: None
numpy: 1.16.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.1.2
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None