API: better error-handling for df.set_index · Issue #22484 · pandas-dev/pandas

Description

Split off from #22236.

Let's start with:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))

The error handling of df.set_index can be improved in at least three cases:

1. df.set_index(['A', 'A'], drop=False) works, while df.set_index(['A', 'A'], drop=True) raises
   KeyError: 'A'
2. Objects of an unsupported type raise KeyError instead of TypeError:
   df.set_index(map(str, df.A))
   KeyError: "None of [Index([...], dtype='object')] are in the [columns]"
3. df.set_index(['foo', 'bar', 'baz']) reports only the first missing key, buried in a long traceback:
   KeyError: 'foo'
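The first case above (a duplicated key with drop=True) is consistent with each key being removed from the frame one at a time after the index is built, so the second removal of 'A' finds nothing left to delete. This is an assumption about the cause, not a quote of the pandas source. The sketch below models the columns as a plain dict; drop_one_by_one and drop_deduplicated are hypothetical names used only for illustration.

```python
def drop_one_by_one(columns: dict, to_remove: list) -> dict:
    """Hypothetical model: delete every requested label in turn."""
    frame = dict(columns)  # work on a copy of the column mapping
    for label in to_remove:
        del frame[label]  # a repeated label raises KeyError on its second pass
    return frame


def drop_deduplicated(columns: dict, to_remove: list) -> dict:
    """Hypothetical fix: remove each distinct label exactly once."""
    frame = dict(columns)
    # dict.fromkeys drops duplicates while keeping first-seen order
    for label in dict.fromkeys(to_remove):
        del frame[label]
    return frame
```

With columns {'A': [1, 2], 'B': [3, 4]} and to_remove ['A', 'A'], drop_one_by_one raises KeyError('A'), mirroring the reported error, while drop_deduplicated returns {'B': [3, 4]}.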

Better behaviour would be:

1. Gracefully handle duplicate column names when drop=True.
2. Raise a more informative error, e.g. TypeError: only allowed types are: ...
3. Show all missing keys: KeyError: "['foo', 'bar', 'baz']"
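The three proposals above could be combined into a single validation step that runs before any column is touched. The sketch below is a hypothetical, pandas-free model of that idea, not pandas' actual implementation. check_set_index_keys is an invented name, str and int stand in for column labels, list stands in for an array of index values, and the TypeError wording is illustrative only.

```python
from collections.abc import Hashable, Sequence
from typing import Any, List


def check_set_index_keys(keys: Any, columns: Sequence[Hashable]) -> List[Hashable]:
    """Validate set_index-style keys and return the column labels to drop.

    Hypothetical helper: str/int are treated as column labels and list as an
    array of index values; every other type is rejected up front.
    """
    if not isinstance(keys, list):
        keys = [keys]  # a single key is wrapped, as with a one-element list

    labels = []
    for key in keys:
        if isinstance(key, list):
            continue  # array of index values, not a reference to a column
        if not isinstance(key, (str, int)):
            # proposal 2: unsupported types get a TypeError, not a KeyError
            raise TypeError(
                "keys must be column labels (str or int) or lists of values; "
                f"got {type(key).__name__}"
            )
        labels.append(key)

    # proposal 1: duplicates collapse to one label, first-seen order kept
    unique_labels = list(dict.fromkeys(labels))

    # proposal 3: collect every missing label before raising
    missing = [label for label in unique_labels if label not in columns]
    if missing:
        raise KeyError(str(missing))

    return unique_labels
```

For example, check_set_index_keys(['A', 'A'], list('ABCDE')) returns ['A'], so the column would be dropped once; check_set_index_keys(['foo', 'bar', 'baz'], list('ABCDE')) raises KeyError: "['foo', 'bar', 'baz']"; and passing a map object raises TypeError. Checking everything before mutating anything also means a failed call leaves the frame untouched.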