API: better error-handling for df.set_index · Issue #22484 · pandas-dev/pandas

@h-vetinari

Splitting this out of #22236.

Let's have
df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))

The error handling of df.set_index can be improved in at least three cases:

  1. df.set_index(['A', 'A'], drop=False) works, while
    df.set_index(['A', 'A'], drop=True) yields
    KeyError: 'A'
  2. Objects of unknown type yield KeyError instead of TypeError:
    df.set_index(map(str, df.A))
    KeyError: "None of [Index([...], dtype='object')] are in the [columns]"
  3. df.set_index(['foo', 'bar', 'baz']) only reports the first missing key:
    KeyError: 'foo' (buried in a huge stacktrace)
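
For convenience, the cases above can be run in one go with a small script like the following. It is only a sketch: the helper `try_set_index` and the case labels are made up for illustration, and the exception types and messages you see may differ from the ones quoted above depending on the installed pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))


def try_set_index(label, make_keys, **kwargs):
    """Hypothetical helper: call df.set_index and record what happens."""
    try:
        df.set_index(make_keys(), **kwargs)
    except Exception as exc:  # deliberately broad: we only want to report
        return label, type(exc).__name__, str(exc)
    return label, "ok", ""


# Each case builds its keys lazily so that the map object is fresh.
cases = [
    ("duplicates, drop=False", lambda: ['A', 'A'], {"drop": False}),
    ("duplicates, drop=True", lambda: ['A', 'A'], {"drop": True}),
    ("map object", lambda: map(str, df.A), {}),
    ("missing keys", lambda: ['foo', 'bar', 'baz'], {}),
]

results = [try_set_index(label, make_keys, **kwargs)
           for label, make_keys, kwargs in cases]

for label, outcome, message in results:
    print(f"{label}: {outcome} {message}")
```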

It would be better to:

  1. Gracefully handle duplicate column names when drop=True
  2. Raise a more informative error, e.g. TypeError: only allowed types are: ...
  3. Show all missing keys: KeyError: "['foo', 'bar', 'baz']"
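
To make the proposal concrete, here is a rough sketch of an up-front check that behaves as described in the three points. It is not pandas code: `validate_set_index_keys` is a hypothetical name, and the allow-list of array-like types (Series, Index, ndarray, list) is an assumption chosen for illustration, not the list pandas actually accepts.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_scalar


def validate_set_index_keys(df, keys):
    """Hypothetical pre-check for df.set_index(keys); not part of pandas.

    Returns the column labels to drop, each listed once.
    """
    if not isinstance(keys, list):
        keys = [keys]

    # Assumed allow-list of array-like keys that are used as values directly.
    array_types = (pd.Series, pd.Index, np.ndarray, list)

    labels = []
    for key in keys:
        if isinstance(key, array_types):
            continue
        if is_scalar(key) or isinstance(key, tuple):
            labels.append(key)  # treated as a column label
            continue
        # Point 2: unsupported objects get a TypeError, not a KeyError.
        raise TypeError(
            "only allowed types are: column label, Series, Index, "
            f"np.ndarray, list; got {type(key).__name__}"
        )

    # Point 3: collect every missing label before raising.
    missing = [label for label in labels if label not in df.columns]
    if missing:
        raise KeyError(f"{missing}")

    # Point 1: de-duplicate so each column is dropped only once.
    return list(dict.fromkeys(labels))
```

With the df from above, `validate_set_index_keys(df, ['A', 'A'])` returns `['A']`, while passing `map(str, df.A)` raises TypeError and passing `['foo', 'bar', 'baz']` raises a KeyError that names all three labels.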