Bug in xs raising KeyError for MultiIndex columns with droplevel False and list indexe by phofl · Pull Request #41789 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation23 Commits8 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

phofl

@phofl

@phofl phofl added Indexing

Related to indexing on series/frames, not to indexes themselves

Regression

Functionality that used to work in a prior pandas version

labels

Jun 2, 2021

@simonjayhawkins

cc @simonjayhawkins Should we include in 1.2.5 or push to 1.3?

this seems backportable. will run a bisect on the issue to get some more context.

@simonjayhawkins

@phofl is this a bug or is the user expecting output that may have been incorrect in 1.1.5.

IIUC key is an index label or tuple of index label.

if a tuple, represents more than one level of the index to match.

a list should not be an allowable input. As a sequence should it be treated as a tuple and raise KeyError as it does now in 1.2.x/master.

or should we validate the key and raise TypeError?

@phofl

Good point, thanks for bringing this up. You are correct this should not work at all.

In general I think a list is counterintuitive as an input, because in loc/iloc we treat list as single level indexers, so we should disallow them? We have one test using this, but there are two levels given, so it is implicitly clear, that the user tries to use the list elements on different levels df.xs([2008, "sat"], level=["year", "day"], drop_level=False).

So for the reason of backwarts compatibility we should probably raise, if a list is given and the number of levels does not match the length of the list (like in this case)?

I am not really happy with the list case above, but we probably have to deprecate instead of removing outright.

@jreback

Good point, thanks for bringing this up. You are correct this should not work at all.

In general I think a list is counterintuitive as an input, because in loc/iloc we treat list as single level indexers, so we should disallow them? We have one test using this, but there are two levels given, so it is implicitly clear, that the user tries to use the list elements on different levels df.xs([2008, "sat"], level=["year", "day"], drop_level=False).

So for the reason of backwarts compatibility we should probably raise, if a list is given and the number of levels does not match the length of the list (like in this case)?

I am not really happy with the list case above, but we probably have to deprecate instead of removing outright.

I agree that we should remove this case. Can you add a deprecation warning in this cse? if so then ok to backport. (.xs is itself kind of janky anyhow), though we don't allow drop_level via .loc.....

@jreback

@jbrockmendel

We have one test using this, but there are two levels given, so it is implicitly clear, that the user tries to use the list elements on different levels df.xs([2008, "sat"], level=["year", "day"], drop_level=False).

So the idea is that we would deprecate allowing this so the user would need to pass either (2008, "sat") or [(2008, "sat")]?

though we don't allow drop_level via .loc.....

xref #35418 but i dont think there's any appetite for making loc more complicated

@phofl

Yes, disallowing lists completly, e.g. user has to pass (2008, "sat").

@jbrockmendel

Yes, disallowing lists completly, e.g. user has to pass (2008, "sat").

im on board

@phofl

Will repurpose this pr with the changes

@phofl

@phofl

� Conflicts: � doc/source/whatsnew/v1.3.0.rst

@phofl

Deprecated list as key with same length as level and raised when key has different length than level.

Added the whatsnew to 1.3. Would move if we want to backport this?

@jreback

Deprecated list as key with same length as level and raised when key has different length than level.

Added the whatsnew to 1.3. Would move if we want to backport this?

this is ok to backport i think

jreback

if isinstance(key, list):
len_levels = 1 if level is None or is_scalar(level) else len(level)
if len(key) != len_levels:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this break any existing tests?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One test_xs_partial

Could deprecate this too, if you prefer?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah let's just deprecate entirely (prob easier to remember too)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@phofl

@phofl

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jreback

@jreback

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request

Jun 4, 2021

@phofl @meeseeksmachine

…ex columns with droplevel False and list indexe

@lumberbot-app

Something went wrong ... Please have a look at my logs.

simonjayhawkins

~~~~~~~~~~~~
- Deprecated passing lists as ``key`` to :meth:`DataFrame.xs` and :meth:`Series.xs` (:issue:`41760`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can backport this since it violates our policy.

pandas will introduce deprecations in **minor** releases. These deprecations
will preserve the existing behavior while emitting a warning that provide
guidance on:

* How to achieve similar behavior if an alternative is available
* The pandas version in which the deprecation will be enforced.

We will not introduce new deprecations in patch releases.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will make a pr later to move back to 1.3

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. just need to move the release note on master. I'll close the backport PR.

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request

Jul 3, 2021

@phofl @JulianWgs

Labels

Indexing

Related to indexing on series/frames, not to indexes themselves

Regression

Functionality that used to work in a prior pandas version