ENH: Introduce UnsortedIndexError GH11897 by Dr-Irv · Pull Request #14762 · pandas-dev/pandas (original) (raw)

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

closes #11897
closes #14491

tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

Current coverage is 85.28% (diff: 100%)

Merging #14762 into master will increase coverage by <.01%

@@ master #14762 diff @@

Files 144 144
Lines 50968 50970 +2
Methods 0 0
Messages 0 0
Branches 0 0

Hits 43466 43469 +3
Misses 7502 7501 -1
Partials 0 0

Powered by Codecov. Last update 2466ecb...c4e7a2a

- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing into an
unsorted index (:issue:'11897`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specificaly on MultiIndex? you are closing 2 issues, is there a second note (or you can list 2 for this one if applicable)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've fixed that. When I did the second issue, I forgot to update whatsnew.

mask = isnull(result._values)
if mask.any():
raise IndexingError('Unalignable boolean Series key provided')
raise IndexingError('Unalignable labels in boolean Series index')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add something like "{}".format(type(key)) (e.g. make more useful)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added '{}'.format(type(key.index)) to print the index instead

import pandas.indexes.base as ibase


class UnsortedIndexError(KeyError):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more to pandas.core.common

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


def test_unsortedindex(self):
# GH 11897
mi = pd.MultiIndex.from_tuples([('z', 'a'), ('x', 'a'), ('y', 'b'),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want to find places where currently a KeyError is raised and change them in the tests.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now done that. My way of doing that was to make UnsortedIndexError a subclass of ValueError, run nosetests to see where it complained, and then change those tests accordingly. Then I switched UnsortedIndexError back to being a subclass of KeyError.



class UnsortedIndexError(KeyError):
pass

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

self.assertEqual(df.index.lexsort_depth, 0)
with tm.assertRaisesRegexp(
KeyError,
UnsortedIndexError,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm only these 2 occurences of the original KeyError.....?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Via my methodology described above, that's all I found.


- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing into an

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would give a mini-example of this (or put a ref-link back to the docs). speaking of the docs, I think we need some updating to say what happens when this is raised (e.g. its prob still KeyError )

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put a link to the doc section, and fixed that part of the docs.

unsorted MultiIndex (:issue:'11897`)

- Change message when indexing via a boolean ``Series`` that has an incompatible index (:issue:`14491`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not clear what you are fixing / changing. be a bit more verbose

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done that in my latest commit.


These are new features and improvements of note in each release.

.. include:: whatsnew/v0.20.0.txt

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls revert this. we will update this after 0.19.2 comes out.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

of sorting or an incorrect key. See :ref:`here <advanced.unsorted>`

- Change error message text corresponding to error raised when indexing via a
boolean ``Series`` that has an incompatible index (:issue:`14491`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit confusing

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't think it is per se needed to have a whatsnew notice about this. It is not really a new feature, just a minor rewording of an existing correct error message.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shortened the item. I think it should be in whatsnew to refer back to the issue. But you guys can decide!


- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing/slicing into an

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thrown -> raise

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

class UnsortedIndexError(KeyError):
""" This error is raised when attempting to get a slice of a MultiIndex
and the index has not been lexsorted.
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a versiononadded tag here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This error is raised when" -> "Error raised when"

Can you also add that it is a subclass of a KeyError?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Main remaining question: since we announce this as useful to catch this as a different error as KeyError, we need a public API for this.
Currently, it is located in pandas.core.common. Is that where we want to expose this?

of sorting or an incorrect key. See :ref:`here <advanced.unsorted>`

- Change error message text corresponding to error raised when indexing via a
boolean ``Series`` that has an incompatible index (:issue:`14491`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't think it is per se needed to have a whatsnew notice about this. It is not really a new feature, just a minor rewording of an existing correct error message.

class UnsortedIndexError(KeyError):
""" This error is raised when attempting to get a slice of a MultiIndex
and the index has not been lexsorted.
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This error is raised when" -> "Error raised when"

Can you also add that it is a subclass of a KeyError?

if mask.any():
raise IndexingError('Unalignable boolean Series key provided')
raise IndexingError('Unalignable labels in boolean Series index '
'{}'.format(key.index))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this will give a nice result (adding the key's index in the error message, the rewording is certainly an improvement!)

Can you show an example and its resulting error message?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to the Other API Changes section

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Where do you want me to put the example? In the whatsnew document?

@jreback I've moved the text to the Other API Changes section.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, sorry, I just meant here in the comments. To see how it looks, as I am not sure if I find it a good idea to include the full index.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Here is an example:

Unalignable labels in boolean Series index Int64Index([3, 2, 1, 0], dtype='int64')

Note that I added the key's index because of a suggestion @jreback made earlier.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the suggestion of @jreback what to include type(key.index) instead of key.index. But anyway, I personally think that this full index will obfuscate the error message a bit, as the index can be typically much longer.
Other proposal to make the message more clear:

Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)

What do you think of that?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche I've used your text. Committed change after doing a rebase. I think @jreback can merge now.

i think we can expose errors in

pandas.api.errors

but will need to do in another PR (need tests i think)

and a deprecation message if importerd
from core.common (which i think will work)

and a deprecation message if importerd

Well, there was a PR from gfyoung couple weeks ago about deprecating an exception, and turned out to be not that easy.
That's also the reason that I bring it up here, so we can do it right from the start. But for me it's OK to do in separate PR, as long as it is slated for 0.20

I think we can simply use pandas.util.depr_module with specific exceptions listed, no?

You basically deprecate them on first use, IOW simply importing them.

@jreback: we had a similar issue with CParserError, and unfortunately, I don't think my depr_module applies in either case because it doesn't deprecate a class.

@gfyoung how so? CParserError a class. In fact I would assert it doesn't matter what it deprecates, all it cares about is getting a name from a namespace.

@jreback : But you can't really deprecate KeyError in this case, so I'm actually not sure what you're trying to achieve with depr_module in the first place.

@gfyoung pls read the issue. I am going to deprecate the exceptions in pandas.core.common and move them to pandas.api.exceptions.

@jreback : That's not in the original issue IINM, but in case, sure that should work what you're describing then.

@jreback Last commit is the rebase and fixes the conflicting file, so not sure what to do next

@Dr-Irv Can you just rebase? Then it should be good to merge

@Dr-Irv As you can see in the github interface, this cannot be merged due to conflicts (but it is possible this was only introduced since your last commit).

@jorisvandenbossche I'm confused. I can't find the conflicts it is reporting. I did the following:

git checkout master
git pull
git checkout <branch>
git merge master

git reported that everything was up to date. So there were no conflicts reported. I'm not sure what to do.

@Dr-Irv depending on your settings, I think git pull will fetch your latest 'origin' (your fork), where you want to fetch the latest upstream.
What I typically do is

git fetch upstream
git checkout <branch>
git rebase upstream/master

(but the last line can also be git merge upstream/master if you want, that doesn't matter too much)

@Dr-Irv There are still some linting errors (hence the failing travis, see at the bottom of this log: https://travis-ci.org/pandas-dev/pandas/jobs/182331460):

pandas/core/common.py:108:1: W293 blank line contains whitespace
pandas/core/common.py:111:1: E303 too many blank lines (3)
pandas/core/indexing.py:1817:80: E501 line too long (81 > 79 characters)
pandas/core/indexing.py:1818:80: E501 line too long (82 > 79 characters)
pandas/tests/indexing/test_indexing.py:3487:80: E501 line too long (80 > 79 characters)

@jorisvandenbossche I've now fixed those. For some reason flake8 --diff didn't pick up those errors.

@jorisvandenbossche So I resubmitted, but now the appveyor is failing, but looking at the logs, I can't figure out why.

Don't worry about appveyor, it's not related.

Merging, thanks a lot!

yarikoptic added a commit to neurodebian/pandas that referenced this pull request

commit 'v0.19.0-174-g81a2f79': (156 commits) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ENH: Introduce UnsortedIndexError GH11897 (pandas-dev#14762) ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753) DOC: Fix grammar and formatting typos (pandas-dev#14803) BLD: try new build credentials for pandas-docs TST: Test pivot with categorical data MAINT: Cleanup pandas/src/parser (pandas-dev#14740) ...

yarikoptic added a commit to neurodebian/pandas that referenced this pull request

release 0.19.1 was from release branch

releases: (156 commits) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ENH: Introduce UnsortedIndexError GH11897 (pandas-dev#14762) ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753) DOC: Fix grammar and formatting typos (pandas-dev#14803) BLD: try new build credentials for pandas-docs TST: Test pivot with categorical data MAINT: Cleanup pandas/src/parser (pandas-dev#14740) ...

ischurov pushed a commit to ischurov/pandas that referenced this pull request

Labels