Use a more helpful error message for invalid correlation methods in DataFrame.corr by dsaxton · Pull Request #22298 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation21 Commits8 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

dsaxton

DataFrame.corr currently returns a KeyError for invalid correlation methods. The proposed change would instead return a ValueError with an error message reminding the user of the valid correlation methods.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.normal(size=(10, 3)))
df.corr(method="turkey")

@pep8speaks

Hello @dsaxton! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 14, 2018 at 15:49 Hours UTC

@codecov

@TomAugspurger

Thanks. Could you add a release note in 0.24.0, and a test to ensure that code is hit for bad inputs?

@dsaxton

I'm working on some additional modifications to the DataFrame correlation methods to address some potential bugs I found and close another existing issue, is it okay if I close this pull request and reopen another with all the changes?

@jreback

no pls don’t

put independent changes in independent PRs

daniel saxton and others added 4 commits

August 13, 2018 09:02

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also test / fix this on Series as well.

@@ -6682,6 +6682,9 @@ def corr(self, method='pearson', min_periods=1):
c = corrf(ac, bc)
correl[i, j] = c
correl[j, i] = c
else:
raise ValueError("method must be either 'pearson', "
"'spearman' or 'kendall'")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add ....{method} was supplied'.format(method=method)

@@ -130,6 +130,11 @@ def test_corr_cov_independent_index_column(self):
assert result.index is not result.columns
assert result.index.equals(result.columns)
def test_corr_invalid_method(self):
df = pd.DataFrame(np.random.normal(size=(10, 2)))
with pytest.raises(ValueError):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check this with an assert_raises_regex (e.g. match the error)

add the gh issue number as a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Is the convention here to use the most specific regex that matches the error message, or just one that is reasonably specific?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reasonable is fine

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments. ping on green.

if method in ['pearson', 'spearman', 'kendall']:
return nanops.nancorr(this.values, other.values, method=method,
min_periods=min_periods)
else:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need the else here

@@ -130,6 +130,13 @@ def test_corr_cov_independent_index_column(self):
assert result.index is not result.columns
assert result.index.equals(result.columns)
def test_corr_invalid_method(self):
df = pd.DataFrame(np.random.normal(size=(10, 2)))
pttrn = ("method must be either 'pearson', 'spearman', "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pttrn -> msg

@@ -130,6 +130,13 @@ def test_corr_cov_independent_index_column(self):
assert result.index is not result.columns
assert result.index.equals(result.columns)
def test_corr_invalid_method(self):
df = pd.DataFrame(np.random.normal(size=(10, 2)))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the gh issue number as a comment

@@ -778,6 +778,14 @@ def test_corr_rank(self):
tm.assert_almost_equal(A.corr(B, method='kendall'), kexp)
tm.assert_almost_equal(A.corr(B, method='spearman'), sexp)
def test_corr_invalid_method(self):
s1 = pd.Series(np.random.randn(10))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

def test_corr_invalid_method(self):
s1 = pd.Series(np.random.randn(10))
s2 = pd.Series(np.random.randn(10))
pttrn = ("method must be either 'pearson', 'spearman', "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pttrn -> msg

@@ -468,6 +468,7 @@ Other API Changes
- :meth:`PeriodIndex.tz_convert` and :meth:`PeriodIndex.tz_localize` have been removed (:issue:`21781`)
- :class:`Index` subtraction will attempt to operate element-wise instead of raising ``TypeError`` (:issue:`19369`)
- :class:`pandas.io.formats.style.Styler` supports a ``number-format`` property when using :meth:`~pandas.io.formats.style.Styler.to_excel` (:issue:`22015`)
- :meth:`DataFrame.corr` now raises a ``ValueError`` instead of a ``KeyError`` when supplied with an invalid method.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the PR number as we don't have an issue number

…mber to tests and release notes

@dsaxton

@jreback Looks like the tests passed. Let me know if there are any other changes that should be made, if not, thanks for all your help.

@TomAugspurger

@dsaxton

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request

Oct 1, 2018