Use a more helpful error message for invalid correlation methods in DataFrame.corr by dsaxton · Pull Request #22298 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation21 Commits8 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
DataFrame.corr
currently returns a KeyError
for invalid correlation methods. The proposed change would instead return a ValueError
with an error message reminding the user of the valid correlation methods.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.normal(size=(10, 3)))
df.corr(method="turkey")
Hello @dsaxton! Thanks for updating the PR.
Cheers ! There are no PEP8 issues in this Pull Request. 🍻
Comment last updated on August 14, 2018 at 15:49 Hours UTC
Thanks. Could you add a release note in 0.24.0, and a test to ensure that code is hit for bad inputs?
I'm working on some additional modifications to the DataFrame correlation methods to address some potential bugs I found and close another existing issue, is it okay if I close this pull request and reopen another with all the changes?
no pls don’t
put independent changes in independent PRs
daniel saxton and others added 4 commits
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also test / fix this on Series as well.
@@ -6682,6 +6682,9 @@ def corr(self, method='pearson', min_periods=1): |
---|
c = corrf(ac, bc) |
correl[i, j] = c |
correl[j, i] = c |
else: |
raise ValueError("method must be either 'pearson', " |
"'spearman' or 'kendall'") |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add ....{method} was supplied'.format(method=method)
@@ -130,6 +130,11 @@ def test_corr_cov_independent_index_column(self): |
---|
assert result.index is not result.columns |
assert result.index.equals(result.columns) |
def test_corr_invalid_method(self): |
df = pd.DataFrame(np.random.normal(size=(10, 2))) |
with pytest.raises(ValueError): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you check this with an assert_raises_regex (e.g. match the error)
add the gh issue number as a comment
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Is the convention here to use the most specific regex that matches the error message, or just one that is reasonably specific?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reasonable is fine
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments. ping on green.
if method in ['pearson', 'spearman', 'kendall']: |
---|
return nanops.nancorr(this.values, other.values, method=method, |
min_periods=min_periods) |
else: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the else here
@@ -130,6 +130,13 @@ def test_corr_cov_independent_index_column(self): |
---|
assert result.index is not result.columns |
assert result.index.equals(result.columns) |
def test_corr_invalid_method(self): |
df = pd.DataFrame(np.random.normal(size=(10, 2))) |
pttrn = ("method must be either 'pearson', 'spearman', " |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pttrn -> msg
@@ -130,6 +130,13 @@ def test_corr_cov_independent_index_column(self): |
---|
assert result.index is not result.columns |
assert result.index.equals(result.columns) |
def test_corr_invalid_method(self): |
df = pd.DataFrame(np.random.normal(size=(10, 2))) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the gh issue number as a comment
@@ -778,6 +778,14 @@ def test_corr_rank(self): |
---|
tm.assert_almost_equal(A.corr(B, method='kendall'), kexp) |
tm.assert_almost_equal(A.corr(B, method='spearman'), sexp) |
def test_corr_invalid_method(self): |
s1 = pd.Series(np.random.randn(10)) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
def test_corr_invalid_method(self): |
---|
s1 = pd.Series(np.random.randn(10)) |
s2 = pd.Series(np.random.randn(10)) |
pttrn = ("method must be either 'pearson', 'spearman', " |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pttrn -> msg
@@ -468,6 +468,7 @@ Other API Changes |
---|
- :meth:`PeriodIndex.tz_convert` and :meth:`PeriodIndex.tz_localize` have been removed (:issue:`21781`) |
- :class:`Index` subtraction will attempt to operate element-wise instead of raising ``TypeError`` (:issue:`19369`) |
- :class:`pandas.io.formats.style.Styler` supports a ``number-format`` property when using :meth:`~pandas.io.formats.style.Styler.to_excel` (:issue:`22015`) |
- :meth:`DataFrame.corr` now raises a ``ValueError`` instead of a ``KeyError`` when supplied with an invalid method. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the PR number as we don't have an issue number
…mber to tests and release notes
@jreback Looks like the tests passed. Let me know if there are any other changes that should be made, if not, thanks for all your help.
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request