Add ddof to cov methods by blueprintarchitect · Pull Request #34611 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation52 Commits22 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
- [ x] closes Feature Request: Population Covariance Calculation with ddof parameter for DataFrame.cov() #28823
- tests added / passed
- passes
black pandas - passes
git diff upstream/master -u -- "*.py" | flake8 --diff - whatsnew entry
Hello @smartswdeveloper! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻
Comment last updated at 2020-06-21 21:12:36 UTC
Can you add tests? That's typically the first thing we look for in any PR
WillAyd changed the title
ENH: fix #28823 Add ddof to cov methods
does corr will also need a change?
does corr will also need a change?
I don't think so, the degrees of freedom cancel for correlation. Looks like numpy.corrcoef does have a ddof argument but it doesn't do anything and is deprecated.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs a whatsnew note as well, other enhancements, 1.1
| result = s1.cov(s2, ddof=test_ddof) |
| expected = np.cov(np_array1, np_array2, ddof=test_ddof)[0][1] |
| assert math.isclose(expected, result) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use assert_series_equal here
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it returns a float. So the test fails with assert_series_equal. Is there any other option or should I convert into a Series?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don’t select out a single value
compare the result in its entirety
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but I am struggling to see your point here.
In fact differently from Dataframe.cov, Series.cov returns a float and not the covariance matrix.
So the one compared is the entire result of Series.cov (float) with the covariance matrix value of s1,s2 (float).
Can you please help me to understand what you mean with entire result?
this needs a whatsnew note as well, other enhancements, 1.1
Done
| - :meth:`~pandas.io.json.read_json` now accepts `nrows` parameter. (:issue:`33916`). |
|---|
| - :meth `~pandas.io.gbq.read_gbq` now allows to disable progress bar (:issue:`33360`). |
| - :meth:`Dataframe.cov` now support a new parameter ddof to support Delta degrees of Freedom as in the corresponding numpy methods (:issue:`34611`). |
| - :meth:`Series.cov` now support a new parameter ddof to support Delta degrees of Freedom as in the corresponding numpy methods (:issue:`34611`). |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this 1 note, just
:meth:`Dataframe.cov` and :meth:`Series.cov` .....
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| min_periods : int, optional |
|---|
| Minimum number of observations needed to have a valid result. |
| ddof : int, default 1 |
| Delta degrees of Freedom. The divisor used in calculations |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Delta degrees of Freedom. The divisor used in calculations |
|---|
| Delta degrees of freedom. The divisor used in calculations |
| to have a valid result. |
|---|
| ddof : int, default 1 |
| Delta degrees of Freedom. The divisor used in calculations |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Delta degrees of Freedom. The divisor used in calculations |
|---|
| Delta degrees of freedom. The divisor used in calculations |
| - :meth:`~pandas.io.json.read_json` now accepts `nrows` parameter. (:issue:`33916`). |
|---|
| - :meth `~pandas.io.gbq.read_gbq` now allows to disable progress bar (:issue:`33360`). |
| - :meth:`~pandas.io.gbq.read_gbq` now supports the ``max_results`` kwarg from ``pandas-gbq`` (:issue:`34639`). |
| - :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ddof to support Delta degrees of Freedom as in the corresponding numpy methods (:issue:`34611`). |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ddof to support Delta degrees of Freedom as in the corresponding numpy methods (:issue:`34611`). |
|---|
| - :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ``ddof`` to support delta degrees of freedom as in the corresponding ``numpy`` methods (:issue:`34611`). |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The defaults here are backwards compatible, right?
| - :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ddof to support delta degrees of freedom as in the corresponding numpy methods (:issue:`34611`). |
|---|
| ======= |
| - :meth:`DataFrame.to_html` and :meth:`DataFrame.to_string`'s ``col_space`` parameter now accepts a list of dict to change only some specific columns' width (:issue:`28917`). |
| >>>>>>> e6e08890cc8bd1b162b920cf5a526a433bab8b30 |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to clean these up from the merge conflict
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments, pls ping on green.
| ) |
|---|
| def cov(self, other, min_periods=None) -> float: |
| def cov(self, other, min_periods=None, ddof=1) -> float: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add types here
lgtm. can merge on green.
| def cov(self, other, min_periods=None) -> float: |
| def cov( |
| self, other: "Series", min_periods: Optional[int] = None, ddof: int = 1 |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| self, other: "Series", min_periods: Optional[int] = None, ddof: int = 1 |
|---|
| self, other: "Series", min_periods: Optional[int] = None, ddof: Optional[int] = 1 |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the official documentation https://docs.python.org/3/library/typing.html
it seems that the Optional should not be applied to optional parameters with a default (like ddof).
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What the docs mean is that an argument with a default shouldn't be given an Optional type because providing the value is optional. It should have an Optional type if None is a possible value:
On the other hand, if an explicit value of None is allowed, the use of Optional is appropriate, whether the argument is optional or not.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. Numpy is implementing the protection for None and it's actually what is present in the test case. Made Optional