Add ddof to cov methods by blueprintarchitect · Pull Request #34611 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation52 Commits22 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

[ x] closes Feature Request: Population Covariance Calculation with ddof parameter for DataFrame.cov() #28823
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Hello @smartswdeveloper! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-21 21:12:36 UTC

Can you add tests? That's typically the first thing we look for in any PR

WillAyd changed the title~~ENH: fix #28823~~ Add ddof to cov methods

does corr will also need a change?

does corr will also need a change?

I don't think so, the degrees of freedom cancel for correlation. Looks like numpy.corrcoef does have a ddof argument but it doesn't do anything and is deprecated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a whatsnew note as well, other enhancements, 1.1


result = s1.cov(s2, ddof=test_ddof)
expected = np.cov(np_array1, np_array2, ddof=test_ddof)[0][1]
assert math.isclose(expected, result)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use assert_series_equal here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it returns a float. So the test fails with assert_series_equal. Is there any other option or should I convert into a Series?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don’t select out a single value
compare the result in its entirety

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I am struggling to see your point here.
In fact differently from Dataframe.cov, Series.cov returns a float and not the covariance matrix.
So the one compared is the entire result of Series.cov (float) with the covariance matrix value of s1,s2 (float).
Can you please help me to understand what you mean with entire result?

this needs a whatsnew note as well, other enhancements, 1.1

Done

- :meth:`~pandas.io.json.read_json` now accepts `nrows` parameter. (:issue:`33916`).
- :meth `~pandas.io.gbq.read_gbq` now allows to disable progress bar (:issue:`33360`).
- :meth:`Dataframe.cov` now support a new parameter ddof to support Delta degrees of Freedom as in the corresponding numpy methods (:issue:`34611`).
- :meth:`Series.cov` now support a new parameter ddof to support Delta degrees of Freedom as in the corresponding numpy methods (:issue:`34611`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this 1 note, just

:meth:`Dataframe.cov` and :meth:`Series.cov` .....

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

min_periods : int, optional
Minimum number of observations needed to have a valid result.
ddof : int, default 1
Delta degrees of Freedom. The divisor used in calculations

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delta degrees of Freedom. The divisor used in calculations
Delta degrees of freedom. The divisor used in calculations

to have a valid result.

ddof : int, default 1
Delta degrees of Freedom. The divisor used in calculations

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delta degrees of Freedom. The divisor used in calculations
Delta degrees of freedom. The divisor used in calculations

- :meth:`~pandas.io.json.read_json` now accepts `nrows` parameter. (:issue:`33916`).
- :meth `~pandas.io.gbq.read_gbq` now allows to disable progress bar (:issue:`33360`).
- :meth:`~pandas.io.gbq.read_gbq` now supports the ``max_results`` kwarg from ``pandas-gbq`` (:issue:`34639`).
- :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ddof to support Delta degrees of Freedom as in the corresponding numpy methods (:issue:`34611`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ddof to support Delta degrees of Freedom as in the corresponding numpy methods (:issue:`34611`).
- :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ``ddof`` to support delta degrees of freedom as in the corresponding ``numpy`` methods (:issue:`34611`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defaults here are backwards compatible, right?

- :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ddof to support delta degrees of freedom as in the corresponding numpy methods (:issue:`34611`).
=======
- :meth:`DataFrame.to_html` and :meth:`DataFrame.to_string`'s ``col_space`` parameter now accepts a list of dict to change only some specific columns' width (:issue:`28917`).
>>>>>>> e6e08890cc8bd1b162b920cf5a526a433bab8b30

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to clean these up from the merge conflict

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments, pls ping on green.

)

def cov(self, other, min_periods=None) -> float:
def cov(self, other, min_periods=None, ddof=1) -> float:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add types here

lgtm. can merge on green.


def cov(self, other, min_periods=None) -> float:
def cov(
self, other: "Series", min_periods: Optional[int] = None, ddof: int = 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self, other: "Series", min_periods: Optional[int] = None, ddof: int = 1
self, other: "Series", min_periods: Optional[int] = None, ddof: Optional[int] = 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the official documentation https://docs.python.org/3/library/typing.html
it seems that the Optional should not be applied to optional parameters with a default (like ddof).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the docs mean is that an argument with a default shouldn't be given an Optional type because providing the value is optional. It should have an Optional type if None is a possible value:

On the other hand, if an explicit value of None is allowed, the use of Optional is appropriate, whether the argument is optional or not.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. Numpy is implementing the protection for None and it's actually what is present in the test case. Made Optional

Labels