Add ddof to cov methods by blueprintarchitect · Pull Request #34611 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation52 Commits22 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

@blueprintarchitect

@blueprintarchitect

@pep8speaks

Hello @smartswdeveloper! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-21 21:12:36 UTC

@blueprintarchitect

@blueprintarchitect

@blueprintarchitect

@WillAyd

Can you add tests? That's typically the first thing we look for in any PR

@WillAyd WillAyd changed the titleENH: fix #28823 Add ddof to cov methods

Jun 5, 2020

dsaxton

@blueprintarchitect

@blueprintarchitect

@blueprintarchitect

dsaxton

@rhoit

does corr will also need a change?

@dsaxton

does corr will also need a change?

I don't think so, the degrees of freedom cancel for correlation. Looks like numpy.corrcoef does have a ddof argument but it doesn't do anything and is deprecated.

@blueprintarchitect

@blueprintarchitect

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a whatsnew note as well, other enhancements, 1.1

result = s1.cov(s2, ddof=test_ddof)
expected = np.cov(np_array1, np_array2, ddof=test_ddof)[0][1]
assert math.isclose(expected, result)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use assert_series_equal here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it returns a float. So the test fails with assert_series_equal. Is there any other option or should I convert into a Series?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don’t select out a single value
compare the result in its entirety

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I am struggling to see your point here.
In fact differently from Dataframe.cov, Series.cov returns a float and not the covariance matrix.
So the one compared is the entire result of Series.cov (float) with the covariance matrix value of s1,s2 (float).
Can you please help me to understand what you mean with entire result?

@blueprintarchitect

@blueprintarchitect

@blueprintarchitect

this needs a whatsnew note as well, other enhancements, 1.1

Done

jreback

- :meth:`~pandas.io.json.read_json` now accepts `nrows` parameter. (:issue:`33916`).
- :meth `~pandas.io.gbq.read_gbq` now allows to disable progress bar (:issue:`33360`).
- :meth:`Dataframe.cov` now support a new parameter ddof to support Delta degrees of Freedom as in the corresponding numpy methods (:issue:`34611`).
- :meth:`Series.cov` now support a new parameter ddof to support Delta degrees of Freedom as in the corresponding numpy methods (:issue:`34611`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this 1 note, just

:meth:`Dataframe.cov` and :meth:`Series.cov` .....

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@blueprintarchitect

@blueprintarchitect

dsaxton

min_periods : int, optional
Minimum number of observations needed to have a valid result.
ddof : int, default 1
Delta degrees of Freedom. The divisor used in calculations

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delta degrees of Freedom. The divisor used in calculations
Delta degrees of freedom. The divisor used in calculations
to have a valid result.
ddof : int, default 1
Delta degrees of Freedom. The divisor used in calculations

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delta degrees of Freedom. The divisor used in calculations
Delta degrees of freedom. The divisor used in calculations
- :meth:`~pandas.io.json.read_json` now accepts `nrows` parameter. (:issue:`33916`).
- :meth `~pandas.io.gbq.read_gbq` now allows to disable progress bar (:issue:`33360`).
- :meth:`~pandas.io.gbq.read_gbq` now supports the ``max_results`` kwarg from ``pandas-gbq`` (:issue:`34639`).
- :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ddof to support Delta degrees of Freedom as in the corresponding numpy methods (:issue:`34611`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ddof to support Delta degrees of Freedom as in the corresponding numpy methods (:issue:`34611`).
- :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ``ddof`` to support delta degrees of freedom as in the corresponding ``numpy`` methods (:issue:`34611`).

@blueprintarchitect

TomAugspurger

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defaults here are backwards compatible, right?

@blueprintarchitect

@blueprintarchitect

dsaxton

- :meth:`Dataframe.cov` and :meth:`Series.cov` now support a new parameter ddof to support delta degrees of freedom as in the corresponding numpy methods (:issue:`34611`).
=======
- :meth:`DataFrame.to_html` and :meth:`DataFrame.to_string`'s ``col_space`` parameter now accepts a list of dict to change only some specific columns' width (:issue:`28917`).
>>>>>>> e6e08890cc8bd1b162b920cf5a526a433bab8b30

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to clean these up from the merge conflict

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@blueprintarchitect

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments, pls ping on green.

)
def cov(self, other, min_periods=None) -> float:
def cov(self, other, min_periods=None, ddof=1) -> float:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add types here

@blueprintarchitect

@blueprintarchitect

dsaxton

jreback

@jreback

lgtm. can merge on green.

dsaxton

def cov(self, other, min_periods=None) -> float:
def cov(
self, other: "Series", min_periods: Optional[int] = None, ddof: int = 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self, other: "Series", min_periods: Optional[int] = None, ddof: int = 1
self, other: "Series", min_periods: Optional[int] = None, ddof: Optional[int] = 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the official documentation https://docs.python.org/3/library/typing.html
it seems that the Optional should not be applied to optional parameters with a default (like ddof).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the docs mean is that an argument with a default shouldn't be given an Optional type because providing the value is optional. It should have an Optional type if None is a possible value:

On the other hand, if an explicit value of None is allowed, the use of Optional is appropriate, whether the argument is optional or not.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. Numpy is implementing the protection for None and it's actually what is present in the test case. Made Optional

@blueprintarchitect

@WillAyd

Labels