ENH: Allow absolute precision in assert_almost_equal (#13357) by joaoleveiga · Pull Request #30562 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation75 Commits1 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

closes #9457

Greetings! This is my first pull-request on an open source project, so I hope I did not miss anything at least too obvious... Thank you, in advance 😄

I do have a few questions:

Should I add the versionadded directive in all docstrings that were changed?
Which "whatsnew" release notes should I add this change to? v1.0.0?
Should assert_almost_equal / equals should allow access to np.allclose #9457 also be closed, as from my understanding this was also the underlying issue?
Should I be the one ticking the checkboxes below?

This fixes the issue where:

# Fails because it's doing (1 - .1 / .1001)
assert_almost_equal(0.1, 0.1001, check_less_precise=True)

# Works as intuitively expected
assert_almost_equal(
    0.1, 0.1001,
    atol=0.01,
)

Commit message below:

This commit makes assert_almost_equal accept both relative and
absolute precision when comparing numbers, through two new keyword
arguments: rtol, and atol, respectively.

Under the hood, _libs.testing.assert_almost_equal is now calling
math.isclose, instead of an adaptaion of
numpy.testing.assert_almost_equal.

closes assert_almost_equal #13357
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Just noticed that I added the issue number on the commit title, which I guess I was not supposed to. 🤔 Can I push-force to my branch and update it?

Just noticed that I added the issue number on the commit title, which I guess I was not supposed to.

Not an issue at all, don't worry about it.

Greetings! This is my first pull-request on an open source project,

Welcome!

Which "whatsnew" release notes should I add this change to? v1.0.0?

v1.0.0, correct

Should I be the one ticking the checkboxes below?

Yes

Should #9457 also be closed, as from my understanding this was also the underlying issue?

I haven't looked at that yet, but assuming you're right, then yes. also add another checkbox with a - [x] closes #9457

Hello @joaoleveiga! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-21 09:44:27 UTC

Should #9457 also be closed, as from my understanding this was also the underlying issue?

I haven't looked at that yet, but assuming you're right, then yes. also add another checkbox with a - [x] closes #9457

Looking back at this, it seems like the OP wanted a different issue solved, so I won't add the tag for that issue here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jreback added the Testing

pandas testing functions or related to the test suite

label

Jan 1, 2020

Thanks for the suggestion @jreback ! I was afraid of touching in too many parts of the code, but I definitely agree to having a similar API to numpy.allclose. Let's see how deprecation works 😉

I would rather:

deprecate check_less_precise

add rtol and atol

similar to https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html with the same defaults

this will be a bit more involved, but ultimately is better than adding unfamiliar keywords

Any suggestions to make these changes without breaking current functionality?

If the rtol and atol parameters are added with the same defaults, then it will be tricky to check if the user is requesting the "old behavior" with check_less_precise=True/False.

Should I worry about such a breaking change or is it OK?

Example of what I was doing:

def assert_almost_equal( left, right, check_dtype: Union[bool, str] = "equiv", check_less_precise: Union[bool, int] = False, rtol: float = 1e-5, atol: float = 1e-8, **kwargs, ): """(...) check_less_precise : bool or int, default False (...) .. deprecated:: 1.0.0 Use rtol and atol instead to define relative, and absolute tolerance, respectively. Similar to :func:numpy.allclose. rtol : float, default 1e-5 Relative tolerance. atol : float, default 1e-8 Absolute tolerance. """ ...

🤔

A few options I thought of:

With rtol/atol defined as Optional[float] one could check if the new args are defined first, and still keep the check_less_precise flag working as it is now.
Ditch the check_less_precise functionality while keeping the argument so that the deprecation warning is more explicit.

Any ideas?
Thanks!

you can just set the default for check_less_precise = lib.no_default (grep for it)

then if check_less_precise is passed at all you warn (and then set rtol if needed)

change all refs in the pandas codebase and should be good

After looking better at how numpy handles these sort of assertions with numpy.isclose, I realized that perhaps it would be cleaner to just use it, or call math.isclose instead.

WDYT? Does it make sense to replicate math.isclose in _libs.testing.pyx?

This would imply deprecating the _libs.testing.decimal_almost_equal function.

I'm going to commit the changes I made anyway, so that you can review it.

EDIT: BTW, looking at numpy/numpy#10161 has convinced me that going with the math.isclose algorithm seems like the best option. I'm just unsure on how to deal with the deprecation, to avoid being a nuisance to people that might be using the testing.assert_*_equal functions.

After looking better at how numpy handles these sort of assertions with numpy.isclose, I realized that perhaps it would be cleaner to just use it, or call math.isclose instead.

WDYT? Does it make sense to replicate math.isclose in _libs.testing.pyx?

This would imply deprecating the _libs.testing.decimal_almost_equal function.

I'm going to commit the changes I made anyway, so that you can review it.

EDIT: BTW, looking at numpy/numpy#10161 has convinced me that going with the math.isclose algorithm seems like the best option. I'm just unsure on how to deal with the deprecation, to avoid being a nuisance to people that might be using the testing.assert_*_equal functions.

yes not averse to using math.isclose here. I think we handle all of the NaN / different dtype stuff before then (that is of course why this was written original; numpy didn't handle anything except float/int properly), but that's where these tol's matter anyhow

Should I move the changes in the whatsnew file from the Enhancements section to the Deprecation section?

Meanwhile, I found some tests that use assert_almost_equal are breaking (and were already breaking prior to this change...). Should I try fixing them in this issue, or open a new one?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joaoleveiga can you fix up merge conflicts? Someone can take a look after that

@joaoleveiga can you fix up merge conflicts? Someone can take a look after that

Conflicts solved.

Looking at the checks that failed, I was left with a few questions:

Should I add a new test under tests.util.test_assert_almost_equal to test that the FutureWarning is raised?
Should the warning issue a FutureWarning or a DeprecationWarning?
Should I add the @pytest.mark.filterwarnings("ignore:The 'check_less_precise':FutureWarning") decoration to all tests that make use of check_less_precise=? 😬

Thanks

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I add a new test under tests.util.test_assert_almost_equal to test that the FutureWarning is raised?

yes

Should the warning issue a FutureWarning or a DeprecationWarning?

FutureWarning as you have

Should I add the @pytest.mark.filterwarnings("ignore:The 'check_less_precise':FutureWarning") decoration to all tests that make use of check_less_precise=?

NO, just change all of the usages that need changing (likely not very many)

@@ -1217,13 +1347,15 @@ def assert_frame_equal(
check_index_type="equiv",
check_column_type="equiv",
check_frame_type=True,
check_less_precise=False,
check_less_precise=no_default,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just use None here, no?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I was just following your suggestion:

you can just set the default for check_less_precise = lib.no_default (grep for it)

Which seemed a better fit when deprecating a parameter. Should I change it to None or keep with the no_default? 😄

Do we have internal use cases for this, or are we just/mostly providing it for downstream authors/users?

@joaoleveiga looks like a few merge conflicts have built up - can you resolve them?

@joaoleveiga looks like a few merge conflicts have built up - can you resolve them?

Done! Sorry for the delay, but these past weeks were a bit intense here (Portugal).

this was pretty close last time i looked. can you merge master

this was pretty close last time i looked. can you merge master

@jreback , just merged master. Locally I'm getting some errors on the datetime tests which seem to be somewhat related to locale definitions. Other than that, other tests passed 👌

@joaoleveiga possible typing issue: https://github.com/pandas-dev/pandas/pull/30562/checks?check_run_id=772917451#step:11:10, can you check?
Performing static analysis using mypy
pandas/_testing.py:1183: error: "ExtensionArray" has no attribute "asi8"
I have looked at that and I still don't understand why it's triggering. I didn't change that piece of code.
I can add an additional hasattr(right, "asi8") if it's OK for you, even though it's not really related to the original issue/PR.

WDYT? @TomAugspurger , @jreback

if hasattr(left, "asi8") and hasattr(right, "asi8") and type(right) == type(left):

can you repro locally? I think something like
assert isinstance(left, DatetimeLikeArrayMixin) (and same for right) will work here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All green @jreback. I think the docs could use some expanding, but that can be done as a followup.

Use `rtol` and `atol` instead to define relative/absolute
tolerance, respectively. Similar to :func:`math.isclose`.
rtol : float, default 1e-5
Relative tolerance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is user-facing documentation, so these descriptions could be improved.

Could you copy the description from numpy.isclose, or link to it with a See Also?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. minor doc comment & I think @TomAugspurger has some, but as a followon is ok. Thanks for doing this PR, very nice!

@@ -781,6 +781,9 @@ Deprecations
- The ``squeeze`` keyword in the ``groupby`` function is deprecated and will be removed in a future version (:issue:`32380`)
- The ``tz`` keyword in :meth:`Period.to_timestamp` is deprecated and will be removed in a future version; use `per.to_timestamp(...).tz_localize(tz)`` instead (:issue:`34522`)
- :meth:`DatetimeIndex.to_perioddelta` is deprecated and will be removed in a future version. Use ``index - index.to_period(freq).to_timestamp()`` instead (:issue:`34853`)
- :meth:`util.testing.assert_almost_equal` now accepts both relative and absolute
precision through the ``rtol``, and ``atol`` parameters, thus deprecating the
``check_less_precise`` parameter. (:issue:`13357`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also indicate #9457 here (for a followon PR is fine)

fangchenli pushed a commit to fangchenli/pandas that referenced this pull request

Jun 27, 2020

@joaoleveiga want to make a PR to enforce this deprecation?

@jbrockmendel , I only had time to tackle this today but I see it was already taken care of :) thanks for the heads up. Cheers

Labels

Testing

pandas testing functions or related to the test suite