ENH: Allow absolute precision in assert_almost_equal (#13357) by joaoleveiga · Pull Request #30562 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation75 Commits1 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

joaoleveiga

closes #9457

Greetings! This is my first pull-request on an open source project, so I hope I did not miss anything at least too obvious... Thank you, in advance 😄

I do have a few questions:

This fixes the issue where:

# Fails because it's doing (1 - .1 / .1001)
assert_almost_equal(0.1, 0.1001, check_less_precise=True)

# Works as intuitively expected
assert_almost_equal(
    0.1, 0.1001,
    atol=0.01,
)

Commit message below:

This commit makes assert_almost_equal accept both relative and
absolute precision when comparing numbers, through two new keyword
arguments: rtol, and atol, respectively.

Under the hood, _libs.testing.assert_almost_equal is now calling
math.isclose, instead of an adaptaion of
numpy.testing.assert_almost_equal.

@joaoleveiga

Just noticed that I added the issue number on the commit title, which I guess I was not supposed to. 🤔 Can I push-force to my branch and update it?

@jbrockmendel

Just noticed that I added the issue number on the commit title, which I guess I was not supposed to.

Not an issue at all, don't worry about it.

@jbrockmendel

Greetings! This is my first pull-request on an open source project,

Welcome!

Which "whatsnew" release notes should I add this change to? v1.0.0?

v1.0.0, correct

Should I be the one ticking the checkboxes below?

Yes

Should #9457 also be closed, as from my understanding this was also the underlying issue?

I haven't looked at that yet, but assuming you're right, then yes. also add another checkbox with a - [x] closes #9457

jbrockmendel

@pep8speaks

Hello @joaoleveiga! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-21 09:44:27 UTC

@joaoleveiga

Should #9457 also be closed, as from my understanding this was also the underlying issue?

I haven't looked at that yet, but assuming you're right, then yes. also add another checkbox with a - [x] closes #9457

Looking back at this, it seems like the OP wanted a different issue solved, so I won't add the tag for that issue here.

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback jreback added the Testing

pandas testing functions or related to the test suite

label

Jan 1, 2020

@joaoleveiga

Thanks for the suggestion @jreback ! I was afraid of touching in too many parts of the code, but I definitely agree to having a similar API to numpy.allclose. Let's see how deprecation works 😉

@joaoleveiga

I would rather:

similar to https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html with the same defaults

this will be a bit more involved, but ultimately is better than adding unfamiliar keywords

Any suggestions to make these changes without breaking current functionality?

If the rtol and atol parameters are added with the same defaults, then it will be tricky to check if the user is requesting the "old behavior" with check_less_precise=True/False.

Should I worry about such a breaking change or is it OK?

Example of what I was doing:

def assert_almost_equal( left, right, check_dtype: Union[bool, str] = "equiv", check_less_precise: Union[bool, int] = False, rtol: float = 1e-5, atol: float = 1e-8, **kwargs, ): """(...) check_less_precise : bool or int, default False (...) .. deprecated:: 1.0.0 Use rtol and atol instead to define relative, and absolute tolerance, respectively. Similar to :func:numpy.allclose. rtol : float, default 1e-5 Relative tolerance. atol : float, default 1e-8 Absolute tolerance. """ ...

🤔

A few options I thought of:

Any ideas?
Thanks!

@jreback

you can just set the default for check_less_precise = lib.no_default (grep for it)

then if check_less_precise is passed at all you warn (and then set rtol if needed)

change all refs in the pandas codebase and should be good

@joaoleveiga

After looking better at how numpy handles these sort of assertions with numpy.isclose, I realized that perhaps it would be cleaner to just use it, or call math.isclose instead.

WDYT? Does it make sense to replicate math.isclose in _libs.testing.pyx?

This would imply deprecating the _libs.testing.decimal_almost_equal function.

I'm going to commit the changes I made anyway, so that you can review it.

EDIT: BTW, looking at numpy/numpy#10161 has convinced me that going with the math.isclose algorithm seems like the best option. I'm just unsure on how to deal with the deprecation, to avoid being a nuisance to people that might be using the testing.assert_*_equal functions.

@jreback

After looking better at how numpy handles these sort of assertions with numpy.isclose, I realized that perhaps it would be cleaner to just use it, or call math.isclose instead.

WDYT? Does it make sense to replicate math.isclose in _libs.testing.pyx?

This would imply deprecating the _libs.testing.decimal_almost_equal function.

I'm going to commit the changes I made anyway, so that you can review it.

EDIT: BTW, looking at numpy/numpy#10161 has convinced me that going with the math.isclose algorithm seems like the best option. I'm just unsure on how to deal with the deprecation, to avoid being a nuisance to people that might be using the testing.assert_*_equal functions.

yes not averse to using math.isclose here. I think we handle all of the NaN / different dtype stuff before then (that is of course why this was written original; numpy didn't handle anything except float/int properly), but that's where these tol's matter anyhow

@joaoleveiga

Should I move the changes in the whatsnew file from the Enhancements section to the Deprecation section?

@joaoleveiga

Meanwhile, I found some tests that use assert_almost_equal are breaking (and were already breaking prior to this change...). Should I try fixing them in this issue, or open a new one?

@joaoleveiga

WillAyd

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joaoleveiga can you fix up merge conflicts? Someone can take a look after that

@joaoleveiga

@joaoleveiga can you fix up merge conflicts? Someone can take a look after that

Conflicts solved.

@joaoleveiga

Looking at the checks that failed, I was left with a few questions:

  1. Should I add a new test under tests.util.test_assert_almost_equal to test that the FutureWarning is raised?
  2. Should the warning issue a FutureWarning or a DeprecationWarning?
  3. Should I add the @pytest.mark.filterwarnings("ignore:The 'check_less_precise':FutureWarning") decoration to all tests that make use of check_less_precise=? 😬

Thanks

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

FutureWarning as you have

NO, just change all of the usages that need changing (likely not very many)

@@ -1217,13 +1347,15 @@ def assert_frame_equal(
check_index_type="equiv",
check_column_type="equiv",
check_frame_type=True,
check_less_precise=False,
check_less_precise=no_default,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just use None here, no?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I was just following your suggestion:

you can just set the default for check_less_precise = lib.no_default (grep for it)

Which seemed a better fit when deprecating a parameter. Should I change it to None or keep with the no_default? 😄

jbrockmendel

@jbrockmendel

Do we have internal use cases for this, or are we just/mostly providing it for downstream authors/users?

@WillAyd

@joaoleveiga looks like a few merge conflicts have built up - can you resolve them?

@joaoleveiga

@joaoleveiga looks like a few merge conflicts have built up - can you resolve them?

Done! Sorry for the delay, but these past weeks were a bit intense here (Portugal).

jreback

@jreback

this was pretty close last time i looked. can you merge master

@joaoleveiga

this was pretty close last time i looked. can you merge master

@jreback , just merged master. Locally I'm getting some errors on the datetime tests which seem to be somewhat related to locale definitions. Other than that, other tests passed 👌

@TomAugspurger

@joaoleveiga

@jreback

@joaoleveiga possible typing issue: https://github.com/pandas-dev/pandas/pull/30562/checks?check_run_id=772917451#step:11:10, can you check?

Performing static analysis using mypy
pandas/_testing.py:1183: error: "ExtensionArray" has no attribute "asi8"

I have looked at that and I still don't understand why it's triggering. I didn't change that piece of code.
I can add an additional hasattr(right, "asi8") if it's OK for you, even though it's not really related to the original issue/PR.

WDYT? @TomAugspurger , @jreback

if hasattr(left, "asi8") and hasattr(right, "asi8") and type(right) == type(left):

can you repro locally? I think something like
assert isinstance(left, DatetimeLikeArrayMixin) (and same for right) will work here

TomAugspurger

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All green @jreback. I think the docs could use some expanding, but that can be done as a followup.

Use `rtol` and `atol` instead to define relative/absolute
tolerance, respectively. Similar to :func:`math.isclose`.
rtol : float, default 1e-5
Relative tolerance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is user-facing documentation, so these descriptions could be improved.

Could you copy the description from numpy.isclose, or link to it with a See Also?

WillAyd

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. minor doc comment & I think @TomAugspurger has some, but as a followon is ok. Thanks for doing this PR, very nice!

@@ -781,6 +781,9 @@ Deprecations
- The ``squeeze`` keyword in the ``groupby`` function is deprecated and will be removed in a future version (:issue:`32380`)
- The ``tz`` keyword in :meth:`Period.to_timestamp` is deprecated and will be removed in a future version; use `per.to_timestamp(...).tz_localize(tz)`` instead (:issue:`34522`)
- :meth:`DatetimeIndex.to_perioddelta` is deprecated and will be removed in a future version. Use ``index - index.to_period(freq).to_timestamp()`` instead (:issue:`34853`)
- :meth:`util.testing.assert_almost_equal` now accepts both relative and absolute
precision through the ``rtol``, and ``atol`` parameters, thus deprecating the
``check_less_precise`` parameter. (:issue:`13357`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also indicate #9457 here (for a followon PR is fine)

fangchenli pushed a commit to fangchenli/pandas that referenced this pull request

Jun 27, 2020

@joaoleveiga @fangchenli

@jbrockmendel

@joaoleveiga

@joaoleveiga want to make a PR to enforce this deprecation?

@jbrockmendel , I only had time to tackle this today but I see it was already taken care of :) thanks for the heads up. Cheers

Labels

Testing

pandas testing functions or related to the test suite