[Python-ideas] PEP 485: A Function for testing approximate equality

Neil Girdhar mistersheik at gmail.com
Sun Feb 15 23:07:57 CET 2015


I'm +1 on the idea, +1 on the name, but -1 on symmetry.

I'm glad that you included "other approaches" (what people actually do in the field). As for the argument "is it really that hard to just type it in": the whole point of this function is to give a self-documenting line that states what the writer means. Doesn't it make sense to ask: is my value close to some true value? If you wanted a symmetric test in English, you would instead ask "are these values close to each other?"

Best,

Neil

On Thursday, February 5, 2015 at 12:14:25 PM UTC-5, Chris Barker - NOAA Federal wrote:

Hi folks,

Time for Take 2 (or is that take 200?). No, I haven't dropped this ;-) After the lengthy thread, I went through and tried to take into account all the thoughts and comments. The result is a PEP with a bunch more explanation, and some slightly different decisions.

TL;DR: Please take a look and express your approval or disapproval (+1, +0, -0, -1, or whatever). Please keep in mind that this has been discussed a lot, so try not to reiterate what's already been said. And I'd really like it if you only gave a -1, thumbs down, etc., if you really think this is worse than not having anything at all, rather than that you'd just prefer something a little different. Also, if you do think this version is worse than nothing, but a different choice or two could change that, then let us know; maybe we can reach consensus on something slightly different.

Also keep in mind that the goal here (in the words of Nick Coghlan):

"""
the key requirement here should be "provide a binary float comparison function that is significantly less wrong than the current 'a == b'"
"""

No need to split hairs.

https://www.python.org/dev/peps/pep-0485/

Full PEP below, and on GitHub (with example code, tests, etc.) here:

https://github.com/PythonCHB/close_pep

Full Detail:
============

Here are the big changes:

* Symmetric test. Essentially, whether you want the symmetric or asymmetric test depends on exactly what question you are asking, but I think we need to pick one. It makes little to no difference for most use cases (see below), but I was persuaded by:

  - Principle of least surprise: equality is symmetric, so shouldn't "closeness" be too?

  - People don't want to have to remember what order to put the arguments in. Even if it doesn't matter, you have to think about whether it matters. A symmetric test doesn't require that.

  - There are times when the order of comparison is not known, for example if the user wants to test a bunch of values all against each other.
On the other hand, the asymmetric test is a better option only when you are specifically asking the question: is this (computed) value within a precise tolerance of another (known) value, and that tolerance is fairly large (i.e. 1% - 10%)? While this was brought up on this thread, no one had an actual use case like that. And is it really that hard to write:

  abs(known - computed) <= 0.1 * known
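To make the symmetric/asymmetric distinction concrete, here is a minimal sketch; the function names are illustrative only, not part of the proposal:

```python
def close_asym(computed, known, tol):
    # Asymmetric test: tolerance is scaled by the known value only,
    # so the argument order matters.
    return abs(known - computed) <= tol * abs(known)

def close_sym(a, b, tol):
    # Symmetric (Boost "strong") test: scaled by the smaller magnitude,
    # so the argument order never matters.
    return abs(a - b) <= tol * min(abs(a), abs(b))

# With a large 10% tolerance, the asymmetric test is order-sensitive:
print(close_asym(9.05, 10.0, 0.1))   # True:  0.95 <= 1.0
print(close_asym(10.0, 9.05, 0.1))   # False: 0.95 >  0.905
# The symmetric test gives the same answer either way:
print(close_sym(10.0, 9.05, 0.1) == close_sym(9.05, 10.0, 0.1))  # True
```

Note how the disagreement only shows up because the tolerance is large; as the PEP's math section shows, at small tolerances the variants coincide.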

NOTE: my example code has a flag with which you can pick which test to use. I did that for experimentation, but I think we want a simple API here; as soon as you provide that option, people need to concern themselves with which one to use.

* Defaults: I set the default relative tolerance to 1e-9, because that is a little more than half the precision available in a Python float, and it's the largest tolerance for which ALL the possible tests give exactly the same result for all possible float values (see the PEP for the math). The default for absolute tolerance is 0.0. Better for people to get a failed test with a check for zero right away than to accidentally use a totally inappropriate value.

Contentious Issues
==================

* The symmetric vs. asymmetric thing: some folks seemed to have strong ideas about that, but I hope we can all agree that something is better than nothing, and symmetric is probably the least surprising. And take a look at the math in the PEP: for a small tolerance, which is the likely case for most tests, it literally makes no difference at all which test is used.

* Default for abs_tolerance, or even whether to have it at all. This is a tough one: there are a lot of use cases for people testing against zero, so it's a pity not to have it do something reasonable by default in that case. However, there really is no reasonable universal default. As Guido said:

"""
For someone who for whatever reason is manipulating quantities that are in the range of 1e-100, 1e-12 is about as large as infinity.
"""

And testing against zero really requires an absolute tolerance test, which is not hard to simply write:

  abs(my_val) <= some_tolerance

and is covered by the unittest assert already. So why include an absolute tolerance at all?
Because this is likely to be used inside a comprehension or in a unittest sequence comparison, it is very helpful to be able to test a bunch of values, some of which may be zero, all with a single function with a single set of parameters. A similar approach has proven useful for numpy and the statistics test module. And again, the default is abs_tolerance=0.0, so if the user sticks with the defaults, they won't get bitten.

* Also, not really contentious (I hope), but I left out the details of how the unittest assertion would work. That's because I don't really use unittest; I'm hoping someone else will write it. If it comes down to it, I can do it. It will probably look a lot like the existing sequence-comparing assertions.

OK, I think that's it. Let's see if we can drive this home.

-Chris


PEP: 485
Title: A Function for testing approximate equality
Version: $Revision$
Last-Modified: $Date$
Author: Christopher Barker <Chris.... at noaa.gov>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Jan-2015
Python-Version: 3.5
Post-History:


Abstract
========

This PEP proposes the addition of a function to the standard library that determines whether one value is approximately equal or "close" to another value. It is also proposed that an assertion be added to the unittest.TestCase class to provide easy access for those using unittest for testing.


Rationale
=========

Floating point values contain limited precision, which results in their being unable to exactly represent some values, and for error to accumulate with repeated computation. As a result, it is common advice to only use an equality comparison in very specific situations. Often an inequality comparison fits the bill, but there are times (often in testing) where the programmer wants to determine whether a computed value is "close" to an expected value, without requiring them to be exactly equal.
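A two-line illustration of the problem just described (a sketch in plain Python):

```python
# Repeated addition of 0.1 drifts away from the exact value 1.0,
# so strict equality fails where a "close" test should succeed.
total = sum([0.1] * 10)
print(total == 1.0)              # False: total is 0.9999999999999999
print(abs(total - 1.0) <= 1e-9)  # True: well within a modest tolerance
```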
This is common enough, particularly in testing, and not always obvious how to do, so it would be a useful addition to the standard library.


Existing Implementations
------------------------

The standard library includes the unittest.TestCase.assertAlmostEqual method, but it:

* Is buried in the unittest.TestCase class

* Is an assertion, so you can't (easily) use it as a general test at the command line, etc.

* Is an absolute difference test. Often the measure of difference requires, particularly for floating point numbers, a relative error, i.e. "Are these two values within x% of each other?", rather than an absolute error. This matters particularly when the magnitude of the values is unknown a priori.

The numpy package has the allclose() and isclose() functions, but they are only available with numpy.

The statistics package tests include an implementation, used for its unit tests.

One can also find discussion and sample implementations on Stack Overflow and other help sites.

Many other non-python systems provide such a test, including the Boost C++ library and the APL language (reference?).

These existing implementations indicate that this is a common need and not trivial to write oneself, making it a candidate for the standard library.


Proposed Implementation
=======================

NOTE: this PEP is the result of an extended discussion on the python-ideas list [1].

The new function will have the following signature::

  isclose(a, b, rel_tolerance=1e-9, abs_tolerance=0.0)

a and b: are the two values to be tested for relative closeness

rel_tolerance: is the relative tolerance -- it is the amount of error allowed, relative to the magnitude of a and b. For example, to set a tolerance of 5%, pass rel_tolerance=0.05. The default tolerance is 1e-9, which assures that the two values are the same within about 9 decimal digits.

abs_tolerance: is a minimum absolute tolerance level -- useful for comparisons near zero.
Modulo error checking, etc., the function will return the result of::

  abs(a-b) <= max( rel_tolerance * min(abs(a), abs(b)), abs_tolerance )


Handling of non-finite numbers
------------------------------

The IEEE 754 special values of NaN, inf, and -inf will be handled according to IEEE rules. Specifically, NaN is not considered close to any other value, including NaN. inf and -inf are only considered close to themselves.


Non-float types
---------------

The primary use case is expected to be floating point numbers. However, users may want to compare other numeric types similarly. In theory, it should work for any type that supports abs(), comparisons, and subtraction. The code will be written and tested to accommodate these types:

* Decimal: for Decimal, the tolerance must be set to a Decimal type.

* int

* Fraction

* complex: for complex, abs(z) will be used for scaling and comparison.


Behavior near zero
------------------

Relative comparison is problematic if either value is zero. By definition, no value is small relative to zero. And computationally, if either value is zero, the difference is the absolute value of the other value, and the computed absolute tolerance will be rel_tolerance times that value. rel_tolerance is always less than one, so the difference will never be less than the tolerance.

However, while mathematically correct, there are many use cases where a user will need to know if a computed value is "close" to zero. This calls for an absolute tolerance test.
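A minimal sketch of the behavior described so far, covering the comparison expression, the special values, and the near-zero case (parameter names follow this draft; this is an illustration, not a final implementation):

```python
import math

def isclose(a, b, rel_tolerance=1e-9, abs_tolerance=0.0):
    # Sketch of the proposed test using the draft's "strong" formulation.
    if math.isnan(a) or math.isnan(b):
        return False              # NaN is not close to anything, even NaN
    if math.isinf(a) or math.isinf(b):
        return a == b             # inf and -inf are only close to themselves
    return abs(a - b) <= max(rel_tolerance * min(abs(a), abs(b)),
                             abs_tolerance)

print(isclose(1.0, 1.0 + 1e-10))                # True
print(isclose(float('inf'), float('inf')))      # True
print(isclose(float('nan'), float('nan')))      # False
print(isclose(0.0, 1e-12))                      # False: defaults fail near zero
print(isclose(0.0, 1e-12, abs_tolerance=1e-9))  # True
```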
If the user needs to call this function inside a loop or comprehension, where some, but not all, of the expected values may be zero, it is important that both a relative tolerance and an absolute tolerance can be tested for with a single function with a single set of parameters.

There is a similar issue if the two values to be compared straddle zero: if a is approximately equal to -b, then a and b will never be computed as "close".

To handle these cases, an optional parameter, abs_tolerance, can be used to set a minimum tolerance applied in the case of very small or zero computed absolute tolerance. That is, the values will always be considered close if the difference between them is less than abs_tolerance.

The default absolute tolerance value is set to zero because there is no value that is appropriate for the general case. It is impossible to know an appropriate value without knowing the likely values expected for a given use case. If all the values tested are on the order of one, then a value of about 1e-8 might be appropriate, but that would be far too large if expected values are on the order of 1e-12 or smaller.

Any non-zero default might result in users' tests passing totally inappropriately. If, on the other hand, a test against zero fails the first time with the defaults, the user will be prompted to select an appropriate value for the problem at hand in order to get the test to pass.

NOTE: the author of this PEP has resolved to go back over many of his tests that use the numpy allclose() function, which provides a default absolute tolerance, and make sure that the default value is appropriate.

If the user sets the rel_tolerance parameter to 0.0, then only the absolute tolerance will affect the result.
While not the goal of the function, this does allow it to be used as a purely absolute tolerance check as well.


unittest assertion
------------------

[need text here]


implementation
--------------

A sample implementation is available (as of Jan 22, 2015) on GitHub:

https://github.com/PythonCHB/close_pep/blob/master

This implementation has a flag that lets the user select which relative tolerance test to apply -- this PEP does not suggest that that be retained, but rather that the strong test be selected.


Relative Difference
===================

There are essentially two ways to think about how close two numbers are to each other:

Absolute difference: simply abs(a-b)

Relative difference: abs(a-b)/scale_factor [2].

The absolute difference is trivial enough that this proposal focuses on the relative difference.

Usually, the scale factor is some function of the values under consideration, for instance:

1) The absolute value of one of the input values

2) The maximum absolute value of the two

3) The minimum absolute value of the two

4) The absolute value of the arithmetic mean of the two

These lead to the following possibilities for determining if two values, a and b, are close to each other:

1) abs(a-b) <= tol*abs(a)

2) abs(a-b) <= tol * max( abs(a), abs(b) )

3) abs(a-b) <= tol * min( abs(a), abs(b) )

4) abs(a-b) <= tol * abs(a + b)/2

NOTE: (2) and (3) can also be written as:

2) (abs(a-b) <= tol*abs(a)) or (abs(a-b) <= tol*abs(b))

3) (abs(a-b) <= tol*abs(a)) and (abs(a-b) <= tol*abs(b))

(Boost refers to these as the "weak" and "strong" formulations [3].) These can be a tiny bit more computationally efficient, and thus are used in the example code.

Each of these formulations can lead to slightly different results. However, if the tolerance value is small, the differences are quite small.
In fact, they are often less than the available floating point precision.


How much difference does it make?
---------------------------------

When selecting a method to determine closeness, one might want to know how much of a difference it could make to use one test or the other -- i.e. how many values are there (or what range of values) that will pass one test, but not the other.

The largest difference is between options (2) and (3), where the allowable absolute difference is scaled by either the larger or smaller of the values.

Define delta to be the difference between the allowable absolute tolerance defined by the larger value and that defined by the smaller value. That is, delta is the amount that the two input values need to differ by in order to get a different result from the two tests. tol is the relative tolerance value.

Assume that a is the larger value and that both a and b are positive, to make the analysis a bit easier. delta is therefore::

  delta = tol * (a-b)

or::

  delta / tol = (a-b)

The largest absolute difference that would pass the test, (a-b), equals the tolerance times the larger value::

  (a-b) = tol * a

Substituting into the expression for delta::

  delta / tol = tol * a

so::

  delta = tol**2 * a

For example, for a = 10, b = 9, tol = 0.1 (10%):

maximum tolerance: tol * a == 0.1 * 10 == 1.0

minimum tolerance: tol * b == 0.1 * 9.0 == 0.9

delta = 1.0 - 0.9 = 0.1, which equals tol**2 * a = 0.1**2 * 10 = 0.1

The absolute difference between the maximum and minimum tolerance tests in this case could be substantial. However, the primary use case for the proposed function is testing the results of computations. In that case, a relative tolerance of much smaller magnitude is likely to be selected.

For example, a relative tolerance of 1e-8 is about half the precision available in a python float.
In that case, the difference between the two tests is 1e-8**2 * a, or 1e-16 * a, which is close to the limit of precision of a python float. If the relative tolerance is set to the proposed default of 1e-9 (or smaller), the difference between the two tests will be lost to the limits of precision of floating point. That is, each of the four methods will yield exactly the same results for all values of a and b.

In addition, in common use, tolerances are defined to 1 significant figure -- that is, 1e-8 specifies about 8 decimal digits of accuracy. So the difference between the various possible tests is well below the precision to which the tolerance is specified.


Symmetry
--------

A relative comparison can be either symmetric or non-symmetric. For a symmetric algorithm:

is_close_to(a, b) is always the same as is_close_to(b, a)

If a relative closeness test uses only one of the values (such as (1) above), then the result is asymmetric, i.e. is_close_to(a, b) is not necessarily the same as is_close_to(b, a).

Which approach is most appropriate depends on what question is being asked. If the question is "are these two numbers close to each other?", there is no obvious ordering, and a symmetric test is most appropriate.

However, if the question is "Is the computed value within x% of this known value?", then it is appropriate to scale the tolerance to the known value, and an asymmetric test is most appropriate.

From the previous section, it is clear that either approach would yield the same or similar results in the common use cases. In that case, the goal of this proposal is to provide a function that is least likely to produce surprising results.

The symmetric approach provides an appealing consistency -- it mirrors the symmetry of equality, and is less likely to confuse people. A symmetric test also relieves the user of the need to think about the order in which to set the arguments.
It was also pointed out that there may be some cases where the order of evaluation may not be well defined, for instance in the case of comparing a set of values all against each other.

There may be cases when a user does need to know that a value is within a particular range of a known value. In that case, it is easy enough to simply write the test directly::

  if a-b <= tol*a:

(assuming a > b in this case). There is little need to provide a function for this particular case.

This proposal uses a symmetric test.


Which symmetric test?
---------------------

There are three symmetric tests considered.

The case that uses the arithmetic mean of the two values requires that the values either be added together before dividing by 2, which could result in overflow to inf for very large numbers, or that each value be divided by two before being added together, which could result in underflow to zero for very small numbers. This effect would only occur at the very limit of float values, but it was decided there was no benefit to this method worth reducing the range of functionality.

This leaves the Boost "weak" test (2), which uses the larger of the two values to scale the tolerance, and the Boost "strong" test (3), which uses the smaller of the values to scale the tolerance. For a small tolerance, they yield the same result, but this proposal uses the Boost "strong" test case: it is symmetric and provides a slightly stricter criterion for tolerance.


Defaults
========

Default values are required for the relative and absolute tolerance.


Relative Tolerance Default
--------------------------

The relative tolerance required for two values to be considered "close" is entirely use-case dependent. Nevertheless, the relative tolerance needs to be less than 1.0, and greater than 1e-16 (the approximate precision of a python float).
The value of 1e-9 was selected because it is the largest relative tolerance for which the various possible methods will yield the same result, and it is also about half of the precision available in a python float. In the general case, a good numerical algorithm is not expected to lose more than about half of the available digits of accuracy, and if a much larger tolerance is acceptable, the user should be considering the proper value in that case. Thus 1e-9 is expected to "just work" for many cases.


Absolute tolerance default
--------------------------

The absolute tolerance value will be used primarily for comparing to zero. The absolute tolerance required to determine if a value is "close" to zero is entirely use-case dependent. There is also essentially no bound to the useful range -- expected values could conceivably be anywhere within the limits of a python float. Thus a default of 0.0 is selected.

If, for a given use case, a user needs to compare to zero, the test is guaranteed to fail the first time, and the user can then select an appropriate value.

It was suggested that comparing to zero is, in fact, a common use case (evidence suggests that the numpy functions are often used with zero). In this case, it would be desirable to have a "useful" default. Values around 1e-8 were suggested, being about half of floating point precision for values of around 1. However, to quote The Zen: "In the face of ambiguity, refuse the temptation to guess." Guessing that users will most often be concerned with values close to 1.0 would lead to spurious passing tests when used with smaller values -- this is potentially more damaging than requiring the user to thoughtfully select an appropriate value.


Expected Uses
=============

The primary expected use case is various forms of testing -- "are the results computed near what I expect as a result?" This sort of test may or may not be part of a formal unit testing suite.
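As a sketch of that primary use case: a batch of computed results checked against expected values in one comprehension, even when one of the expected values is zero. The isclose() stand-in below follows the proposed signature so the example runs on its own.

```python
def isclose(a, b, rel_tolerance=1e-9, abs_tolerance=0.0):
    # Stand-in for the proposed function (draft "strong" formulation).
    return abs(a - b) <= max(rel_tolerance * min(abs(a), abs(b)),
                             abs_tolerance)

expected = [0.3,       1e-12, 0.0]
computed = [0.1 + 0.2, 1e-12, 1e-30]

# One call, one set of parameters, zero expected value included:
results = [isclose(e, c, abs_tolerance=1e-20)
           for e, c in zip(expected, computed)]
print(results)   # [True, True, True]
```

Note that with the default abs_tolerance=0.0, the third comparison would fail, prompting the user to choose a value suited to the problem, as the section above recommends.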
Such testing could be done one-off at the command line, in an IPython notebook, as part of doctests, or with simple asserts in an ``if __name__ == "__main__"`` block.

The proposed unittest.TestCase assertion would of course be used in unit testing.

It would also be an appropriate function to use for the termination criterion of a simple iterative solution to an implicit function::

  guess = something
  while True:
      new_guess = implicit_function(guess, *args)
      if isclose(new_guess, guess):
          break
      guess = new_guess


Inappropriate uses
------------------

One use case for floating point comparison is testing the accuracy of a numerical algorithm. However, in this case, the numerical analyst ideally would be doing careful error propagation analysis, and should understand exactly what to test for. It is also likely that ULP (Unit in the Last Place) comparison may be called for. While this function may prove useful in such situations, it is not intended to be used in that way.


Other Approaches
================

unittest.TestCase.assertAlmostEqual
-----------------------------------

(https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertAlmostEqual)

Tests that values are approximately (or not approximately) equal by computing the difference, rounding to the given number of decimal places (default 7), and comparing to zero.

This method is purely an absolute tolerance test, and does not address the need for a relative tolerance test.


numpy isclose()
---------------

http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.isclose.html

The numpy package provides the vectorized functions isclose() and allclose(), for similar use cases as this proposal:

isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)

Returns a boolean array where two arrays are element-wise equal within a tolerance. The tolerance values are positive, typically very small numbers.
The relative difference (rtol * abs(b)) and the absolute difference atol are added together to compare against the absolute difference between a and b.

In this approach, the absolute and relative tolerances are added together, rather than combined with the "or" method used in this proposal. This is computationally simpler, and if the relative tolerance is larger than the absolute tolerance, then the addition will have no effect. However, if the absolute and relative tolerances are of similar magnitude, then the allowed difference will be about twice as large as expected. This makes the function harder to understand, with no computational advantage in this context.

Even more critically, if the values passed in are small compared to the absolute tolerance, then the relative tolerance will be completely swamped, perhaps unexpectedly. This is why, in this proposal, the absolute tolerance defaults to zero -- the user will be required to choose a value appropriate for the values at hand.


Boost floating-point comparison
-------------------------------

The Boost project ([3]) provides a floating point comparison function. It is a symmetric approach, with both "weak" (larger of the two relative errors) and "strong" (smaller of the two relative errors) options. This proposal uses the Boost "strong" approach. There is no need to complicate the API by providing the option to select different methods when the results will be similar in most cases, and the user is unlikely to know which to select in any case.


Alternate Proposals
-------------------

A Recipe
''''''''

The primary alternate proposal was to not provide a standard library function at all, but rather to provide a recipe for users to refer to. This would have the advantage that the recipe could provide and explain the various options, and let the user select the one most appropriate.
However, that would require anyone needing such a test to, at the very least, copy the function into their code base, and select the comparison method to use. In addition, adding the function to the standard library allows it to be used in the unittest.TestCase.assertIsClose() method, providing a substantial convenience to those using unittest.


zero_tol
''''''''

One possibility was to provide a zero_tol parameter, rather than the absolute tolerance parameter. This would be an absolute tolerance that would only be applied in the case of one of the arguments being exactly zero. This would have the advantage of retaining the full relative tolerance behavior for all non-zero values, while allowing tests against zero to work. However, it would also result in the potentially surprising result that a small value could be "close" to zero, but not "close" to an even smaller value. e.g., 1e-10 is "close" to zero, but not "close" to 1e-11.


No absolute tolerance
'''''''''''''''''''''

Given the issues with comparing to zero, another possibility would have been to only provide a relative tolerance, and let every comparison to zero fail. In this case, the user would need to do a simple absolute test:

  abs(val) < zero_tol

in the case where the comparison involved zero. However, this would not allow the same call to be used for a sequence of values, such as in a loop or comprehension, or in the TestCase.assertClose() method, making the function far less useful. It is noted that the default abs_tolerance=0.0 achieves the same effect if the default is not overridden.


Other tests
'''''''''''

The other tests considered are all discussed in the Relative Error section above.


References
==========

.. [1] Python-ideas list discussion threads

   https://mail.python.org/pipermail/python-ideas/2015-January/030947.html

   https://mail.python.org/pipermail/python-ideas/2015-January/031124.html

   https://mail.python.org/pipermail/python-ideas/2015-January/031313.html
.. [2] Wikipedia page on relative difference

   http://en.wikipedia.org/wiki/Relative_change_and_difference

.. [3] Boost project floating-point comparison algorithms

   http://www.boost.org/doc/libs/1_35_0/libs/test/doc/components/test_tools/floating_point_comparison.html

.. Bruce Dawson's discussion of floating point:

   https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA  98115       (206) 526-6317  main reception

Chris.... at noaa.gov



More information about the Python-ideas mailing list