[Python-ideas] PEP 485: A Function for testing approximate equality

Steven D'Aprano steve at pearwood.info
Fri Feb 6 05:48:19 CET 2015


On Thu, Feb 05, 2015 at 05:12:32PM -0800, Chris Barker wrote:

> On Thu, Feb 5, 2015 at 4:44 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>
> > 0.0 < reltol < 1.0)
> >
> > I can just about see the point in restricting reltol to the closed
> > interval 0...1, but not the open interval. Mathematically, setting the
> > tolerance to 0 should just degrade gracefully to exact equality,
>
> sure -- no harm done there.
>
> > and a tolerance of 1 is nothing special at all.
>
> well, I ended up putting that in because it turns out with the "weak"
> test, then anything compares as "close" to zero:

Okay. Maybe that means that the "weak test" is not useful if one of the numbers is zero. Neither is a relative tolerance, or an ULP calculation.

> tol >= 1.0
> a = anything
> b = 0.0
> min( abs(a), abs(b) ) = 0.0
> abs(a-b) = a

That is incorrect. abs(a-b) for b == 0 is abs(a), not a.

> tol * a >= a
> abs(a-b) <= tol * a

If and only if a >= 0. Makes perfect mathematical sense, even if it's not useful. That's an argument for doing what Bruce Dawson says, and comparing against the maximum of a and b, not the minimum.
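To make the two tests concrete, here is a minimal sketch (the function names are illustrative, not from the PEP; "weak" here means within tolerance of either value, "strong" within tolerance of both, as the terms are being used in this thread):

def isclose_weak(a, b, tol):
    # "Weak" test: the difference is within tolerance of *either* value,
    # i.e. abs(a - b) <= tol * max(abs(a), abs(b)).
    return abs(a - b) <= tol * abs(a) or abs(a - b) <= tol * abs(b)

def isclose_strong(a, b, tol):
    # "Strong" test: the difference is within tolerance of *both* values,
    # i.e. abs(a - b) <= tol * min(abs(a), abs(b)).
    return abs(a - b) <= tol * abs(a) and abs(a - b) <= tol * abs(b)

# With tol >= 1 the weak test reports anything as close to zero, while
# the strong test reports nothing (other than zero itself) as close to it.
print(isclose_weak(1e6, 0.0, 1.0))    # True
print(isclose_strong(1e6, 0.0, 1.0))  # False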

> Granted, that's actually the best argument yet for using the strong test
> -- which I am suggesting, though I haven't thought out what that will do
> in the case of large tolerances.

It should work exactly the same as for small tolerances, except larger *wink*

> > Values larger than 1 aren't often useful, but there really is no
> > reason to exclude tolerances larger than 1. "Give or take 300%" (ie.
> > reltol=3.0) is a pretty big tolerance, but it is well-defined: a
> > difference of 299% is "close enough", 301% is "too far".

> yes it is, but then the whole weak vs strong vs asymmetric test becomes
> important.

Um, yes? I know Guido keeps saying that the difference is unimportant, but I think he is wrong: at the edges, the way you determine "close to" makes a difference to whether a and b are considered close or not. If you care enough to specify a specific tolerance (say, 2.3e-4), as opposed to plucking a round number out of thin air, then you care about the edge cases. I'm not entirely sure what to do about it, but my sense is that we should do something.

> From my math the "delta" between the weak and strong tests goes with
> tolerance**2 * max(a,b). So if the tolerance is >= 1, then it makes a
> big difference which test you choose. In fact:

> "Is a within 300% of b" makes sense, but "are a and b within 300% of
> each-other" is poorly defined.

No more so than "a and b within 1% of each other". It's just a short-hand. What I mean by "of each other" is the method recommended by Bruce Dawson: use the larger of a and b, which is what Boost(?) and you are calling the "strong test".
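A quick numeric check of that tolerance**2 * max(a,b) claim, reusing the isclose_weak/isclose_strong sketches from above (the numbers are chosen purely for illustration):

a, tol = 100.0, 0.1
b = a - tol * a                     # 90.0: exactly at the weak-test edge
print(isclose_weak(a, b, tol))      # True:  10.0 <= 0.1 * 100.0
print(isclose_strong(a, b, tol))    # False: 10.0 >  0.1 * 90.0
print(tol * a - tol * b)            # 1.0, i.e. tol**2 * a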

[...]

> > Negative error tolerances, on the other hand, do seem to be
> > meaningless and should be prevented.

> you could just take the abs(reltol), but really? what's the point?

No no, I agree with you that negative values for tolerances (relative or absolute) should be prohibited. Or complex ones, for that matter.
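A sketch of the kind of guard this implies (the helper name and messages are hypothetical, not from the PEP's reference implementation):

def check_tolerance(tol):
    # Reject complex and negative tolerances outright rather than
    # silently taking abs(), per the discussion above.
    if isinstance(tol, complex):
        raise TypeError("tolerance must be a real number")
    if tol < 0:
        raise ValueError("tolerance must be non-negative")
    return tol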

> > (E.g. "guess the number of grains of sand on this beach".) Any upper
> > limit you put in is completely arbitrary,

> somehow one doesn't feel arbitrary to me -- numbers aren't close if the
> difference between them is larger than the largest of the numbers -- not
> arbitrary, maybe unnecessary, but not arbitrary

Consider one way of detecting outliers in numeric data: any number more than X standard deviations from the mean in either direction may be an outlier.

py> import statistics
py> data = [1, 2, 100, 100, 100, 101, 102, 103, 104, 500, 100000]
py> m = statistics.mean(data)
py> tol = 3*statistics.stdev(data)
py> [x for x in data if abs(x-m) > tol]
[100000]
py> m, tol, tol/m
(9201.181818181818, 90344.55455462009, 9.818798969508077)

tol/m is, of course, the error tolerance relative to m, which for the sake of the argument we are treating as the "known value": anything differing from the mean by more than 9.818... times the mean is probably an outlier.

Now, the above uses an absolute tolerance, but I should be able to get the same results from a relative tolerance of 9.818... depending on which is more convenient to work with at the time.
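Spelled out (a sketch that just re-runs the session above both ways; it assumes m > 0, as in this data):

import statistics

data = [1, 2, 100, 100, 100, 101, 102, 103, 104, 500, 100000]
m = statistics.mean(data)
abs_tol = 3 * statistics.stdev(data)   # the absolute tolerance above
rel_tol = abs_tol / m                  # the same cut-off, relative to m

# Both forms flag the same outlier (up to a possible one-ulp rounding
# difference right at the boundary).
print([x for x in data if abs(x - m) > abs_tol])      # [100000]
print([x for x in data if abs(x - m) > rel_tol * m])  # [100000]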

-- Steve


