

On Oct 12, 2016, at 7:05 AM, Hal Finkel via llvm-dev <llvm-dev@lists.llvm.org> wrote:

----- Original Message -----
From: "Renato Golin" <renato.golin@linaro.org>
To: "Sebastian Pop" <sebpop.llvm@gmail.com>
Cc: "Hal Finkel" <hfinkel@anl.gov>, "Sebastian Paul Pop" <s.pop@samsung.com>, "llvm-dev" <llvm-dev@lists.llvm.org>,
"Matthias Braun" <matze@braunis.de>, "Clang Dev" <cfe-dev@lists.llvm.org>, "nd" <nd@arm.com>, "Abe Skolnik"
<a.skolnik@samsung.com>
Sent: Wednesday, October 12, 2016 8:35:16 AM
Subject: Re: [test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"

On 12 October 2016 at 14:26, Sebastian Pop <sebpop.llvm@gmail.com> wrote:
Correct me if I misunderstood: you would be OK changing the reference
output to exactly match the output of "-O0 -ffp-contract=off".

No, that's not at all what I said.

Matching identical outputs to FP tests makes no sense because there's
*always* an error bar.

This is something we need to understand. No, there's not always an error bar. With FMA formation and without non-IEEE-compliant optimizations (i.e. fast-math), the optimized answer should be identical to the non-optimized answer.

Can you clarify: in my mind the F in FMA is for “fused”, i.e. no intermediate truncation, i.e. not the same numerical result. But you imply the opposite above?


Mehdi
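
For concreteness, here is a minimal C sketch (not from the original thread)
of the fused semantics in question: fma() rounds once, so it can preserve a
term that a separately rounded a*b + c loses. Built with contraction
disabled so the compiler does not turn the plain expression into an FMA
itself, e.g. clang -O2 -ffp-contract=off fma_demo.c -lm

/* fma_demo.c -- minimal illustration (not from the thread) of how a fused
 * multiply-add differs from a separately rounded multiply and add. */
#include <math.h>
#include <stdio.h>

int main(void) {
  double a = 1.0 + 0x1.0p-27;   /* exactly representable */
  double b = 1.0 - 0x1.0p-27;   /* exactly representable */
  double c = -1.0;

  /* a*b is 1 - 2^-54 exactly; rounding that product to double gives 1.0,
   * so the separately rounded expression collapses to 0.0. */
  double separate = a * b + c;

  /* fma() keeps the full product internally and rounds only once,
   * so the 2^-54 term survives. */
  double fused = fma(a, b, c);

  printf("a*b + c    = %a\n", separate);  /* 0x0p+0   */
  printf("fma(a,b,c) = %a\n", fused);     /* -0x1p-54 */
  return 0;
}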


If these don't match, then we should understand why. This used to be a large problem because of fp80-related issues on x86 processors, but even on x86 if we stick to SSE (etc.) FP instructions, this is not an issue any more. We still do see cross-system discrepancies sometimes because of differences in denormal handling, but on the same system that should be consistent (aside, perhaps, from compiler-level constant-folding issues).

-Hal


The output of O0, O1, O2, O3, Ofast, Os, Oz should all be within the
boundaries of an average and its associated error bar.

By understanding what the *expected* output is, and its associated error
range, we can accurately predict what the correct reference_output and
the tolerance for each individual test should be.

Your solution 2 "works" because you're doing the matching yourself, in
the code, and for that you pay the penalty of running it twice. But
it's not easy to control the tolerance, nor is it stable on platforms
where we don't yet run the test suite.
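
To make that concrete, here is a rough sketch (my assumptions, not the
actual polybench/symm patch) of the run-it-twice approach: the kernel is
compiled a second time with contraction forced off via the standard
FP_CONTRACT pragma, and the test compares the two results against a
threshold itself.

/* Sketch only -- not the actual polybench/symm patch. Run the kernel once
 * as compiled (possibly with FMAs contracted) and once with contraction
 * forced off, then compare inside the test. */
#include <math.h>
#include <stdio.h>

#define N 64
#define THRESHOLD 1e-6  /* assumed tolerance, for illustration only */

static void kernel(double *restrict out, const double *a, const double *b) {
  for (int i = 0; i < N; ++i)
    out[i] = a[i] * b[i] + out[i];      /* may be contracted into an FMA */
}

#pragma STDC FP_CONTRACT OFF
static void kernel_strict(double *restrict out, const double *a,
                          const double *b) {
  for (int i = 0; i < N; ++i)
    out[i] = a[i] * b[i] + out[i];      /* separate multiply and add */
}

int main(void) {
  double a[N], b[N], c[N], c_ref[N];
  for (int i = 0; i < N; ++i) {
    a[i] = 1.0 + i * 1e-7;
    b[i] = 2.0 - i * 1e-7;
    c[i] = c_ref[i] = 0.5 * i;
  }

  kernel(c, a, b);            /* run once as optimized */
  kernel_strict(c_ref, a, b); /* run again with contraction off */

  for (int i = 0; i < N; ++i)
    if (fabs(c[i] - c_ref[i]) > THRESHOLD) {
      printf("mismatch at %d: %f vs %f\n", i, c[i], c_ref[i]);
      return 1;
    }
  printf("ok\n");
  return 0;
}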

My original proposal, and what I'm still proposing here, is to
understand the tests and make them right by giving them proper
references and tolerances. If the output is too large, reduce/sample it
in a way that doesn't inflate the error ranges too much, so we can keep
the tolerance low and still catch bugs in the FP transformations.
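
As a sketch of what a reference-plus-tolerance check looks like
mechanically (file names and tolerances below are made up; in the test
suite this job falls to a comparison tool such as fpcmp rather than the
benchmark itself): accept each value if it is within an absolute or a
relative tolerance of the checked-in reference, instead of requiring
bit-identical output.

/* Sketch of a tolerance-based reference check (hypothetical file names
 * and tolerances). */
#include <math.h>
#include <stdio.h>

/* Accept a value if it is within an absolute OR a relative tolerance of
 * the reference, so values near zero and large values are both handled. */
static int close_enough(double ref, double val, double abs_tol,
                        double rel_tol) {
  double diff = fabs(ref - val);
  return diff <= abs_tol || diff <= rel_tol * fabs(ref);
}

int main(void) {
  FILE *ref = fopen("symm.reference_output", "r");   /* hypothetical names */
  FILE *out = fopen("symm.actual_output", "r");
  if (!ref || !out) return 2;

  double r, v;
  long idx = 0;
  while (fscanf(ref, "%lf", &r) == 1 && fscanf(out, "%lf", &v) == 1) {
    if (!close_enough(r, v, 1e-12, 1e-6)) {          /* assumed tolerances */
      fprintf(stderr, "value %ld differs: ref=%g got=%g\n", idx, r, v);
      return 1;
    }
    ++idx;
  }
  return 0;
}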

cheers,
--renato


--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory