[analyzer] Add option assume-at-least-one-iteration by NagyDonat · Pull Request #125494 · llvm/llvm-project (original) (raw)

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: Donát Nagy (NagyDonat)

Changes

This commit adds the new analyzer option assume-one-iteration, which is false by default, but can be set to true to ensure that the analyzer always assumes (at least) one iteration in loops.

In some situations this "loop is skipped" execution path is an important corner case that may evade the notice of the developer and hide significant bugs -- however, there are also many situations where it's guaranteed that at least one iteration will happen (e.g. some data structure is always nonempty), but the analyzer cannot realize this and will produce false positives when it assumes that the loop is skipped.

This commit refactors some logic around the implementation of the new feature, but those changes are clearly NFC when the new analyzer option is in its default state.

This commit is the new step of my loop handling improvement plans. When I wrote this, I hoped (based on eyeballing older analysis run results) that this will be an important "must have" improvement of the analyzer results; however it turns out that the effects are more ambiguous.

Based on this, I no longer think that this option should be enabled by default, but I still think that this option could be useful for some users to reduce the amount of unwanted results.

I analyzed the usual set of open source projects to see the effects of enabling this option, and I started to evaluate the results, but I didn't finish this evaluation because I felt that even this partial evaluation is enough to justify merging this as an off-by-default feature.

Project	Assuming one iteration	Baseline	Notes
memcached	0 new reports	1 resolved reports	assuming empty list -> calloc with zero size report
tmux	0 new reports	7 resolved reports	all 7 are core.NonNullParamChecker reports that assume emptiness of custom data structures, e.g. this
curl	0 new reports	0 resolved reports
twin	0 new reports	3 resolved reports	one corner case that seems to be very unlikely but might be a true positive, one false positive assumes an empty string, one unrelated false positive accidentally discarded
vim	5 new reports	18 resolved reports
openssl	0 new reports	5 resolved reports
sqlite	12 new reports	25 resolved reports
ffmpeg	24 new reports	175 resolved reports
postgres	2 new reports	49 resolved reports
tinyxml2	0 new reports	0 resolved reports
libwebm	3 new reports	0 resolved reports
xerces	1 new reports	3 resolved reports
bitcoin	0 new reports	9 resolved reports
protobuf	0 new reports	4 resolved reports
qtbase	16 new reports	49 resolved reports

Full diff: https://github.com/llvm/llvm-project/pull/125494.diff

4 Files Affected:

(modified) clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def (+10)
(modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+30-12)
(modified) clang/test/Analysis/analyzer-config.c (+1)
(modified) clang/test/Analysis/loop-assumptions.c (+65-23)

diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def index 34bb7a809162ba..f98500f5fcf1dc 100644 --- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def +++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def @@ -294,6 +294,16 @@ ANALYZER_OPTION( bool, ShouldUnrollLoops, "unroll-loops", "Whether the analysis should try to unroll loops with known bounds.", false)

+ANALYZER_OPTION(

bool, ShouldAssumeOneIteration, "assume-one-iteration",
"Whether the analyzer should always assume at least one iteration in "
"loops where the loop condition is opaque (i.e. the analyzer cannot "
"determine if it's true or false). Setting this to true eliminates some "
"false positives (where e.g. a structure is nonempty, but the analyzer "
"does not notice this); but it also eliminates some true positives (e.g. "
"cases where a structure can be empty and this causes buggy behavior).",
false)
ANALYZER_OPTION( bool, ShouldDisplayNotesAsEvents, "notes-as-events", "Whether the bug reporter should transparently treat extra note diagnostic "

diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp index 2b1872f8386aad..8c822a0ecc3b6f 100644 --- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp +++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp @@ -2810,13 +2810,17 @@ void ExprEngine::processBranch( if (StTrue && StFalse) assert(!isa(Condition));

bool BothFeasible =
```
   (StTrue && StFalse) ||
```

   didEagerlyAssumeBifurcateAt(PrevState, dyn_cast<Expr>(Condition));

```
if (StTrue) {
```

 // If we are processing a loop condition where two iterations have

 // already been completed and the false branch is also feasible, then

 // don't assume a third iteration because it is a redundant execution

 // path (unlikely to be different from earlier loop exits) and can cause

 // false positives if e.g. the loop iterates over a two-element structure

```
 // with an opaque condition.
```

 // In a loop, if both branches are feasible (i.e. the analyzer doesn't

 // understand the loop condition) and two iterations have already been

 // completed, then don't assume a third iteration because it is a

 // redundant execution path (unlikely to be different from earlier loop

 // exits) and can cause false positives if e.g. the loop iterates over a

 // two-element structure with an opaque condition.
 //
 // The iteration count "2" is hardcoded because it's the natural limit:
 // * the fact that the programmer wrote a loop (and not just an `if`)

@@ -2827,10 +2831,7 @@ void ExprEngine::processBranch( // two iterations". (This pattern is common in FFMPEG and appears in // many other projects as well.) bool CompletedTwoIterations = IterationsCompletedInLoop.value_or(0) >= 2;

```
 bool FalseAlsoFeasible =
```
```
     StFalse ||
```

     didEagerlyAssumeBifurcateAt(PrevState, dyn_cast<Expr>(Condition));

 bool SkipTrueBranch = CompletedTwoIterations && FalseAlsoFeasible;

 bool SkipTrueBranch = BothFeasible && CompletedTwoIterations;

 // FIXME: This "don't assume third iteration" heuristic partially
 // conflicts with the widen-loop analysis option (which is off by

@@ -2840,8 +2841,25 @@ void ExprEngine::processBranch( Builder.generateNode(StTrue, true, PredN); }

if (StFalse)

 Builder.generateNode(StFalse, false, PredN);

if (StFalse) {

 // In a loop, if both branches are feasible (i.e. the analyzer doesn't

 // understand the loop condition), we are before the first iteration and

 // the analyzer option `assume-one-iteration` is set to `true`, then avoid

 // creating the execution path where the analyzer skips the loop.

```
 //
```

 // In some situations this "loop is skipped" execution path is an

 // important corner case that may evade the notice of the developer and

 // hide significant bugs -- however, there are also many situations where

 // it's guaranteed that at least one iteration will happen (e.g. some

 // data structure is always nonempty), but the analyzer cannot realize

 // this and will produce false positives when it assumes that the loop is

```
 // skipped.
```

 bool BeforeFirstIteration = IterationsCompletedInLoop == std::optional{0};

 bool SkipFalseBranch = BothFeasible && BeforeFirstIteration &&

                        AMgr.options.ShouldAssumeOneIteration;

```
 if (!SkipFalseBranch)
```

   Builder.generateNode(StFalse, false, PredN);

}

} currBldrCtx = nullptr; } diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c index d5eb790b82f238..3611f6cb40c214 100644 --- a/clang/test/Analysis/analyzer-config.c +++ b/clang/test/Analysis/analyzer-config.c @@ -11,6 +11,7 @@ // CHECK-NEXT: alpha.osx.cocoa.DirectIvarAssignment:AnnotatedFunctions = false // CHECK-NEXT: apply-fixits = false // CHECK-NEXT: assume-controlled-environment = false +// CHECK-NEXT: assume-one-iteration = false // CHECK-NEXT: avoid-suppressing-null-argument-paths = false // CHECK-NEXT: c++-allocator-inlining = true // CHECK-NEXT: c++-container-inlining = false diff --git a/clang/test/Analysis/loop-assumptions.c b/clang/test/Analysis/loop-assumptions.c index eb0ffdce722e0c..bd8b74b1d531f9 100644 --- a/clang/test/Analysis/loop-assumptions.c +++ b/clang/test/Analysis/loop-assumptions.c @@ -1,25 +1,48 @@ // RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection
-// RUN: -verify=expected,eagerlyassume %s +// RUN: -verify=expected,noassumeone,eagerlyassume,combo %s // RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection
// RUN: -analyzer-config eagerly-assume=false
+// RUN: -verify=expected,noassumeone,noeagerlyassume,combo %s +// RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection
+// RUN: -analyzer-config assume-one-iteration=true
+// RUN: -verify=expected,eagerlyassume,combo %s +// RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection
+// RUN: -analyzer-config assume-one-iteration=true,eagerly-assume=false
// RUN: -verify=expected,noeagerlyassume %s

+// The verify tag "combo" is used for one unique warning which is produced in three +// of the four RUN combinations. + // These tests validate the logic within ExprEngine::processBranch which // ensures that in loops with opaque conditions we don't assume execution paths // if the code does not imply that they are possible. +// In particular, if two (or more) iterations are already completed in a loop, +// we don't assume that there can be another iteration. Moreover, if the +// analyzer option assume-one-iteration is enabled, then we don't assume that +// a loop can be skipped completely.

void clang_analyzer_numTimesReached(void); -void clang_analyzer_warnIfReached(void); void clang_analyzer_dump(int);

-void clearCondition(void) {

// If the analyzer can definitely determine the value of the loop condition, +void clearTrueCondition(void) {

// If the analyzer can definitely determine that the loop condition is true, // then this corrective logic doesn't activate and the engine executes // -analyzer-max-loop iterations (by default, 4).

for (int i = 0; i < 10; i++)

int i;
for (i = 0; i < 10; i++) clang_analyzer_numTimesReached(); // expected-warning {{4}}

clang_analyzer_warnIfReached(); // unreachable

clang_analyzer_dump(i); // Unreachable, no reports. +}
+void clearFalseCondition(void) {
// If the analyzer can definitely determine that the loop condition is false,
// then the loop is (obviously) skipped, even in assume-one-iteration mode.
int i;
for (i = 0; i > 10; i++)
clang_analyzer_numTimesReached(); // Unreachable, no report.
clang_analyzer_dump(i); // expected-warning {{0}} }

void opaqueCondition(int arg) { @@ -28,10 +51,13 @@ void opaqueCondition(int arg) { // that more than two iterations are possible. (It does imply that two // iterations may be possible at least in some cases, because otherwise an // if would've been enough.)

for (int i = 0; i < arg; i++)

// Moreover, if assume-one-iteration is enabled, then assume at least one
// iteration.
int i;
for (i = 0; i < arg; i++) clang_analyzer_numTimesReached(); // expected-warning {{2}}

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

clang_analyzer_dump(i); // noassumeone-warning {{0}} expected-warning {{1}} expected-warning {{2}} }

int check(void); @@ -42,22 +68,26 @@ void opaqueConditionCall(int arg) { // insert an assertion to guide the analyzer and rule out more than two // iterations (so the analyzer needs to proactively avoid those unjustified // branches).

while (check())

int i = 0; // Helper to distinguish the the branches after the loop.
while (check()) { clang_analyzer_numTimesReached(); // expected-warning {{2}}
i++;
}

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

clang_analyzer_dump(i); // noassumeone-warning {{0}} expected-warning {{1}} expected-warning {{2}} }

void opaqueConditionDoWhile(int arg) { // Same situation as opaqueCondition() but with a do {} while () loop. // This is tested separately because this loop type is a special case in the // iteration count calculation.

// Obviously, this loop guarantees that at least one iteration will happen. int i = 0; do { clang_analyzer_numTimesReached(); // expected-warning {{2}} } while (i++ < arg);

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

clang_analyzer_dump(i); // expected-warning {{1}} expected-warning {{2}} }

void dontRememberOldBifurcation(int arg) { @@ -69,7 +99,7 @@ void dontRememberOldBifurcation(int arg) { // by default), because the code remembered that there was a bifurcation on // the first iteration of the loop and didn't realize that this is obsolete.

// NOTE: The variable i is introduced to ensure that the iterations of the

// NOTE: The variable i is significant to ensure that the iterations of the // loop change the state -- otherwise the analyzer stops iterating because it // returns to the same ExplodedNode. int i = 0; @@ -78,10 +108,12 @@ void dontRememberOldBifurcation(int arg) { i++; }

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

clang_analyzer_dump(i); // noassumeone-warning {{0}} }

void dontAssumeFourthIterartion(int arg) {

int i;
if (arg == 2) return;

@@ -89,10 +121,10 @@ void dontAssumeFourthIterartion(int arg) { // iterations (because it knows that arg != 2 at that point), so it // performs a third iteration, but it does not assume that a fourth iteration // is also possible.

for (int i = 0; i < arg; i++)

for (i = 0; i < arg; i++) clang_analyzer_numTimesReached(); // expected-warning {{3}}

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

clang_analyzer_dump(i); // noassumeone-warning {{0}} expected-warning {{1}} expected-warning {{3}} }

#define TRUE 1 @@ -108,20 +140,24 @@ void shortCircuitInLoopCondition(int arg) { // false positive on the ffmpeg codebase. Eventually we should properly // recognize the full syntactical loop condition expression as "the loop // condition", but this will be complicated to implement.

for (int i = 0; i < arg && TRUE; i++) {

int i;
for (i = 0; i < arg && TRUE; i++) { clang_analyzer_numTimesReached(); // expected-warning {{4}} }

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

clang_analyzer_dump(i); // expected-warning {{0}} expected-warning {{1}} expected-warning {{2}} expected-warning {{3}} }

void shortCircuitInLoopConditionRHS(int arg) { // Unlike shortCircuitInLoopCondition(), this case is handled properly // because the analyzer thinks that the right hand side of the && is the // loop condition.

for (int i = 0; TRUE && i < arg; i++) {

int i;
for (i = 0; TRUE && i < arg; i++) { clang_analyzer_numTimesReached(); // expected-warning {{2}} }

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

clang_analyzer_dump(i); // noassumeone-warning {{0}} expected-warning {{1}} expected-warning {{2}} }

void eagerlyAssumeInSubexpression(int arg) { @@ -134,16 +170,22 @@ void eagerlyAssumeInSubexpression(int arg) { // sub-expression of the loop condition (as in this contrived test case). // FIXME: I don't know a real-world example for this inconsistency, but it // would be good to eliminate it eventually.

for (int i = 0; (i >= arg) - 1; i++) {

int i;
for (i = 0; (i >= arg) - 1; i++) { clang_analyzer_numTimesReached(); // eagerlyassume-warning {{4}} noeagerlyassume-warning {{2}} }

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

// The 'combo' warning intentionally appears when assume-one-iteration is
// disabled, but also appears as a bug (or at least inaccuracy) when
// assume-one-iteration is true but EagerlyAssume is also enabled.
clang_analyzer_dump(i); // combo-warning {{0}} expected-warning {{1}} expected-warning {{2}} eagerlyassume-warning {{3}} }

void calledTwice(int arg, int isFirstCall) { // This function is called twice (with two different unknown 'arg' values) to // check the iteration count handling in this situation.

for (int i = 0; i < arg; i++) {

int i;
for (i = 0; i < arg; i++) { if (isFirstCall) { clang_analyzer_numTimesReached(); // expected-warning {{2}} } else { @@ -215,5 +257,5 @@ void onlyLoopConditions(int arg) { break; }

clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}

clang_analyzer_dump(i); // expected-warning {{1}} expected-warning {{2}} expected-warning {{3}} expected-warning {{4}} }