[analyzer] Add option assume-at-least-one-iteration by NagyDonat · Pull Request #125494 · llvm/llvm-project (original) (raw)

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: Donát Nagy (NagyDonat)

Changes

This commit adds the new analyzer option assume-one-iteration, which is false by default, but can be set to true to ensure that the analyzer always assumes (at least) one iteration in loops.

In some situations this "loop is skipped" execution path is an important corner case that may evade the notice of the developer and hide significant bugs -- however, there are also many situations where it's guaranteed that at least one iteration will happen (e.g. some data structure is always nonempty), but the analyzer cannot realize this and will produce false positives when it assumes that the loop is skipped.

This commit refactors some logic around the implementation of the new feature, but those changes are clearly NFC when the new analyzer option is in its default state.


This commit is the new step of my loop handling improvement plans. When I wrote this, I hoped (based on eyeballing older analysis run results) that this will be an important "must have" improvement of the analyzer results; however it turns out that the effects are more ambiguous.

Based on this, I no longer think that this option should be enabled by default, but I still think that this option could be useful for some users to reduce the amount of unwanted results.

I analyzed the usual set of open source projects to see the effects of enabling this option, and I started to evaluate the results, but I didn't finish this evaluation because I felt that even this partial evaluation is enough to justify merging this as an off-by-default feature.

Project Assuming one iteration Baseline Notes
memcached 0 new reports 1 resolved reports assuming empty list -> calloc with zero size report
tmux 0 new reports 7 resolved reports all 7 are core.NonNullParamChecker reports that assume emptiness of custom data structures, e.g. this
curl 0 new reports 0 resolved reports
twin 0 new reports 3 resolved reports one corner case that seems to be very unlikely but might be a true positive, one false positive assumes an empty string, one unrelated false positive accidentally discarded
vim 5 new reports 18 resolved reports
openssl 0 new reports 5 resolved reports
sqlite 12 new reports 25 resolved reports
ffmpeg 24 new reports 175 resolved reports
postgres 2 new reports 49 resolved reports
tinyxml2 0 new reports 0 resolved reports
libwebm 3 new reports 0 resolved reports
xerces 1 new reports 3 resolved reports
bitcoin 0 new reports 9 resolved reports
protobuf 0 new reports 4 resolved reports
qtbase 16 new reports 49 resolved reports

Full diff: https://github.com/llvm/llvm-project/pull/125494.diff

4 Files Affected:

diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def index 34bb7a809162ba..f98500f5fcf1dc 100644 --- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def +++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def @@ -294,6 +294,16 @@ ANALYZER_OPTION( bool, ShouldUnrollLoops, "unroll-loops", "Whether the analysis should try to unroll loops with known bounds.", false)

+ANALYZER_OPTION(

diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp index 2b1872f8386aad..8c822a0ecc3b6f 100644 --- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp +++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp @@ -2810,13 +2810,17 @@ void ExprEngine::processBranch( if (StTrue && StFalse) assert(!isa(Condition));

@@ -2827,10 +2831,7 @@ void ExprEngine::processBranch( // two iterations". (This pattern is common in FFMPEG and appears in // many other projects as well.) bool CompletedTwoIterations = IterationsCompletedInLoop.value_or(0) >= 2;

@@ -2840,8 +2841,25 @@ void ExprEngine::processBranch( Builder.generateNode(StTrue, true, PredN); }

} currBldrCtx = nullptr; } diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c index d5eb790b82f238..3611f6cb40c214 100644 --- a/clang/test/Analysis/analyzer-config.c +++ b/clang/test/Analysis/analyzer-config.c @@ -11,6 +11,7 @@ // CHECK-NEXT: alpha.osx.cocoa.DirectIvarAssignment:AnnotatedFunctions = false // CHECK-NEXT: apply-fixits = false // CHECK-NEXT: assume-controlled-environment = false +// CHECK-NEXT: assume-one-iteration = false // CHECK-NEXT: avoid-suppressing-null-argument-paths = false // CHECK-NEXT: c++-allocator-inlining = true // CHECK-NEXT: c++-container-inlining = false diff --git a/clang/test/Analysis/loop-assumptions.c b/clang/test/Analysis/loop-assumptions.c index eb0ffdce722e0c..bd8b74b1d531f9 100644 --- a/clang/test/Analysis/loop-assumptions.c +++ b/clang/test/Analysis/loop-assumptions.c @@ -1,25 +1,48 @@ // RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection
-// RUN: -verify=expected,eagerlyassume %s +// RUN: -verify=expected,noassumeone,eagerlyassume,combo %s // RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection
// RUN: -analyzer-config eagerly-assume=false
+// RUN: -verify=expected,noassumeone,noeagerlyassume,combo %s +// RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection
+// RUN: -analyzer-config assume-one-iteration=true
+// RUN: -verify=expected,eagerlyassume,combo %s +// RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection
+// RUN: -analyzer-config assume-one-iteration=true,eagerly-assume=false
// RUN: -verify=expected,noeagerlyassume %s

+// The verify tag "combo" is used for one unique warning which is produced in three +// of the four RUN combinations. + // These tests validate the logic within ExprEngine::processBranch which // ensures that in loops with opaque conditions we don't assume execution paths // if the code does not imply that they are possible. +// In particular, if two (or more) iterations are already completed in a loop, +// we don't assume that there can be another iteration. Moreover, if the +// analyzer option assume-one-iteration is enabled, then we don't assume that +// a loop can be skipped completely.

void clang_analyzer_numTimesReached(void); -void clang_analyzer_warnIfReached(void); void clang_analyzer_dump(int);

-void clearCondition(void) {

void opaqueCondition(int arg) { @@ -28,10 +51,13 @@ void opaqueCondition(int arg) { // that more than two iterations are possible. (It does imply that two // iterations may be possible at least in some cases, because otherwise an // if would've been enough.)

int check(void); @@ -42,22 +68,26 @@ void opaqueConditionCall(int arg) { // insert an assertion to guide the analyzer and rule out more than two // iterations (so the analyzer needs to proactively avoid those unjustified // branches).

void opaqueConditionDoWhile(int arg) { // Same situation as opaqueCondition() but with a do {} while () loop. // This is tested separately because this loop type is a special case in the // iteration count calculation.

void dontRememberOldBifurcation(int arg) { @@ -69,7 +99,7 @@ void dontRememberOldBifurcation(int arg) { // by default), because the code remembered that there was a bifurcation on // the first iteration of the loop and didn't realize that this is obsolete.

void dontAssumeFourthIterartion(int arg) {

@@ -89,10 +121,10 @@ void dontAssumeFourthIterartion(int arg) { // iterations (because it knows that arg != 2 at that point), so it // performs a third iteration, but it does not assume that a fourth iteration // is also possible.

#define TRUE 1 @@ -108,20 +140,24 @@ void shortCircuitInLoopCondition(int arg) { // false positive on the ffmpeg codebase. Eventually we should properly // recognize the full syntactical loop condition expression as "the loop // condition", but this will be complicated to implement.

void shortCircuitInLoopConditionRHS(int arg) { // Unlike shortCircuitInLoopCondition(), this case is handled properly // because the analyzer thinks that the right hand side of the && is the // loop condition.

void eagerlyAssumeInSubexpression(int arg) { @@ -134,16 +170,22 @@ void eagerlyAssumeInSubexpression(int arg) { // sub-expression of the loop condition (as in this contrived test case). // FIXME: I don't know a real-world example for this inconsistency, but it // would be good to eliminate it eventually.

void calledTwice(int arg, int isFirstCall) { // This function is called twice (with two different unknown 'arg' values) to // check the iteration count handling in this situation.