[analyzer] Avoid unnecessary super region invalidation in CStringChecker by flovent · Pull Request #146212 · llvm/llvm-project (original) (raw)

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

@flovent

Bounded string functions takes smallest of two values as it's copy size (amountCopied variable in evalStrcpyCommon), and it's used to decided whether this operation will cause out-of-bound access and invalidate it's super region if it does.

for strlcat: amountCopied = min (size - dstLen - 1 , srcLen)
for others: amountCopied = min (srcLen, size)

Currently when one of two values is unknown or SValBuilder can't decide which one is smaller, amountCopied will remain UnknownVal, which will invalidate copy destination's super region unconditionally.

This patch add check to see if one of these two values is definitely in-bound, if so amountCopied has to be in-bound too, because it‘s less than or equal to them, we can avoid the invalidation of super region and some related false positives in this situation.

Note: This patch uses size as an approximation of size - dstLen - 1 in strlcat case because currently analyzer doesn't handle complex expressions like this very well.

Closes #143807.

@flovent

@llvmbot

@llvm/pr-subscribers-clang

Author: None (flovent)

Changes

Bounded string functions takes smallest of two values as it's copy size (amountCopied variable in evalStrcpyCommon), and it's used to decided whether this operation will cause out-of-bound access and invalidate it's super region if it does.

for strlcat: amountCopied = min (size - dstLen - 1 , srcLen)
for others: amountCopied = min (srcLen, size)

Currently when one of two values is unknown or SValBuilder can't decide which one is smaller, amountCopied will remain UnknownVal, which will invalidate copy destination's super region unconditionally.

This patch add check to see if one of these two values is definitely in-bound, if so amountCopied has to be in-bound too, because it‘s less than or equal to them, we can avoid the invalidation of super region and some related false positives in this situation.

Closes #143807.


Full diff: https://github.com/llvm/llvm-project/pull/146212.diff

2 Files Affected:

diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp index 4d12fdcec1f1a..433fd2ce5f292 100644 --- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp @@ -272,7 +272,8 @@ class CStringChecker : public Checker< eval::Call, static ProgramStateRef invalidateDestinationBufferBySize(CheckerContext &C, ProgramStateRef S, const Expr *BufE, ConstCFGElementRef Elem,

@@ -1211,14 +1212,17 @@ bool CStringChecker::isFirstBufInBound(CheckerContext &C, ProgramStateRef State,

ProgramStateRef CStringChecker::invalidateDestinationBufferBySize( CheckerContext &C, ProgramStateRef S, const Expr *BufE,

auto InvalidationTraitOperations =

@@ -2223,6 +2227,67 @@ void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallEvent &Call, Result = lastElement; }

@@ -2232,7 +2297,7 @@ void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallEvent &Call, // string, but that's still an improvement over blank invalidation. state = invalidateDestinationBufferBySize( C, state, Dst.Expression, Call.getCFGElementRef(), *dstRegVal,

diff --git a/clang/test/Analysis/cstring-should-not-invalidate.cpp b/clang/test/Analysis/cstring-should-not-invalidate.cpp new file mode 100644 index 0000000000000..14c92447c52ca --- /dev/null +++ b/clang/test/Analysis/cstring-should-not-invalidate.cpp @@ -0,0 +1,107 @@ +// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection +// -analyzer-config c++-inlining=constructors -verify %s + +// expected-no-diagnostics + +typedef unsigned int size_t; + +char *strncpy(char *dest, const char *src, size_t x); + +// issue 143807 +struct strncpyTestClass {

@llvmbot

@llvm/pr-subscribers-clang-static-analyzer-1

Author: None (flovent)

Changes

Bounded string functions takes smallest of two values as it's copy size (amountCopied variable in evalStrcpyCommon), and it's used to decided whether this operation will cause out-of-bound access and invalidate it's super region if it does.

for strlcat: amountCopied = min (size - dstLen - 1 , srcLen)
for others: amountCopied = min (srcLen, size)

Currently when one of two values is unknown or SValBuilder can't decide which one is smaller, amountCopied will remain UnknownVal, which will invalidate copy destination's super region unconditionally.

This patch add check to see if one of these two values is definitely in-bound, if so amountCopied has to be in-bound too, because it‘s less than or equal to them, we can avoid the invalidation of super region and some related false positives in this situation.

Closes #143807.


Full diff: https://github.com/llvm/llvm-project/pull/146212.diff

2 Files Affected:

diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp index 4d12fdcec1f1a..433fd2ce5f292 100644 --- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp @@ -272,7 +272,8 @@ class CStringChecker : public Checker< eval::Call, static ProgramStateRef invalidateDestinationBufferBySize(CheckerContext &C, ProgramStateRef S, const Expr *BufE, ConstCFGElementRef Elem,

@@ -1211,14 +1212,17 @@ bool CStringChecker::isFirstBufInBound(CheckerContext &C, ProgramStateRef State,

ProgramStateRef CStringChecker::invalidateDestinationBufferBySize( CheckerContext &C, ProgramStateRef S, const Expr *BufE,

auto InvalidationTraitOperations =

@@ -2223,6 +2227,67 @@ void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallEvent &Call, Result = lastElement; }

@@ -2232,7 +2297,7 @@ void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallEvent &Call, // string, but that's still an improvement over blank invalidation. state = invalidateDestinationBufferBySize( C, state, Dst.Expression, Call.getCFGElementRef(), *dstRegVal,

diff --git a/clang/test/Analysis/cstring-should-not-invalidate.cpp b/clang/test/Analysis/cstring-should-not-invalidate.cpp new file mode 100644 index 0000000000000..14c92447c52ca --- /dev/null +++ b/clang/test/Analysis/cstring-should-not-invalidate.cpp @@ -0,0 +1,107 @@ +// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection +// -analyzer-config c++-inlining=constructors -verify %s + +// expected-no-diagnostics + +typedef unsigned int size_t; + +char *strncpy(char *dest, const char *src, size_t x); + +// issue 143807 +struct strncpyTestClass {

steakhal

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed PR summary. It makes sense. I had to think about it carefully but I agree with the motivation.
When I looked at the code it looked really complicated. Probably more than I expected it to be.

I think adding another bool parameter to the already crowded function is not ideal. We should look for some other way, maybe a different overload, or restructuring the code in other ways.

@flovent

@flovent

Thanks for the detailed PR summary. It makes sense. I had to think about it carefully but I agree with the motivation. When I looked at the code it looked really complicated. Probably more than I expected it to be.

CStringChecker is a very complicated checker indeed, it may took me more time to decide how to change this file than finding the cause of this problem, finally i added this parameter with default value to reduce the modifications required because invalidateDestinationBufferBySize is used in model to other functions like memset, memcpy too.

I think adding another bool parameter to the already crowded function is not ideal. We should look for some other way, maybe a different overload, or restructuring the code in other ways.

I just find out there are three member functions whose name start with invalidateDestinationBuffer, seems invalidateDestinationBufferNeverOverflows should be the right call when CouldAccessOutOfBound == false.

@flovent

@flovent

CI failures seem unlreated to the new change in this PR.

@steakhal

I resign from review. I don't have time to review 100+ lines PRs right now.

NagyDonat

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like this change, but I added a few suggestions in inline comments.

Also please follow the Coding Standard and use UpperCamelCase variables for variables (including ones that store lambda functions), while lowerCamelCase is for methods. Of course, it is also right to follow the "local customs" of older code fragments, but I'd suggest gradually transitioning to the global standard by following the coding standard for the freshly added variables and perhaps transitioning the variables which are mostly used in the code that you edit.

@flovent

@flovent

@flovent

Also please follow the Coding Standard and use UpperCamelCase variables for variables (including ones that store lambda functions), while lowerCamelCase is for methods. Of course, it is also right to follow the "local customs" of older code fragments, but I'd suggest gradually transitioning to the global standard by following the coding standard for the freshly added variables and perhaps transitioning the variables which are mostly used in the code that you edit.

I totally agree using new standard in new code, i think i didn't notice that because i basiclly copy the orginal code to here because of the same logic and clangd doesn't give warning about clang-tidy's readability-identifier-naming check (just found out it's disabled in clang folder), will check that manually next time.

@flovent

NagyDonat

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!

I added a few minor comments, but I hope that after those the PR could be merged.

NagyDonat

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flovent

@flovent

Oh, sorry about that. It seems an error node will be generated when emitting alpha.unix.cstring.OutOfBounds warning, so rest of code will not be analyzed.

NagyDonat

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the updates!

Before merging the commit consider adding a note to the PR description / commit message that in the case of strlcat you use size as an approximation of the exact amountCopied.

@flovent

@NagyDonat, I add a note in the end of PR description and I don't have a merge access, if that looks good please help me merge it, thank you!

@flovent flovent deleted the csa-issue-143807 branch

July 7, 2025 12:24

Labels