[analyzer] Avoid unnecessary super region invalidation in CStringChecker by flovent · Pull Request #146212 · llvm/llvm-project (original) (raw)
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Bounded string functions takes smallest of two values as it's copy size (amountCopied variable in evalStrcpyCommon), and it's used to decided whether this operation will cause out-of-bound access and invalidate it's super region if it does.
for strlcat: amountCopied = min (size - dstLen - 1 , srcLen)
for others: amountCopied = min (srcLen, size)
Currently when one of two values is unknown or SValBuilder can't decide which one is smaller, amountCopied will remain UnknownVal, which will invalidate copy destination's super region unconditionally.
This patch add check to see if one of these two values is definitely in-bound, if so amountCopied has to be in-bound too, because it‘s less than or equal to them, we can avoid the invalidation of super region and some related false positives in this situation.
Note: This patch uses size as an approximation of size - dstLen - 1 in strlcat case because currently analyzer doesn't handle complex expressions like this very well.
Closes #143807.
@llvm/pr-subscribers-clang
Author: None (flovent)
Changes
Bounded string functions takes smallest of two values as it's copy size (amountCopied variable in evalStrcpyCommon), and it's used to decided whether this operation will cause out-of-bound access and invalidate it's super region if it does.
for strlcat: amountCopied = min (size - dstLen - 1 , srcLen)
for others: amountCopied = min (srcLen, size)
Currently when one of two values is unknown or SValBuilder can't decide which one is smaller, amountCopied will remain UnknownVal, which will invalidate copy destination's super region unconditionally.
This patch add check to see if one of these two values is definitely in-bound, if so amountCopied has to be in-bound too, because it‘s less than or equal to them, we can avoid the invalidation of super region and some related false positives in this situation.
Closes #143807.
Full diff: https://github.com/llvm/llvm-project/pull/146212.diff
2 Files Affected:
- (modified) clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp (+71-6)
- (added) clang/test/Analysis/cstring-should-not-invalidate.cpp (+107)
diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp index 4d12fdcec1f1a..433fd2ce5f292 100644 --- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp @@ -272,7 +272,8 @@ class CStringChecker : public Checker< eval::Call, static ProgramStateRef invalidateDestinationBufferBySize(CheckerContext &C, ProgramStateRef S, const Expr *BufE, ConstCFGElementRef Elem,
SVal BufV, SVal SizeV, QualType SizeTy);
SVal BufV, SVal SizeV, QualType SizeTy,
/// Operation never overflows, do not invalidate the super region. static ProgramStateRef invalidateDestinationBufferNeverOverflows(bool CouldAccessOutOfBound = true);
@@ -1211,14 +1212,17 @@ bool CStringChecker::isFirstBufInBound(CheckerContext &C, ProgramStateRef State,
ProgramStateRef CStringChecker::invalidateDestinationBufferBySize( CheckerContext &C, ProgramStateRef S, const Expr *BufE,
- ConstCFGElementRef Elem, SVal BufV, SVal SizeV, QualType SizeTy) {
- ConstCFGElementRef Elem, SVal BufV, SVal SizeV, QualType SizeTy,
- bool CouldAccessOutOfBound) {
auto InvalidationTraitOperations =
[&C, S, BufTy = BufE->getType(), BufV, SizeV,SizeTy](RegionAndSymbolInvalidationTraits &ITraits, const MemRegion *R) {
[&C, S, BufTy = BufE->getType(), BufV, SizeV, SizeTy,CouldAccessOutOfBound](RegionAndSymbolInvalidationTraits &ITraits,const MemRegion *R) { // If destination buffer is a field region and access is in bound, do // not invalidate its super region. if (MemRegion::FieldRegionKind == R->getKind() &&
isFirstBufInBound(C, S, BufV, BufTy, SizeV, SizeTy)) {
(!CouldAccessOutOfBound ||isFirstBufInBound(C, S, BufV, BufTy, SizeV, SizeTy))) { ITraits.setTrait( R, RegionAndSymbolInvalidationTraits::TK_DoNotInvalidateSuperRegion);
@@ -2223,6 +2227,67 @@ void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallEvent &Call, Result = lastElement; }
- // For bounded method, amountCopied take the minimum of two values,
- // for ConcatFnKind::strlcat:
- // amountCopied = min (size - dstLen - 1 , srcLen)
- // for others:
- // amountCopied = min (srcLen, size)
- // So even if we don't know about amountCopied, as long as one of them will
- // not cause an out-of-bound access, the whole function's operation will not
- // too, that will avoid invalidating the superRegion of data member in that
- // situation.
- bool CouldAccessOutOfBound = true;
- if (IsBounded && amountCopied.isUnknown()) {
// Get the max number of characters to copy.SizeArgExpr lenExpr = {{Call.getArgExpr(2), 2}};SVal lenVal = state->getSVal(lenExpr.Expression, LCtx);// Protect against misdeclared strncpy().lenVal =svalBuilder.evalCast(lenVal, sizeTy, lenExpr.Expression->getType());std::optional<NonLoc> lenValNL = lenVal.getAs<NonLoc>();auto CouldAccessOutOfBoundForSVal = [&](NonLoc Val) -> bool {return !isFirstBufInBound(C, state, C.getSVal(Dst.Expression),Dst.Expression->getType(), Val,C.getASTContext().getSizeType());};if (strLengthNL) {CouldAccessOutOfBound = CouldAccessOutOfBoundForSVal(*strLengthNL);}if (CouldAccessOutOfBound && lenValNL) {switch (appendK) {case ConcatFnKind::none:case ConcatFnKind::strcat: {CouldAccessOutOfBound = CouldAccessOutOfBoundForSVal(*lenValNL);break;}case ConcatFnKind::strlcat: {if (!dstStrLengthNL)break;SVal freeSpace = svalBuilder.evalBinOpNN(state, BO_Sub, *lenValNL,*dstStrLengthNL, sizeTy);if (!isa<NonLoc>(freeSpace))break;freeSpace =svalBuilder.evalBinOp(state, BO_Sub, freeSpace,svalBuilder.makeIntVal(1, sizeTy), sizeTy);std::optional<NonLoc> freeSpaceNL = freeSpace.getAs<NonLoc>();if (!freeSpaceNL)break;CouldAccessOutOfBound = CouldAccessOutOfBoundForSVal(*freeSpaceNL);break;}}}- }
// Invalidate the destination (regular invalidation without pointer-escaping // the address of the top-level region). This must happen before we set the // C string length because invalidation will clear the length.
@@ -2232,7 +2297,7 @@ void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallEvent &Call, // string, but that's still an improvement over blank invalidation. state = invalidateDestinationBufferBySize( C, state, Dst.Expression, Call.getCFGElementRef(), *dstRegVal,
amountCopied, C.getASTContext().getSizeType());
// Invalidate the source (const-invalidation without const-pointer-escaping // the address of the top-level region).amountCopied, C.getASTContext().getSizeType(), CouldAccessOutOfBound);
diff --git a/clang/test/Analysis/cstring-should-not-invalidate.cpp b/clang/test/Analysis/cstring-should-not-invalidate.cpp new file mode 100644 index 0000000000000..14c92447c52ca --- /dev/null +++ b/clang/test/Analysis/cstring-should-not-invalidate.cpp @@ -0,0 +1,107 @@ +// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection +// -analyzer-config c++-inlining=constructors -verify %s + +// expected-no-diagnostics + +typedef unsigned int size_t; + +char *strncpy(char *dest, const char *src, size_t x); + +// issue 143807 +struct strncpyTestClass {
- int *m_ptr;
- char m_buff[1000];
- void KnownLen(char *src) {
- m_ptr = new int;
- strncpy(m_buff, src, sizeof(m_buff)); // known len but unknown src size
- delete m_ptr; // no warning
- }
- void KnownSrcLen(size_t n) {
- m_ptr = new int;
- strncpy(m_buff, "xyz", n); // known src size but unknown len
- delete m_ptr; // no warning
- } +};
- +void strncpyTest(char *src, size_t n) {
- strncpyTestClass rep;
- rep.KnownLen(src);
- rep.KnownSrcLen(n); +}
- +size_t strlcpy(char *dest, const char *src, size_t size);
- +struct strlcpyTestClass {
- int *m_ptr;
- char m_buff[1000];
- void KnownLen(char *src) {
- m_ptr = new int;
- strlcpy(m_buff, src, sizeof(m_buff)); // known len but unknown src size
- delete m_ptr; // no warning
- }
- void KnownSrcLen(size_t n) {
- m_ptr = new int;
- strlcpy(m_buff, "xyz", n); // known src size but unknown len
- delete m_ptr; // no warning
- } +};
- +void strlcpyTest(char *src, size_t n) {
- strlcpyTestClass rep;
- rep.KnownLen(src);
- rep.KnownSrcLen(n); +}
- +char *strncat(char *s1, const char *s2, size_t n);
- +struct strncatTestClass {
- int *m_ptr;
- char m_buff[1000];
- void KnownLen(char *src) {
- m_ptr = new int;
- strncat(m_buff, src, sizeof(m_buff)); // known len but unknown src size
- delete m_ptr; // no warning
- }
- void KnownSrcLen(size_t n) {
- m_ptr = new int;
- strncat(m_buff, "xyz", n); // known src size but unknown len
- delete m_ptr; // no warning
- } +};
- +void strncatTest(char *src, size_t n) {
- strncatTestClass rep;
- rep.KnownLen(src);
- rep.KnownSrcLen(n); +}
- +size_t strlcat(char *dst, const char *src, size_t size);
- +struct strlcatTestClass {
- int *m_ptr;
- char m_buff[1000];
- void KnownLen(char *src) {
- m_ptr = new int;
- strlcat(m_buff, src, sizeof(m_buff)); // known len but unknown src size
- delete m_ptr; // no warning
- }
- void KnownSrcLen(size_t n) {
- m_ptr = new int;
- strlcat(m_buff, "xyz", n); // known src size but unknown len
- delete m_ptr; // no warning
- } +};
- +void strlcatTest(char *src, size_t n) {
- strlcatTestClass rep;
- rep.KnownLen(src);
- rep.KnownSrcLen(n); +}
@llvm/pr-subscribers-clang-static-analyzer-1
Author: None (flovent)
Changes
Bounded string functions takes smallest of two values as it's copy size (amountCopied variable in evalStrcpyCommon), and it's used to decided whether this operation will cause out-of-bound access and invalidate it's super region if it does.
for strlcat: amountCopied = min (size - dstLen - 1 , srcLen)
for others: amountCopied = min (srcLen, size)
Currently when one of two values is unknown or SValBuilder can't decide which one is smaller, amountCopied will remain UnknownVal, which will invalidate copy destination's super region unconditionally.
This patch add check to see if one of these two values is definitely in-bound, if so amountCopied has to be in-bound too, because it‘s less than or equal to them, we can avoid the invalidation of super region and some related false positives in this situation.
Closes #143807.
Full diff: https://github.com/llvm/llvm-project/pull/146212.diff
2 Files Affected:
- (modified) clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp (+71-6)
- (added) clang/test/Analysis/cstring-should-not-invalidate.cpp (+107)
diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp index 4d12fdcec1f1a..433fd2ce5f292 100644 --- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp +++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp @@ -272,7 +272,8 @@ class CStringChecker : public Checker< eval::Call, static ProgramStateRef invalidateDestinationBufferBySize(CheckerContext &C, ProgramStateRef S, const Expr *BufE, ConstCFGElementRef Elem,
SVal BufV, SVal SizeV, QualType SizeTy);
SVal BufV, SVal SizeV, QualType SizeTy,
/// Operation never overflows, do not invalidate the super region. static ProgramStateRef invalidateDestinationBufferNeverOverflows(bool CouldAccessOutOfBound = true);
@@ -1211,14 +1212,17 @@ bool CStringChecker::isFirstBufInBound(CheckerContext &C, ProgramStateRef State,
ProgramStateRef CStringChecker::invalidateDestinationBufferBySize( CheckerContext &C, ProgramStateRef S, const Expr *BufE,
- ConstCFGElementRef Elem, SVal BufV, SVal SizeV, QualType SizeTy) {
- ConstCFGElementRef Elem, SVal BufV, SVal SizeV, QualType SizeTy,
- bool CouldAccessOutOfBound) {
auto InvalidationTraitOperations =
[&C, S, BufTy = BufE->getType(), BufV, SizeV,SizeTy](RegionAndSymbolInvalidationTraits &ITraits, const MemRegion *R) {
[&C, S, BufTy = BufE->getType(), BufV, SizeV, SizeTy,CouldAccessOutOfBound](RegionAndSymbolInvalidationTraits &ITraits,const MemRegion *R) { // If destination buffer is a field region and access is in bound, do // not invalidate its super region. if (MemRegion::FieldRegionKind == R->getKind() &&
isFirstBufInBound(C, S, BufV, BufTy, SizeV, SizeTy)) {
(!CouldAccessOutOfBound ||isFirstBufInBound(C, S, BufV, BufTy, SizeV, SizeTy))) { ITraits.setTrait( R, RegionAndSymbolInvalidationTraits::TK_DoNotInvalidateSuperRegion);
@@ -2223,6 +2227,67 @@ void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallEvent &Call, Result = lastElement; }
- // For bounded method, amountCopied take the minimum of two values,
- // for ConcatFnKind::strlcat:
- // amountCopied = min (size - dstLen - 1 , srcLen)
- // for others:
- // amountCopied = min (srcLen, size)
- // So even if we don't know about amountCopied, as long as one of them will
- // not cause an out-of-bound access, the whole function's operation will not
- // too, that will avoid invalidating the superRegion of data member in that
- // situation.
- bool CouldAccessOutOfBound = true;
- if (IsBounded && amountCopied.isUnknown()) {
// Get the max number of characters to copy.SizeArgExpr lenExpr = {{Call.getArgExpr(2), 2}};SVal lenVal = state->getSVal(lenExpr.Expression, LCtx);// Protect against misdeclared strncpy().lenVal =svalBuilder.evalCast(lenVal, sizeTy, lenExpr.Expression->getType());std::optional<NonLoc> lenValNL = lenVal.getAs<NonLoc>();auto CouldAccessOutOfBoundForSVal = [&](NonLoc Val) -> bool {return !isFirstBufInBound(C, state, C.getSVal(Dst.Expression),Dst.Expression->getType(), Val,C.getASTContext().getSizeType());};if (strLengthNL) {CouldAccessOutOfBound = CouldAccessOutOfBoundForSVal(*strLengthNL);}if (CouldAccessOutOfBound && lenValNL) {switch (appendK) {case ConcatFnKind::none:case ConcatFnKind::strcat: {CouldAccessOutOfBound = CouldAccessOutOfBoundForSVal(*lenValNL);break;}case ConcatFnKind::strlcat: {if (!dstStrLengthNL)break;SVal freeSpace = svalBuilder.evalBinOpNN(state, BO_Sub, *lenValNL,*dstStrLengthNL, sizeTy);if (!isa<NonLoc>(freeSpace))break;freeSpace =svalBuilder.evalBinOp(state, BO_Sub, freeSpace,svalBuilder.makeIntVal(1, sizeTy), sizeTy);std::optional<NonLoc> freeSpaceNL = freeSpace.getAs<NonLoc>();if (!freeSpaceNL)break;CouldAccessOutOfBound = CouldAccessOutOfBoundForSVal(*freeSpaceNL);break;}}}- }
// Invalidate the destination (regular invalidation without pointer-escaping // the address of the top-level region). This must happen before we set the // C string length because invalidation will clear the length.
@@ -2232,7 +2297,7 @@ void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallEvent &Call, // string, but that's still an improvement over blank invalidation. state = invalidateDestinationBufferBySize( C, state, Dst.Expression, Call.getCFGElementRef(), *dstRegVal,
amountCopied, C.getASTContext().getSizeType());
// Invalidate the source (const-invalidation without const-pointer-escaping // the address of the top-level region).amountCopied, C.getASTContext().getSizeType(), CouldAccessOutOfBound);
diff --git a/clang/test/Analysis/cstring-should-not-invalidate.cpp b/clang/test/Analysis/cstring-should-not-invalidate.cpp new file mode 100644 index 0000000000000..14c92447c52ca --- /dev/null +++ b/clang/test/Analysis/cstring-should-not-invalidate.cpp @@ -0,0 +1,107 @@ +// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection +// -analyzer-config c++-inlining=constructors -verify %s + +// expected-no-diagnostics + +typedef unsigned int size_t; + +char *strncpy(char *dest, const char *src, size_t x); + +// issue 143807 +struct strncpyTestClass {
- int *m_ptr;
- char m_buff[1000];
- void KnownLen(char *src) {
- m_ptr = new int;
- strncpy(m_buff, src, sizeof(m_buff)); // known len but unknown src size
- delete m_ptr; // no warning
- }
- void KnownSrcLen(size_t n) {
- m_ptr = new int;
- strncpy(m_buff, "xyz", n); // known src size but unknown len
- delete m_ptr; // no warning
- } +};
- +void strncpyTest(char *src, size_t n) {
- strncpyTestClass rep;
- rep.KnownLen(src);
- rep.KnownSrcLen(n); +}
- +size_t strlcpy(char *dest, const char *src, size_t size);
- +struct strlcpyTestClass {
- int *m_ptr;
- char m_buff[1000];
- void KnownLen(char *src) {
- m_ptr = new int;
- strlcpy(m_buff, src, sizeof(m_buff)); // known len but unknown src size
- delete m_ptr; // no warning
- }
- void KnownSrcLen(size_t n) {
- m_ptr = new int;
- strlcpy(m_buff, "xyz", n); // known src size but unknown len
- delete m_ptr; // no warning
- } +};
- +void strlcpyTest(char *src, size_t n) {
- strlcpyTestClass rep;
- rep.KnownLen(src);
- rep.KnownSrcLen(n); +}
- +char *strncat(char *s1, const char *s2, size_t n);
- +struct strncatTestClass {
- int *m_ptr;
- char m_buff[1000];
- void KnownLen(char *src) {
- m_ptr = new int;
- strncat(m_buff, src, sizeof(m_buff)); // known len but unknown src size
- delete m_ptr; // no warning
- }
- void KnownSrcLen(size_t n) {
- m_ptr = new int;
- strncat(m_buff, "xyz", n); // known src size but unknown len
- delete m_ptr; // no warning
- } +};
- +void strncatTest(char *src, size_t n) {
- strncatTestClass rep;
- rep.KnownLen(src);
- rep.KnownSrcLen(n); +}
- +size_t strlcat(char *dst, const char *src, size_t size);
- +struct strlcatTestClass {
- int *m_ptr;
- char m_buff[1000];
- void KnownLen(char *src) {
- m_ptr = new int;
- strlcat(m_buff, src, sizeof(m_buff)); // known len but unknown src size
- delete m_ptr; // no warning
- }
- void KnownSrcLen(size_t n) {
- m_ptr = new int;
- strlcat(m_buff, "xyz", n); // known src size but unknown len
- delete m_ptr; // no warning
- } +};
- +void strlcatTest(char *src, size_t n) {
- strlcatTestClass rep;
- rep.KnownLen(src);
- rep.KnownSrcLen(n); +}
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed PR summary. It makes sense. I had to think about it carefully but I agree with the motivation.
When I looked at the code it looked really complicated. Probably more than I expected it to be.
I think adding another bool parameter to the already crowded function is not ideal. We should look for some other way, maybe a different overload, or restructuring the code in other ways.
Thanks for the detailed PR summary. It makes sense. I had to think about it carefully but I agree with the motivation. When I looked at the code it looked really complicated. Probably more than I expected it to be.
CStringChecker is a very complicated checker indeed, it may took me more time to decide how to change this file than finding the cause of this problem, finally i added this parameter with default value to reduce the modifications required because invalidateDestinationBufferBySize is used in model to other functions like memset, memcpy too.
I think adding another bool parameter to the already crowded function is not ideal. We should look for some other way, maybe a different overload, or restructuring the code in other ways.
I just find out there are three member functions whose name start with invalidateDestinationBuffer, seems invalidateDestinationBufferNeverOverflows should be the right call when CouldAccessOutOfBound == false.
CI failures seem unlreated to the new change in this PR.
I resign from review. I don't have time to review 100+ lines PRs right now.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like this change, but I added a few suggestions in inline comments.
Also please follow the Coding Standard and use UpperCamelCase variables for variables (including ones that store lambda functions), while lowerCamelCase is for methods. Of course, it is also right to follow the "local customs" of older code fragments, but I'd suggest gradually transitioning to the global standard by following the coding standard for the freshly added variables and perhaps transitioning the variables which are mostly used in the code that you edit.
Also please follow the Coding Standard and use
UpperCamelCasevariables for variables (including ones that store lambda functions), whilelowerCamelCaseis for methods. Of course, it is also right to follow the "local customs" of older code fragments, but I'd suggest gradually transitioning to the global standard by following the coding standard for the freshly added variables and perhaps transitioning the variables which are mostly used in the code that you edit.
I totally agree using new standard in new code, i think i didn't notice that because i basiclly copy the orginal code to here because of the same logic and clangd doesn't give warning about clang-tidy's readability-identifier-naming check (just found out it's disabled in clang folder), will check that manually next time.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
I added a few minor comments, but I hope that after those the PR could be merged.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry about that. It seems an error node will be generated when emitting alpha.unix.cstring.OutOfBounds warning, so rest of code will not be analyzed.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the updates!
Before merging the commit consider adding a note to the PR description / commit message that in the case of strlcat you use size as an approximation of the exact amountCopied.
@NagyDonat, I add a note in the end of PR description and I don't have a merge access, if that looks good please help me merge it, thank you!
flovent deleted the csa-issue-143807 branch