How do I evaluate macros in Sema?

December 1, 2025, 5:22pm 1

How do I evaluate a macro in Sema?

I’m trying to add a diagnostic for checking the arguments of standard libc functions and having the value of O_CREAT and PATH_MAX would be helpful.

As the function call’s arguments are already available as Exprs, they’re nice and easy to evaluate:

Expr::EvalResult Result;
TheCall->getArg(x)->EvaluateAsInt(Result, Context);
Result.Val.getInt().getExtValue();

I can get the tokens for the value of a given macro, and if the macro has been evaluated before the current line I can traverse the AST context for the Expr and evaluate that.
That code looks approximately like:

IdentifierInfo *MII = PP.getIdentifierInfo(MacroName);
MacroDefinition MacroDef = PP.getMacroDefinitionAtLoc(MII, Loc);
const MacroInfo *MI = MacroDef.getMacroInfo();
// ... Climb the chain of IdentifierInfos at Loc to find the real MacroInfo
SourceLocation MacroValueLoc = MI->tokens_begin()->getLocation();

Then I use a DynamicRecursiveASTVisitor to match the SourceLocation of the first token against the spelling SourceLocation of every Expr in the AST.

This generally works for a macro that has been used somewhere in the source, such as O_CREAT. However, if the user has used PATH_MAX, there is likely no need to emit a diagnostic.

There is the private Preprocessor::EvaluateDirectiveExpression method; however, that seems to be there to avoid lexing #if blocks. There is also tryExpandAsInteger in the static analyzer, which has its own code for evaluating specifically integer literals.

The preprocessor is executed as tokens are lexed by the parser, so by the time you get to Sema, the macro has already been expanded.

The way I would try to tackle what you’re doing is:

1. Because these are standard C library functions, we often map those to builtins (see Builtins.td and friends) automagically, and so you could handle the checking starting from Sema::CheckBuiltinFunctionCall(). (If the functions aren’t builtins, you can do the same from Sema::BuildResolvedCallExpr() too.)
2. Once you’ve identified the function you’re after, check whether the argument to the function is an IntegerLiteral argument because each of these macros should resolve to an integer literal (if not, your checking logic needs to be expanded to whatever AST pattern you expect the expansion to be).
3. Once you’ve identified the argument you’re after, look at its SourceLocation to see if it’s an expansion. If it is, you can pass the spelling location (not the expansion location) to Lexer::getImmediateMacroName() to get the macro definition and see what its name is so you can match against O_CREAT or PATH_MAX.

An example of something like this in action is: llvm-project/clang/lib/Sema/SemaChecking.cpp at main · llvm/llvm-project · GitHub or llvm-project/clang/lib/Sema/SemaChecking.cpp at 25ab47bd407d6d92e587e2d545081ab25c909d86 · llvm/llvm-project · GitHub

Thanks for the quick response.

I’m currently just adding the functions to Builtins.td and relying on that. If that’s unsuitable for a given function it can be changed once I get the diagnostics working.

The check I’m currently working on is for open, specifically whether the mode parameter should be present or omitted based on the value of flags (O_CREAT needs a mode for the file it’s creating).
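A minimal sketch of that rule for open, written as standalone C++ rather than against the Clang API. It assumes the flags argument has already been folded to an integer (for example via EvaluateAsInt as shown earlier) and that the caller supplies the target's value of O_CREAT. The names ModeCheck and checkOpenMode are hypothetical, and other flags that also require a mode (such as O_TMPFILE on Linux) are deliberately not handled.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical outcome of checking one call to open().
enum class ModeCheck { Ok, MissingMode, UnexpectedMode, Unknown };

// FlagsValue: the flags argument if it folded to a constant, otherwise empty.
// CreatMask:  the target's value of O_CREAT (assumption: supplied by the caller).
// NumArgs:    number of arguments at the call site (2 without mode, 3 with).
ModeCheck checkOpenMode(std::optional<int64_t> FlagsValue, int64_t CreatMask,
                        unsigned NumArgs) {
  if (!FlagsValue)
    return ModeCheck::Unknown; // not a constant; needs flow analysis to know

  const bool NeedsMode = (*FlagsValue & CreatMask) != 0;
  const bool HasMode = NumArgs >= 3;

  if (NeedsMode && !HasMode)
    return ModeCheck::MissingMode;
  if (!NeedsMode && HasMode)
    return ModeCheck::UnexpectedMode;
  return ModeCheck::Ok;
}
```

With an illustrative O_CREAT value of 0x40, checkOpenMode(0x41, 0x40, 2) reports MissingMode, while an argument that is not a constant reports Unknown and would produce no diagnostic.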

Similar to your suggestion, I’ve tried traversing the children of the flags argument and attempting to match the spelling SourceLocation of each subexpression to that of the macro value. This would work fine so long as the source specifically uses the macro in the argument and I abandon the search when a given token isn’t an r_paren, l_paren or pipe.

This would have the following results:

#define O_CREAT 10
open("hello", 10); // Issue but with no diagnosis
open("hello", O_CREAT | O_WRONLY); // Issue with diagnosis
open("hello", (O_CREAT)); // Issue with diagnosis
open("hello", O_CREAT & O_WRONLY); // No issue with no diagnosis
const int flags = O_CREAT;
open("hello", flags); // Issue but with no diagnosis

This is probably an okay option if evaluation isn’t practical.
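The token rule described above can be modelled in a few lines of standalone C++. This is only a sketch under simplifying assumptions: Kind and Tok are hypothetical stand-ins for clang::Token and its kinds (tok::l_paren, tok::r_paren, tok::pipe and so on), the argument is given as it was spelled before macro expansion, and namesMacroDirectly is an illustrative name.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified token model (not the Clang types).
enum class Kind { LParen, RParen, Pipe, Amp, Identifier, Number };

struct Tok {
  Kind K;
  std::string Text; // spelling as written in the source, before expansion
};

// Returns true when MacroName is written directly in an argument made up only
// of parentheses, '|', identifiers and numbers. Any other token (for example
// '&') abandons the search, because the macro's bits may not reach the call.
bool namesMacroDirectly(const std::vector<Tok> &Arg,
                        const std::string &MacroName) {
  bool Found = false;
  for (const Tok &T : Arg) {
    switch (T.K) {
    case Kind::LParen:
    case Kind::RParen:
    case Kind::Pipe:
    case Kind::Number:
      break; // harmless punctuation or operand; keep scanning
    case Kind::Identifier:
      if (T.Text == MacroName)
        Found = true;
      break;
    default:
      return false; // unsupported operator; give up on this argument
    }
  }
  return Found;
}
```

Fed the five calls listed above, this accepts `O_CREAT | O_WRONLY` and `(O_CREAT)` and rejects `10`, `O_CREAT & O_WRONLY` and `flags`, which matches the with/without diagnosis column.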

However I’m also looking to do bounds checking on destination buffers where the standard assumes that it can safely write up to PATH_MAX.

In this case symptomatic source is likely to look like:

char dst[1024];
getcwd(dst);

I’d like to be able to just check the buffer size against the value of PATH_MAX.

I guess it’s likely that PATH_MAX would be undefined in these cases so maybe I’d be better off looking up the value based on the target OS.

Yeah, diagnosing this is a slippery slope into a static analysis check because it requires data flow analysis to know what is in the variable (generally speaking). Diagnostics from the frontend typically are not flow-sensitive due to compile time overhead for the check.

You could call Preprocessor::getMacroDefinition() on PATH_MAX to see if the macro is defined, and if so, call getMacroInfo() to see if it’s enabled and what tokens it expands to. However, if PATH_MAX is defined as something other than an integer literal, it becomes more complicated because you have to evaluate those tokens as a constant expression. It may be easier just to have a local mapping of PATH_MAX values for the target.
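A rough sketch of that fallback strategy, as standalone C++ rather than real Sema code. It assumes the macro body is available as a list of token spellings (empty when PATH_MAX is undefined) and only trusts a body that is a single plain decimal literal; anything else falls back to a per-target table. TargetOS is a hypothetical stand-in for llvm::Triple::OSType, the function names are illustrative, and the table values are the ones commonly found in each system's headers, so treat them as assumptions to verify per target.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the target OS taken from the triple.
enum class TargetOS { Linux, Darwin, FreeBSD, Unknown };

// Local mapping of PATH_MAX per target (illustrative values, see lead-in).
std::optional<uint64_t> fallbackPathMax(TargetOS OS) {
  switch (OS) {
  case TargetOS::Linux:   return 4096;
  case TargetOS::Darwin:  return 1024;
  case TargetOS::FreeBSD: return 1024;
  case TargetOS::Unknown: return std::nullopt;
  }
  return std::nullopt;
}

// MacroBody: spellings of the tokens PATH_MAX expands to, empty if undefined.
std::optional<uint64_t> resolvePathMax(const std::vector<std::string> &MacroBody,
                                       TargetOS OS) {
  if (MacroBody.size() == 1) {
    const std::string &S = MacroBody.front();
    const bool AllDigits =
        !S.empty() && S.find_first_not_of("0123456789") == std::string::npos;
    // Reject octal-looking spellings and lengths that could overflow.
    const bool PlainDecimal =
        AllDigits && S.size() <= 18 && (S.size() == 1 || S[0] != '0');
    if (PlainDecimal)
      return std::stoull(S);
  }
  // Undefined, or not a single decimal literal: use the local table instead
  // of evaluating the tokens as a constant expression.
  return fallbackPathMax(OS);
}

// True when a destination buffer is known to be smaller than PATH_MAX.
bool bufferTooSmall(uint64_t BufferSize, std::optional<uint64_t> PathMax) {
  return PathMax && BufferSize < *PathMax;
}
```

For the `char dst[1024]` example, a Linux target with PATH_MAX undefined would be flagged, while a target with no table entry yields no value and therefore no diagnostic.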