RFC: Integrate Clang-Tidy checkers into Clang Static Analyzer

Malavika Samak, Static Security Tools, Apple

Summary

This RFC proposes an integration that allows Clang Static Analyzer (CSA) users to run Clang-Tidy checkers and CSA checkers in a single analysis pass. Users can enable, disable, and configure Clang-Tidy checkers directly through CSA’s command-line interface. No Clang-Tidy checkers are enabled by default: users explicitly choose which checkers to run, in any combination, or disable them entirely. We have already built and deployed the proposed integration internally with positive results, and this proposal seeks to upstream that work to benefit the broader Clang community.

Motivation

Clang Static Analyzer (CSA) and Clang-Tidy both aim to improve code quality by detecting defects and suspicious code patterns, but they employ different analysis techniques: CSA uses data-flow and path-sensitive analysis, while Clang-Tidy uses faster AST-based pattern matchers. As a result, each tool excels at identifying distinct classes of defects, with some overlap where both can detect similar issues. To get the benefits of both, developers currently must run the tools separately and then combine the individual results.

This workflow increases build times because the AST is built twice, once for each analysis, and it requires managing a separate configuration for each tool. Users must also merge the results to get a unified view of code quality issues. Finally, it creates placement confusion for tool developers, who must decide which tool is the most appropriate home for each new checker.

To address this, we propose an integration between the two tools that executes Clang-Tidy checkers as part of the CSA run. This is expected to cut down the compilation overhead. The integration will enable unified reporting across all checkers, delivering both sets of analysis results in the existing CSA output formats: Plist, SARIF, and HTML. It should also simplify the user workflow by consolidating everything into one command and one configuration approach. We also expect it to simplify and speed up CI/CD pipelines.

Design Goals

The proposed integration adheres to the following design goals:

Proposed Solution

Non-Goals

While this integration aims to provide a unified static analysis experience, the following are out of scope:

Overall Architecture

Overview: The Clang-Tidy integration is built on a multiplexed consumer architecture that enables both CSA and Clang-Tidy to analyze the same Abstract Syntax Tree (AST) in a single compiler invocation. This design minimizes overhead by eliminating the duplicate parsing, semantic analysis, and AST construction that occur when the tools run separately. The figure below provides a high-level overview of the proposed integration.

[Figure: Integration-architecture.jpg]

Overall Data Flow and Control Flow

The integration extends the existing CSA pipeline with new components while preserving the original architecture.

Command-Line Processing and Configuration Loading
The flow begins with command-line parsing in the compiler invocation layer, which already handles CSA flags for checker enablement. The integration will extend this layer to also parse the newly introduced frontend flags. It applies the same validation patterns as the existing CSA flag handling, checking for empty strings and basic syntax errors, and then stores the raw argument values in two new vectors added to the existing analyzer options class. The parsed options flow into the existing frontend pipeline unchanged: the standard preprocessor, parser, and semantic analysis stages execute exactly as before, producing the same AST representation.
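
The validation-then-store step described above can be sketched as follows. This is a minimal, standalone illustration rather than the proposed implementation: `TidyOptionsSketch`, its member names, the helper functions, and the "no whitespace" syntax rule are all hypothetical, since the RFC names neither the new vectors nor the exact checks performed.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for the two vectors the RFC adds to the analyzer
// options class; the real member names are not given in the RFC.
struct TidyOptionsSketch {
  std::vector<std::string> TidyCheckers; // raw checker-selection values
  std::vector<std::string> TidyConfigs;  // raw -analyzer-tidy-config values
};

// Assumed "basic syntax" rule for checker names: no whitespace allowed.
inline bool hasWhitespace(const std::string &S) {
  return std::any_of(S.begin(), S.end(),
                     [](unsigned char C) { return std::isspace(C) != 0; });
}

// Stores a raw checker value; returns false so the caller can report an error.
inline bool addTidyChecker(TidyOptionsSketch &Opts, const std::string &Arg) {
  if (Arg.empty() || hasWhitespace(Arg))
    return false;
  Opts.TidyCheckers.push_back(Arg);
  return true;
}

// Stores a raw config value. Only emptiness is checked here, because format
// detection happens later, at consumer creation.
inline bool addTidyConfig(TidyOptionsSketch &Opts, const std::string &Arg) {
  if (Arg.empty())
    return false;
  Opts.TidyConfigs.push_back(Arg);
  return true;
}

// Joins the stored checker values into one comma-separated string.
inline std::string joinCheckers(const TidyOptionsSketch &Opts) {
  std::string Joined;
  for (const std::string &C : Opts.TidyCheckers) {
    assert(!C.empty() && "entries were validated on insertion");
    if (!Joined.empty())
      Joined += ',';
    Joined += C;
  }
  return Joined;
}
```

Keeping the stored values raw, as the RFC describes, defers any interpretation of the configuration format to the stage that actually builds the Clang-Tidy configuration.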

Consumer Creation and Multiplexing
The integration adds a new multiplexer that introduces a divergence at consumer creation. The AnalysisConsumer, which previously created only a consumer for CSA, will now conditionally create a second consumer if at least one Clang-Tidy checker is enabled. While creating the Clang-Tidy consumer, a newly added helper function builds the Clang-Tidy configuration. The configuration processor detects the input format by string inspection (new logic), wraps simple key=value inputs provided via the -analyzer-tidy-config flag into a YAML CheckOptions structure (new transformation), parses the result with Clang-Tidy’s existing configuration parser, and merges configurations using Clang-Tidy’s existing merge method. The resulting configuration initializes a Clang-Tidy context object (existing Clang-Tidy class used in a new context), which uses existing Clang-Tidy infrastructure to register check factories, instantiate checks, and populate a matcher framework.
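
A minimal sketch of the detect-and-wrap step may help make this concrete. The detection heuristic below (a leading `{` or the substring `CheckOptions` means YAML, otherwise a non-empty key followed by `=` means key=value) is an assumption, as the RFC only says that the format is detected by string inspection. The function names are hypothetical, and the sketch does not escape quotes inside keys or values. The emitted text uses the `CheckOptions` key/value list form that Clang-Tidy accepts in its configuration.

```cpp
#include <cassert>
#include <string>

enum class ConfigFormat { Yaml, KeyValue, Invalid };

// Strips leading and trailing whitespace.
inline std::string trimSketch(const std::string &S) {
  const char *Ws = " \t\r\n";
  const std::string::size_type Begin = S.find_first_not_of(Ws);
  if (Begin == std::string::npos)
    return std::string();
  const std::string::size_type End = S.find_last_not_of(Ws);
  return S.substr(Begin, End - Begin + 1);
}

// Assumed heuristic: YAML if the input starts with '{' or mentions
// "CheckOptions"; key=value if it has a non-empty key before '='.
inline ConfigFormat detectFormat(const std::string &Raw) {
  const std::string S = trimSketch(Raw);
  if (S.empty())
    return ConfigFormat::Invalid;
  if (S.front() == '{' || S.find("CheckOptions") != std::string::npos)
    return ConfigFormat::Yaml;
  const std::string::size_type Eq = S.find('=');
  if (Eq != std::string::npos && Eq > 0)
    return ConfigFormat::KeyValue;
  return ConfigFormat::Invalid;
}

// Passes YAML through and wraps key=value input into a CheckOptions
// structure. Returns false for input in neither format.
inline bool toYamlConfig(const std::string &Raw, std::string &Out) {
  const std::string S = trimSketch(Raw);
  switch (detectFormat(S)) {
  case ConfigFormat::Yaml:
    Out = S;
    return true;
  case ConfigFormat::KeyValue: {
    const std::string::size_type Eq = S.find('=');
    assert(Eq != std::string::npos && Eq > 0 && "guaranteed by detectFormat");
    Out = "{CheckOptions: [{key: '" + trimSketch(S.substr(0, Eq)) +
          "', value: '" + trimSketch(S.substr(Eq + 1)) + "'}]}";
    return true;
  }
  case ConfigFormat::Invalid:
    break;
  }
  return false;
}
```

Under these assumptions, an input such as `bugprone-assert-side-effect.AssertMacros=MY_ASSERT` becomes a one-entry `CheckOptions` list, which can then be handed to the existing configuration parser.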

The factory then creates a new multiplexing consumer that wraps both the existing CSA consumer and the new Clang-Tidy consumer, implementing the standard AST consumer interface (existing interface) by forwarding callbacks to both underlying consumers (new multiplexing logic). This multiplexer is the key architectural component enabling unified analysis: it presents a single consumer interface to the frontend while internally routing to two independent analysis engines.
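
The forwarding behavior and the conditional creation can be illustrated with a toy model. Everything below is a standalone sketch under stated assumptions: `ConsumerSketch`, `TuSketch`, `MultiplexSketch`, and `createConsumerSketch` are hypothetical names that stand in for the real Clang interfaces, and the "analysis" is reduced to recording which consumer ran.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a translation unit; records which consumers visited it.
struct TuSketch {
  std::vector<std::string> VisitedBy;
};

// Toy stand-in for the AST consumer interface (not the real Clang class).
struct ConsumerSketch {
  virtual ~ConsumerSketch() = default;
  virtual void handleTranslationUnit(TuSketch &TU) = 0;
};

struct CsaConsumerSketch : ConsumerSketch {
  void handleTranslationUnit(TuSketch &TU) override {
    TU.VisitedBy.push_back("csa");
  }
};

struct TidyConsumerSketch : ConsumerSketch {
  void handleTranslationUnit(TuSketch &TU) override {
    TU.VisitedBy.push_back("tidy");
  }
};

// Presents one consumer to the frontend and forwards each callback to every
// wrapped consumer, sequentially and in insertion order.
class MultiplexSketch : public ConsumerSketch {
  std::vector<std::unique_ptr<ConsumerSketch>> Consumers;

public:
  void add(std::unique_ptr<ConsumerSketch> C) {
    assert(C && "null consumer");
    Consumers.push_back(std::move(C));
  }
  void handleTranslationUnit(TuSketch &TU) override {
    for (auto &C : Consumers)
      C->handleTranslationUnit(TU);
  }
};

// With no Clang-Tidy checker enabled, only the CSA consumer is created and
// the multiplexer is never instantiated.
inline std::unique_ptr<ConsumerSketch>
createConsumerSketch(bool AnyTidyCheckerEnabled) {
  std::unique_ptr<ConsumerSketch> Csa = std::make_unique<CsaConsumerSketch>();
  if (!AnyTidyCheckerEnabled)
    return Csa;
  auto Mux = std::make_unique<MultiplexSketch>();
  Mux->add(std::move(Csa));
  Mux->add(std::make_unique<TidyConsumerSketch>());
  return std::unique_ptr<ConsumerSketch>(std::move(Mux));
}
```

Because the caller only ever sees a `ConsumerSketch`, the rest of the pipeline does not need to know whether one or two analysis engines sit behind it, which is the property the RFC relies on for its CSA-only code path.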

Analysis Execution
When the frontend invokes the translation unit completion callback on the multiplexer (existing callback mechanism), the multiplexer routes it to both consumers. The CSA consumer executes CSA’s existing analysis pipeline completely unchanged: building control flow graphs, running symbolic execution, and generating path diagnostics. Conceptually in parallel (though sequentially in the implementation), the Clang-Tidy consumer executes Clang-Tidy’s existing analysis pipeline: traversing the AST with pattern matchers, invoking check callbacks, and generating standard diagnostics. Both analysis engines operate independently on the same AST without information sharing, maintaining a clean separation of concerns.

Diagnostic Conversion and Unification
A new diagnostic converter is added to the pipeline; it intercepts Clang-Tidy’s diagnostic objects (existing type) before they reach Clang-Tidy’s normal output. This converter extracts check names using the context’s existing accessor method, constructs bug type strings with new formatting (“Clang-Tidy [check-name]”), applies category mapping using a new lookup table (either mapping to specific CSA categories like “Logic Error” or defaulting to “Clang-Tidy [module]”), converts notes to path events (new transformation), and creates path diagnostic objects (existing CSA type) that are structurally identical to CSA’s output. This conversion ensures that both CSA and Clang-Tidy findings flow through the same reporting infrastructure.
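
The naming and category rules above can be sketched with plain structs. The types and function names here are hypothetical stand-ins for the real diagnostic classes, the brackets in the formatted strings are assumed to be literal, and the module is assumed to be the text before the first `-` in the check name (which would not hold for a multi-word prefix such as `clang-analyzer`). The lookup table contents are supplied by the caller, since the RFC does not list the actual mappings.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a Clang-Tidy diagnostic.
struct TidyDiagSketch {
  std::string CheckName;
  std::string Message;
  std::vector<std::string> Notes;
};

// Hypothetical stand-in for a CSA path diagnostic.
struct PathDiagSketch {
  std::string BugType;
  std::string Category;
  std::string Message;
  std::vector<std::string> PathEvents;
};

// Assumed rule: the module is everything before the first '-'. If there is
// no '-', the whole check name is returned.
inline std::string moduleOf(const std::string &CheckName) {
  return CheckName.substr(0, CheckName.find('-'));
}

// Builds the bug type, picks a category from the table or falls back to the
// module-based default, and turns each note into a path event.
inline PathDiagSketch
convertToPathDiag(const TidyDiagSketch &D,
                  const std::map<std::string, std::string> &CategoryTable) {
  assert(!D.CheckName.empty() && "diagnostic must carry a check name");
  PathDiagSketch Out;
  Out.BugType = "Clang-Tidy [" + D.CheckName + "]";
  const auto It = CategoryTable.find(D.CheckName);
  if (It != CategoryTable.end())
    Out.Category = It->second;
  else
    Out.Category = "Clang-Tidy [" + moduleOf(D.CheckName) + "]";
  Out.Message = D.Message;
  Out.PathEvents = D.Notes;
  return Out;
}
```

For instance, with a table that maps `bugprone-assert-side-effect` to “Logic Error” (an illustrative entry, not one taken from the RFC), that check is reported under “Logic Error”, while an unmapped check such as `modernize-use-nullptr` falls back to “Clang-Tidy [modernize]”.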

Output Generation
The converted diagnostics merge with CSA’s path diagnostics in the existing path diagnostic consumer infrastructure. The integration adds a new convenience function that creates multiple output consumers simultaneously: when requested, it instantiates the HTML consumer to generate interactive reports, the plist consumer to serialize to XML, and the text consumer to emit minimal console output, all writing to the same base output path. This combined consumer approach simplifies invocation by allowing users to generate multiple output formats in a single analysis run. Additionally, users can still invoke individual consumers directly: the SARIF consumer (existing, unchanged) generates JSON when explicitly requested. The existing consumers cannot distinguish between CSA-generated and converter-generated path diagnostics, which ensures unified output without modifying any output-generating code. All findings from both tools are placed in the same output files with consistent formatting and structure.

Backward Compatibility
Backward compatibility is maintained by making all new code paths conditional. If no Clang-Tidy checkers are enabled, the factory creates only the existing CSA consumer, the multiplexer is not instantiated, and the system executes the original CSA-only code path with zero overhead. Existing CSA tests run unchanged, existing command-line invocations work identically, and existing output formats remain byte-for-byte compatible when the Clang-Tidy integration is not enabled.

Testing Strategy

The integration will be validated through a suite of unit and integration tests that verify correct operation.

Overhead analysis

To better understand the overall benefit of the clang-tidy integration, we gathered the analysis time and peak memory demand for the following three workflows:

The table below presents this data for two WebKit files: the largest file in the repository (2.8 MB) and an average-sized file (226 KB). Each cell lists analysis time and peak memory.

| File name | Size | Integration | CSA | Clang-tidy |
| --- | --- | --- | --- | --- |
| ReaderArticleFinderSource.cpp | 2.8 MB | 5.73 sec, 107.71 MB | 4.46 sec, 104.77 MB | 5.48 sec, 77.28 MB |
| FormMetadataJSControllerSource.cpp | 226 KB | 0.93 sec, 90.49 MB | 0.8 sec, 89.35 MB | 0.92 sec, 60.13 MB |

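These numbers indicate that running CSA and Clang-Tidy individually takes longer overall than running CSA with the Clang-Tidy integration. For the largest file, the two separate runs total 9.94 sec (4.46 + 5.48) versus 5.73 sec for the integrated run; for the average-sized file, they total 1.72 sec (0.8 + 0.92) versus 0.93 sec. Peak memory for the integrated run is only slightly above that of CSA alone (107.71 MB vs. 104.77 MB, and 90.49 MB vs. 89.35 MB). We expect this pattern to hold for most users.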

Limitations

No Cross-Tool Information Sharing
Clang-Tidy checkers do not receive path-sensitive information from CSA. They continue to operate with standard AST-based analysis. The integration provides unified execution but does not enhance Clang-Tidy’s analysis capabilities. Bridging these two architectures (AST matchers vs. symbolic execution) is neither the purpose nor within the scope of this integration.

Fix-It Application Not Integrated
Clang-Tidy fix-its are preserved in diagnostics and SARIF output but are not applied, since the CSA interface currently does not support applying fix-its. Users must run standalone clang-tidy with -fix to apply fixes.

Checker Naming Differences
CSA uses a package structure (core.NullDereference) while Clang-Tidy uses a module-based structure (bugprone-assert-side-effect). The integration will not unify the naming conventions, to ensure minimal disruption to the existing users and integrations of both tools.

No Automatic Deduplication
The integration does not automatically deduplicate findings between CSA and Clang-Tidy. If both tools detect the same issue, both warnings appear in the output.

Conclusion

This integration combines the complementary strengths of CSA and Clang-Tidy in a single, efficient analysis pass, eliminating the overhead and complexity of running separate tools. By unifying these analysis engines under a consistent interface with intelligent defaults and flexible configuration, we believe this integration will significantly improve the static analysis experience for users. We welcome community feedback on this proposal.