Modernizing Compiler Design for Carbon's Toolchain
Chandler Carruth
@chandlerc1024
chandlerc@{google,gmail}.com
CppNow 2023
Traditional compiler design
Lexer: text → tokens
- Handles text: characters, encodings, unicode, whitespace, comments, …
- Generally a regular language, and implemented with finite automata
- Produces encapsulated tokens:
- Keywords
- Punctuation & operators
- Identifiers
- Custom lexed literals: numbers, strings
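The token-producing loop above can be sketched in a few lines. This is a minimal, hypothetical lexer, not from any real compiler; all names (`Lex`, `TokenKind`, the keyword set) are invented for illustration:

```cpp
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Minimal hypothetical lexer sketch: a single forward pass over the text,
// producing encapsulated tokens and discarding whitespace.
enum class TokenKind { Keyword, Identifier, Number, Punctuation };

struct Token {
  TokenKind kind;
  std::string text;
};

std::vector<Token> Lex(std::string_view source) {
  std::vector<Token> tokens;
  size_t i = 0;
  while (i < source.size()) {
    unsigned char c = source[i];
    if (std::isspace(c)) {
      ++i;  // Whitespace separates tokens but produces none.
    } else if (std::isalpha(c) || c == '_') {
      size_t start = i;
      while (i < source.size() &&
             (std::isalnum(static_cast<unsigned char>(source[i])) ||
              source[i] == '_')) {
        ++i;
      }
      std::string text(source.substr(start, i - start));
      // A tiny invented keyword set stands in for a real keyword table.
      TokenKind kind = (text == "var" || text == "return")
                           ? TokenKind::Keyword
                           : TokenKind::Identifier;
      tokens.push_back({kind, text});
    } else if (std::isdigit(c)) {
      size_t start = i;
      while (i < source.size() &&
             std::isdigit(static_cast<unsigned char>(source[i]))) {
        ++i;
      }
      tokens.push_back({TokenKind::Number,
                        std::string(source.substr(start, i - start))});
    } else {
      tokens.push_back({TokenKind::Punctuation, std::string(1, c)});
      ++i;
    }
  }
  return tokens;
}
```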
Parser: tokens → AST (Abstract Syntax Tree)
- Handles nesting, structure, and relationships between tokens
- Generally a context-free or context-sensitive language
- Formalized with grammars and/or forms of pushdown automata
- Common grammar categories: LL, LL(k), LR, LR(k), LALR, …
- Loads of theory in this space
- If not using formal grammar, often implemented with a recursive descent parser
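A recursive descent parser maps each nonterminal of the grammar onto one function, so grammar nesting becomes call-stack recursion. A toy sketch for a hypothetical grammar (`expr ::= term ('+' term)*`, `term ::= digit | '(' expr ')'`), with no error handling:

```cpp
#include <memory>
#include <string_view>

// AST node for the toy grammar: '+' for sums, 'n' for number leaves.
struct Node {
  char op;
  int value = 0;  // Set only for leaves.
  std::unique_ptr<Node> lhs, rhs;
};

struct Parser {
  std::string_view input;
  size_t pos = 0;

  // expr ::= term ('+' term)*
  std::unique_ptr<Node> ParseExpr() {
    auto node = ParseTerm();
    while (pos < input.size() && input[pos] == '+') {
      ++pos;  // Consume '+'.
      auto sum = std::make_unique<Node>();
      sum->op = '+';
      sum->lhs = std::move(node);
      sum->rhs = ParseTerm();
      node = std::move(sum);
    }
    return node;
  }

  // term ::= digit | '(' expr ')'
  std::unique_ptr<Node> ParseTerm() {
    if (input[pos] == '(') {
      ++pos;  // Consume '('.
      auto node = ParseExpr();  // Nesting in the grammar => recursion here.
      ++pos;  // Consume ')'.
      return node;
    }
    auto node = std::make_unique<Node>();
    node->op = 'n';
    node->value = input[pos] - '0';
    ++pos;
    return node;
  }
};

int Eval(const Node& n) {
  return n.op == 'n' ? n.value : Eval(*n.lhs) + Eval(*n.rhs);
}
```

Note the recursion in `ParseTerm` for parenthesized expressions; that call-stack dependence is exactly what becomes a problem later.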
Semantic analysis: AST → correct AST
- Handles name resolution, type checking, and validating the input program
- Simple languages mostly reject invalid ASTs from the parser
- More complex languages transform and expand the AST
- Implicit operations
- Instantiation or metaprogramming
- Most complex languages (C++) also have feedback to change the parser… 😡
- Should be the last stage to detect errors in the program
Lowering: AST → IR → … → machine code
- IR: Intermediate Representation
- Handles modeling and optimizing the execution of the program on a machine
- Often iterative, potentially with successively lower level IRs
- Eventually produces machine code that can run on the target machine
- Shouldn’t fail on a valid input
- This is its own whole system beyond the front-end phases (LLVM, etc.)
Historical influences on architecture
- Much of this was formalized around machines & languages from >50 years ago
- Semantic analysis and lowering were fairly straightforward
- Simple language rules (like C, or B)
- Minimal optimization in lowering
- No IRs, just AST → machine code (assembly)
- Focus on lexing, parsing, and ASTs
Imagined direct model
---------
| Lexer |
---------
Imagined direct model
--------- ----------
| Lexer | --> | Parser |
--------- ----------
---^--
Tokens
Imagined direct model
--------- ---------- -------------
| Lexer | --> | Parser | --> | Semantics |
--------- ---------- -------------
---^-- ----^-----
Tokens Parse Tree
Imagined direct model
--------- ---------- ------------- ------------
| Lexer | --> | Parser | --> | Semantics | --> | Lowering |
--------- ---------- ------------- ------------
---^-- ----^----- -^-
Tokens Parse Tree AST
Incremental & lazy parsing design
---------- ------------- ------------
| | -- One function -> | | | |
| Parser | | Semantics | | Lowering |
| | | | | |
---------- ------------- ------------
Incremental & lazy parsing design
---------- ------------- ------------
| | -- One function -> | | --> | |
| Parser | | Semantics | | Lowering |
| | | | | |
---------- ------------- ------------
Incremental & lazy parsing design
---------- ------------- ------------
| | -- One function -> | | --> | |
| Parser | -- Next function -> | Semantics | --> | Lowering |
| | | | | |
---------- ------------- ------------
Incremental & lazy parsing design
---------- ------------- ------------
| | -- One function -> | | --> | |
| Parser | -- Next function -> | Semantics | --> | Lowering |
| | -- Next function -> | | --> | |
---------- ------------- ------------
Incremental & lazy parsing design
--------- ---------- ------------- ------------
| | <- Next --- | | -- One function -> | | --> | |
| Lexer | | Parser | -- Next function -> | Semantics | --> | Lowering |
| | | | -- Next function -> | | --> | |
--------- ---------- ------------- ------------
Incremental & lazy parsing design
--------- ---------- ------------- ------------
| | <- Next --- | | -- One function -> | | --> | |
| Lexer | | Parser | -- Next function -> | Semantics | --> | Lowering |
| | -- Token -> | | -- Next function -> | | --> | |
--------- ---------- ------------- ------------
Designed around “locality” and streaming compilation
- All of this came from an idea of locality in the parsed code
- One token, one AST node at a time
- Enables streaming compilation
- Rooted in the needs of machines where the source code was often larger than the machine memory
ASTs further exacerbate the cache impact due to pervasive pointers for edges
And real world ASTs are … massive
#include <vector>
int test_sum(std::vector<int> data) {
  int result = 0;
  for (const auto& element : data) {
    result += element;
  }
  return result;
}
FunctionDecl 0x120d95460 <line:4:1, line:10:1> line:4:5 test_sum 'int (std::vector<int>)'
|-ParmVarDecl 0x120d95368 <col:14, col:31> col:31 used data 'std::vector<int>' destroyed
`-CompoundStmt 0x120dd5d50 <col:37, line:10:1>
  |-DeclStmt 0x120dcde28 <line:5:3, col:17>
  | `-VarDecl 0x120dcdda0 <col:3, col:16> col:7 used result 'int' cinit
  |   `-IntegerLiteral 0x120dcde08 'int' 0
  |-CXXForRangeStmt 0x120dd5c08 <line:6:3, line:8:3>
  | |-<<<NULL>>>
  | |-DeclStmt 0x120dce1d0
  | | `-VarDecl 0x120dcdf98 col:30 implicit used __range1 'std::vector<int> &' cinit
  | |   `-DeclRefExpr 0x120dcde40 'std::vector<int>' lvalue ParmVar 0x120d95368 'data' 'std::vector<int>'
  | |-DeclStmt 0x120dd2f90
  | | `-VarDecl 0x120dce268 col:28 implicit used __begin1 'iterator':'std::__wrap_iter<int *>' cinit
  | |   `-CXXMemberCallExpr 0x120dce408 'iterator':'std::__wrap_iter<int *>'
  | |     `-MemberExpr 0x120dce3d8 '<bound member function type>' .begin 0x120db6c60
  | |       `-DeclRefExpr 0x120dce1e8 'std::vector<int>' lvalue Var 0x120dcdf98 '__range1' 'std::vector<int> &'
  | |-DeclStmt 0x120dd2fa8
  | | `-VarDecl 0x120dce310 col:28 implicit used __end1 'iterator':'std::__wrap_iter<int *>' cinit
  | |   `-CXXMemberCallExpr 0x120dd2eb0 'iterator':'std::__wrap_iter<int *>'
  | |     `-MemberExpr 0x120dd2e80 '<bound member function type>' .end 0x120db6fd0
  | |       `-DeclRefExpr 0x120dce208 'std::vector<int>' lvalue Var 0x120dcdf98 '__range1' 'std::vector<int> &'
  | |-CXXOperatorCallExpr 0x120dd56c0 'bool' '!=' adl
  | | |-ImplicitCastExpr 0x120dd56a8 'bool (*)(const __wrap_iter<int *> &, const __wrap_iter<int *> &) noexcept'
  | | | `-DeclRefExpr 0x120dd40f8 'bool (const __wrap_iter<int *> &, const __wrap_iter<int *> &) noexcept' lvalue Function 0x120dd3460 'operator!='
  | | |-ImplicitCastExpr 0x120dd40c8 'const __wrap_iter<int *>':'const std::__wrap_iter<int *>' lvalue
  | | | `-DeclRefExpr 0x120dd2fc0 'iterator':'std::__wrap_iter<int *>' lvalue Var 0x120dce268 '__begin1'
  | | `-ImplicitCastExpr 0x120dd40e0 'const __wrap_iter<int *>':'const std::__wrap_iter<int *>' lvalue
  | |   `-DeclRefExpr 0x120dd2fe0 'iterator':'std::__wrap_iter<int *>' lvalue Var 0x120dce310 '__end1'
  | |-CXXOperatorCallExpr 0x120dd5860 '__wrap_iter<int *>':'std::__wrap_iter<int *>' lvalue '++'
  | | |-ImplicitCastExpr 0x120dd5848 '__wrap_iter<int *> &(*)() noexcept'
  | | | `-DeclRefExpr 0x120dd5718 '__wrap_iter<int *> &() noexcept' lvalue CXXMethod 0x120dd0528 'operator++'
  | | `-DeclRefExpr 0x120dd56f8 'iterator':'std::__wrap_iter<int *>' lvalue Var 0x120dce268 '__begin1'
  | |-DeclStmt 0x120dcdf38 <col:8, col:34>
  | | `-VarDecl 0x120dcded0 <col:8, col:28> col:20 used element 'int const &' cinit
  | |   `-ImplicitCastExpr 0x120dd5b98 'int const':'const int' lvalue
  | |     `-CXXOperatorCallExpr 0x120dd5a20 'int' lvalue '*'
  | |       |-ImplicitCastExpr 0x120dd5a08 'reference (*)() const noexcept'
  | |       | `-DeclRefExpr 0x120dd58f0 'reference () const noexcept' lvalue CXXMethod 0x120dd0070 'operator*'
  | |       `-ImplicitCastExpr 0x120dd58d8 'const std::__wrap_iter<int *>' lvalue
  | |         `-DeclRefExpr 0x120dd58b8 'iterator':'std::__wrap_iter<int *>' lvalue Var 0x120dce268 '__begin1'
  | `-CompoundStmt 0x120dd5cf0 <col:36, line:8:3>
  |   `-CompoundAssignOperator 0x120dd5cc0 <line:7:5, col:15> 'int' lvalue '+=' ComputeLHSTy='int' ComputeResultTy='int'
  |     |-DeclRefExpr 0x120dd5c68 'int' lvalue Var 0x120dcdda0 'result' 'int'
  |     `-ImplicitCastExpr 0x120dd5ca8 'int'
  |       `-DeclRefExpr 0x120dd5c88 'int const':'const int' lvalue Var 0x120dcded0 'element' 'int const &'
  `-ReturnStmt 0x120dd5d40 <line:9:3, col:10>
    `-ImplicitCastExpr 0x120dd5d28 'int'
      `-DeclRefExpr 0x120dd5d08 'int' lvalue Var 0x120dcdda0 'result' 'int'
Do we need a better approach?
Carbon’s compile-time goals
Carbon has a goal of fast compile times:
Software development iteration has a critical “edit, test, debug” cycle. Developers will use IDEs, editors, compilers, and other tools that need different levels of parsing. For small projects, raw parsing speed is essential; for large software systems, scalability of parsing is also necessary.
Not just about compiling to a binary…
- Need fast tooling:
- Formatting
- Jump-to-definition
- Refactoring tools and other IDE plugins
- Many of these need interactive levels of speed
distcc and other distributed compute approaches don’t apply
We set ourselves a challenge:
- Parse and lex at 10 million lines of code / second
- Semantic analysis at 1 million lines of code / second
- Lower to a binary at 0.1 million lines of code / second
Maybe…
Thinking about it the other way is eye-opening:
- Average budget of 100 ns per line to lex and parse
- Average budget of 1 μs per line for semantic analysis
- Let’s think about these in terms of our latency numbers…
Latency numbers table:
Operation | Time in ns | Time in ms
---|---:|---:
CPU cycle | 0.3 - 0.5 |
L1 cache reference | 1 |
Branch misprediction | 3 |
L2 cache reference | 4 |
Mutex lock/unlock | 17 |
Main memory reference | 100 |
SSD random read | 17,000 | 0.017
Read 1 MB sequentially from memory | 38,000 | 0.038
Read 1 MB sequentially from SSD | 622,000 | 0.622
Mapping those onto our budgets
- About 200-300 cycles, maybe 500 instructions, per line lexed & parsed
- Only one main memory access per line lexed & parsed (on average)!!!
- Can’t allocate memory per token… or line of tokens…
- So many approaches just stop being viable if you want to hit this
- Have to use every byte of every cache line accessed to have any hope
😱😱😱
Ok, so how are we going to do this?
Data-oriented compiler design
Data-oriented compiler design
- Based on data-oriented design popularized in the games industry
- Locality alone isn’t a good model for modern hardware
- All inputs likely in memory, no slow disk or memory in practice
- Little or no “interesting” computation involved
- Memory and cache bandwidth will almost always be the limiting factor
Simple and memory-dense rather than lazy
- Inputs all fit in memory, no underlying reason for laziness
- Emphasis on processing input near memory bandwidth speed
- Need resulting memory representation to be similarly dense as input
- Even for (very) large inputs, no reason to add complexity to eager approach
- 10k lines of code (< 40 bytes / line) easily fits into cache across wide range of hardware
- Streaming through cache at memory bandwidth seems easily acceptable speed
- Can focus on efficient, tight processing
Core data structure pattern:
- Primary homogeneous dense array packed with the most ubiquitous data
- Indexed access to allow most connections to use adjacency
- Avoids storage for these connections
- Side arrays for each different kind of secondary data needed
- Connected to primary array with compressed index (smaller than pointer)
- Flyweight handles wrap indices and provide keys to the APIs
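The pattern might be sketched like this. All of the names here (`Store`, `Node`, `NodeId`) are invented for illustration; the real toolchain's types are richer, but the shape is the same: a dense primary array of small bit-packed records, a side array for bulky secondary data, and 4-byte index handles instead of pointers:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Flyweight handle: a 4-byte index instead of an 8-byte pointer.
struct NodeId {
  int32_t index;
};

// Densely packed primary record; bit-fields use every bit of the 4 bytes.
struct Node {
  uint32_t kind : 8;       // Enumerated node kind.
  uint32_t has_error : 1;  // Bit-packed flag.
  uint32_t payload : 23;   // Compressed index into a kind-specific side array.
};

struct Store {
  std::vector<Node> nodes;               // Primary homogeneous dense array.
  std::vector<std::string> identifiers;  // Side array for secondary data.

  NodeId AddIdentifier(std::string text) {
    uint32_t side = static_cast<uint32_t>(identifiers.size());
    identifiers.push_back(std::move(text));
    nodes.push_back({/*kind=*/1, /*has_error=*/0, /*payload=*/side});
    return {static_cast<int32_t>(nodes.size() - 1)};
  }

  const std::string& GetIdentifier(NodeId id) const {
    return identifiers[nodes[id.index].payload];
  }
};
```

Because records are fixed-size and contiguous, "next node" is just `id.index + 1`: adjacency replaces stored edges, and no load of the current node is needed to find its neighbor.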
Advantages of this pattern:
- Can densely pack main array, almost every bit is used
- Differently shaped data factored into their own densely packed arrays
- Enumerated states or bit-pack flags to fully use every byte of cache
- Indexed access to the “next” node doesn’t depend on reading the current node
- Processor can pipeline these memory accesses
Let’s look at how this manifests at every layer
Data-oriented lexing!
Lexing directly fits the desired pattern
- Lex into a token buffer
- Dense array of tokens
- Side arrays of identifiers, strings, literals
- Expose a “token” as a flyweight index into the buffer
Lexing details: source locations
- Classical challenge is mapping tokens back to source locations
- Very expensive – one of the most optimized parts of Clang’s lexer
- Carbon instead has a token be the source location
- Directly encodes its location within the source
- Computes the location extents on demand
- Result is a single 32-bit token ID gives both token & location
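One plausible way to compute extents on demand (a sketch with invented names, not Carbon's actual code): each token record stores only its byte offset, and line/column are recovered by binary searching a table of line-start offsets built during lexing:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct TokenInfo {
  uint32_t byte_offset;  // Where the token starts in the source buffer.
};

struct LineCol {
  uint32_t line, column;  // Both 1-based.
};

struct TokenizedBuffer {
  std::vector<TokenInfo> tokens;
  std::vector<uint32_t> line_starts;  // Byte offset of each line's first char.

  // A 32-bit token ID identifies both the token and its location; the
  // human-readable form is only computed when a diagnostic needs it.
  LineCol GetLocation(int32_t token_id) const {
    uint32_t offset = tokens[token_id].byte_offset;
    // Find the last line start <= offset.
    auto it = std::upper_bound(line_starts.begin(), line_starts.end(), offset);
    uint32_t line = static_cast<uint32_t>(it - line_starts.begin());
    return {line, offset - line_starts[line - 1] + 1};
  }
};
```

The common case (no diagnostic) pays nothing; the rare case pays one `O(log lines)` search.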
Lexing details: balanced delimiters
- Another challenge in parsing is recovering from unbalanced delimiters
- To make the parser much simpler, lexer pre-computes balanced delimiters
- Also lets it synthesize tokens to re-balance when necessary
- Parser never has to handle unbalanced token streams
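A toy sketch of the idea for a single delimiter pair (invented representation, far simpler than the real lexer): stray closers are rewritten to an error marker, missing closers are synthesized, and every delimiter records the index of its match so the parser can skip a group in O(1):

```cpp
#include <vector>

struct Delim {
  char c;                   // '(' or ')', or '!' for a neutralized stray closer.
  int match = -1;           // Index of the matching delimiter, if any.
  bool synthesized = false; // True for closers the lexer invented.
};

std::vector<Delim> Balance(const std::vector<char>& in) {
  std::vector<Delim> out;
  std::vector<int> opens;  // Stack of indices of unmatched '('.
  for (char c : in) {
    if (c == '(') {
      opens.push_back(static_cast<int>(out.size()));
      out.push_back({c});
    } else if (c == ')') {
      if (opens.empty()) {
        out.push_back({'!'});  // Stray closer: neutralize it.
      } else {
        int open = opens.back();
        opens.pop_back();
        out.push_back({c, open});
        out[open].match = static_cast<int>(out.size()) - 1;
      }
    }
  }
  // Synthesize closers for any opens still unmatched at end of input.
  while (!opens.empty()) {
    int open = opens.back();
    opens.pop_back();
    out.push_back({')', open, /*synthesized=*/true});
    out[open].match = static_cast<int>(out.size()) - 1;
  }
  return out;
}
```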
Lexing implementation: a guided tour, live!
Data-oriented parsing!
Challenge: how to represent a tree
- Linearize the tree into an array based on expected iteration
- Makes the common traversal a linear walk
- Makes the hot edges between nodes be adjacency, breaking dependency chain
- Postorder traversal ends up being the most useful
- Force 1:1 correspondence to tokens
- Simpler allocation
- A single edge to a token is always sufficient, no ranges, etc.
- Leverage introducers to ensure the traversal “brackets” constructs
- Result ends up being essentially a stack machine for observing the parsed structure
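The stack-machine reading of a postorder array can be shown concretely. A hypothetical encoding (invented for this sketch): each node carries its token text and child count, leaves push onto a value stack, and interior nodes pop their children:

```cpp
#include <string>
#include <vector>

struct PostorderNode {
  std::string token;
  int child_count;
};

// Replays the postorder array against an explicit stack, rendering the tree
// in parenthesized prefix form. The traversal itself is one linear walk.
std::string Render(const std::vector<PostorderNode>& tree) {
  std::vector<std::string> stack;
  for (const auto& node : tree) {
    if (node.child_count == 0) {
      stack.push_back(node.token);  // Leaf: push.
      continue;
    }
    // Interior node: its children are the top child_count stack entries.
    std::string s = "(" + node.token;
    for (size_t i = stack.size() - node.child_count; i < stack.size(); ++i) {
      s += " " + stack[i];
    }
    stack.resize(stack.size() - node.child_count);
    stack.push_back(s + ")");
  }
  return stack.back();
}
```

For the `y + 1` subtree from the slides, the postorder array is `y`, `1`, `+` and the replay reconstructs `(+ y 1)` without ever chasing a pointer.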
var x: i32 = y + 1;
// TokenizedBuffer:
// --------
// 1) | var |
// --------
// 2) | x |
// --------
// 3) | : |
// --------
// 4) | i32 |
// --------
// 5) | = |
// --------
// 6) | y |
// --------
// 7) | + |
// --------
// 8) | 1 |
// --------
// 9) | ; |
// --------
var x : i32 = y + 1 ;
// TokenizedBuffer:
// --------
// 1) | var |
// --------
// 2) | x |
// --------
// 3) | : |
// --------
// 4) | i32 |
// --------
// 5) | = |
// --------
// 6) | y |
// --------
// 7) | + | ( y ) ( 1 )
// -------- \__ __/
// 8) | 1 | ( + )
// --------
// 9) | ; |
// --------
var x : i32 = y + 1 ;
var x : i32 = y + 1 ;
// TokenizedBuffer:
// --------
// 1) | var |
// --------
// 2) | x |
// --------
// 3) | : |
// --------
// 4) | i32 |
// --------
// 5) | = |
// --------
// 6) | y |
// --------
// 7) | + | ( x ) ( i32 ) ( y ) ( 1 )
// -------- \__ __/ \__ __/
// 8) | 1 | ( : ) ( + )
// --------
// 9) | ; |
// --------
var x : i32 = y + 1 ;
var x : i32 = y + 1 ;
// TokenizedBuffer:
// --------
// 1) | var |
// --------
// 2) | x |
// --------
// 3) | : |
// --------
// 4) | i32 |
// --------
// 5) | = |
// --------
// 6) | y |
// --------
// 7) | + | ( x ) ( i32 ) ( y ) ( 1 )
// -------- \__ __/ \__ __/
// 8) | 1 | ( var ) ( : ) ( = ) ( + )
// -------- \__________\___________\_________\______
// 9) | ; | ( ; )
// --------
var x : i32 = y + 1 ;
var x : i32 = y + 1 ;
// TokenizedBuffer:
// --------
// 1) | var |
// --------
// 2) | x |
// --------
// 3) | : |
// --------
// 4) | i32 |
// --------
// 5) | = |
// --------
// 6) | y |
// --------
// 7) | + | ( `<2>x` ) ( `<3>i32` ) ( `<6>y` ) ( `<7>1` )
// -------- \__ __/ \__ __/
// 8) | 1 | ( `<1>var` ) ( `<4>:` ) ( `<5>=` ) ( `<8>+` )
// -------- \__________\___________\_________\______
// 9) | ; | ( `<9>;` )
// --------
var x : i32 = y + 1 ;
var x : i32 = y + 1 ;
// TokenizedBuffer:
// --------
// 1) | var | ( var )
// -------- |
// 2) | x | | ( x )
// -------- | |
// 3) | : | | | ( i32 )
// -------- | \__ __/
// 4) | i32 | | ( : )
// -------- | |
// 5) | = | | | ( = )
// -------- | | |
// 6) | y | | | | ( y )
// -------- | | | |
// 7) | + | | | | | ( 1 )
// -------- | | | \__ __/
// 8) | 1 | | | | ( + )
// -------- \__________\___________\_________\______
// 9) | ; | ( ; )
// --------
var x : i32 = y + 1 ;
var x : i32 = y + 1 ;
// TokenizedBuffer: ParseTree:
// -------- --------
// 1) | var | ( `<1>var` ) | `<1>var` |
// -------- | --------
// 2) | x | | ( `<2>x` ) | `<2>x` |
// -------- | | --------
// 3) | : | | | ( `<3>i32` ) | `<3>i32` |
// -------- | \__ __/ --------
// 4) | i32 | | ( `<4>:` ) | `<4>:` |
// -------- | | --------
// 5) | = | | | ( `<5>=` ) | `<5>=` |
// -------- | | | --------
// 6) | y | | | | ( `<6>y` ) | `<6>y` |
// -------- | | | | --------
// 7) | + | | | | | ( `<7>1` ) | `<7>1` |
// -------- | | | \__ __/ --------
// 8) | 1 | | | | ( `<8>+` ) | `<8>+` |
// -------- \__________\___________\_________\______ --------
// 9) | ; | ( `<9>;` ) | `<9>;` |
// -------- --------
var x : i32 = y + 1 ;
Parse tree implementation: a guided tour, live!
How does the parser build the tree?
- Technically a recursive descent parser
- Faces a classic problem – deep recursion exhausting the call stack
- C++ and Clang have this problem as well
- Has led to a number of “exciting” tricks to work around it
- Especially problematic for library usage of the compiler
- Carbon creates a dedicated stack data structure for its parser
- Turns the parse into a “normal” state machine without recursive calls
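The flavor of that transformation can be seen in a toy example (not Carbon's actual machinery): an expression parser rewritten as a loop over an explicit operator stack, shunting-yard style. Nesting depth consumes heap, not call stack, so pathologically nested input cannot overflow the machine stack:

```cpp
#include <string>
#include <string_view>
#include <vector>

// Converts expressions over digits, '+', and parentheses to postorder,
// with no recursive calls: the explicit `ops` stack replaces the call stack.
std::string ToPostorder(std::string_view input) {
  std::string out;
  std::vector<char> ops;  // Heap-allocated stack of pending operators.
  for (char c : input) {
    if (c >= '0' && c <= '9') {
      out += c;        // Operands go straight to the postorder output.
    } else if (c == '+' || c == '(') {
      ops.push_back(c);  // Defer the operator until its operands are out.
    } else if (c == ')') {
      while (ops.back() != '(') {  // Flush operators back to the open paren.
        out += ops.back();
        ops.pop_back();
      }
      ops.pop_back();  // Discard the '(' itself.
    }
  }
  while (!ops.empty()) {
    out += ops.back();
    ops.pop_back();
  }
  return out;
}
```

A recursive descent parser given 100,000 nested parentheses makes 100,000 nested calls; this loop just grows a `std::vector` to 100,000 chars.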
Parser implementation: more live tour!
Data-oriented semantics!
Core idea: model semantics as an IR
IR here in the sense of a compiler’s Intermediate Representation
- Generally structured in blocks with an order of execution
For runtime code, this generates exactly the structure we want
- First, to type check and validate the semantics
- Then, to lower into a lower level IR like LLVM’s
We map other parts of semantic analysis into the IR space:
- Declaring names, types, etc., are compile time evaluation
- Type checking is evaluation, constant propagation, and then verifying
- Template instantiation similar to “specialization” in optimizing compilers
Imperative model for building all constructs in the language
Lends itself to well understood techniques for fast compile time:
- Interpreters
- Efficient streaming data structures
Some hope that this lends itself to implementing actual metaprogramming features
- But way too early to tell…
Really, this is very early and an area of active work
Live tiniest demo of Semantics IR
Built by walking the postorder parse tree
- Primary consumer driving the parse tree design
- Walk observes the parse structure
- Uses a stack data structure to provide any context or other details
- Efficiency hinges on a single pass over the parse tree
- This is difficult though
- Some places we defer subtrees: nested function bodies
- Other places use intermediate data structures
- Template instantiations are a fundamental challenge, but tolerable
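The single-pass walk can be illustrated with a deliberately tiny invented IR (far simpler than Carbon's SemIR): leaves push value ids onto the walk's context stack, and operator nodes pop their operands and append one instruction, so the IR comes out in execution order with no tree pointers anywhere:

```cpp
#include <string>
#include <vector>

struct Inst {
  std::string op;          // e.g. "const", "add".
  int lhs = -1, rhs = -1;  // Operand instruction indices; a constant in lhs.
};

struct ParseNode {
  std::string text;
  bool is_leaf;
};

// One linear pass over the postorder parse tree; `values` is the stack data
// structure providing context for each node as it is reached.
std::vector<Inst> BuildIr(const std::vector<ParseNode>& postorder) {
  std::vector<Inst> ir;
  std::vector<int> values;
  for (const auto& node : postorder) {
    if (node.is_leaf) {
      ir.push_back({"const", std::stoi(node.text)});
    } else {  // Assume a binary operator like "+" for this sketch.
      int rhs = values.back(); values.pop_back();
      int lhs = values.back(); values.pop_back();
      ir.push_back({"add", lhs, rhs});
    }
    values.push_back(static_cast<int>(ir.size()) - 1);
  }
  return ir;
}
```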
Gets more help from the language
- Single pass is easier if we emit things before they’re needed
- Carbon has a principle of information accumulation
- Toolchain leverages this to directly emit complete semantics
- Limiting necessary repeated passes to monomorphization & instantiation
Again, really early days on semantics.
More to come!
Data-oriented lowering!
Eh… not really…
Somewhat data-oriented lowering?
- Because of the semantics IR, some aspects of lowering will fall out:
- Traversing the IR to emit LLVM IR will be fast and efficient
- Other operations like high-level optimizations or constant evaluation similarly
- But LLVM itself is not especially data-oriented in its design
- Currently no plan to try to change this, but interesting area of future work
Ultimately, limited by LLVM today
- LLVM heavily uses sparse, pointer-based data structures
- Tends to be very inefficient, often >50% cycles stalled on data
- Despite heavily optimized data structures below the core IR
Also, we have a long way to go before this can be our focus!
Aside: testing the compiler
Basic testing follows usual patterns
- Easier to write focused unit tests due to simpler language
- No preprocessor in the lexer, etc.
- Serializing each layer into line-delimited JSON
  - Easy to do quick things with grep
  - Can use powerful tools like jq
- LLVM FileCheck-style testing throughout
Fuzz testing is a more interesting challenge
- Historically, C++ compilers have been ~impossible to fuzz test
- None were built with continuous fuzz testing
- Very large number of failures throughout the system
- Difficult to create interesting inputs
- Carbon decided to have fuzz testing from day one
- And to fuzz test each layer
Fuzz testing with more complex inputs
- Also developed a protobuf-based fuzzer
- Models the Carbon AST in protobuf messages
- Uses structural fuzz testing systems built on top of protobufs to synthesize interesting and complex ASTs
- Renders them into source as fuzzer inputs
Our goal: a compiler that does not crash
- May still have plenty of bugs, but they shouldn’t crash
- Users should always get a useful error message
- Only way we know to achieve this is to do it from the very beginning
Some key takeaways…
Re-think traditional compiler design
Challenge assumptions with aggressive goals
Compiler & implementation design should be a major input to language design
Complete semantic analysis and lowering
Interop with C++ through Clang (come see tomorrow’s keynote!)
Bundling and being able to access a complete Clang-based C++ toolchain
Modern error messages and diagnostics
Building a carbon-format tool on top of the parser
Generating arbitrary amounts of random Carbon source code
- Parameterized distribution of language constructs
- Parameterized distribution of name & comment lengths
- Able to replicate representative distributions or edge cases
Aggressive benchmarking & optimization of each layer
- Maybe even generating C++ code for comparative benchmarking
All part of Carbon’s ongoing efforts to
- Modernize compiler design and implementation
- And deliver radical improvements to compile times
Thank you!
Resources and more information:
- https://github.com/carbon-language/carbon-lang#getting-started
- https://github.com/carbon-language/carbon-lang/tree/trunk/toolchain
- Design doc linked here and from the toolchain tree above
- Toolchain channel of our Discord server