Parallel compilation - Rust Compiler Development Guide (original) (raw)

Rust Compiler Development Guide

Parallel compilation

As of November 2024, most of the rust compiler is now parallelized.

The following sections are kept for now but are quite outdated.


Code generation

During monomorphization the compiler splits up all the code to be generated into smaller chunks called codegen units. These are then generated by independent instances of LLVM running in parallel. At the end, the linker is run to combine all the codegen units together into one binary. This process occurs in the rustc_codegen_ssa::base module.

Data structures

The underlying thread-safe data-structures used in the parallel compiler can be found in the rustc_data_structures::sync module. These data structures are implemented differently depending on whether parallel-compiler is true.

data structure parallel non-parallel
Lock (parking_lot::Mutex) (std::cell::RefCell)
RwLock (parking_lot::RwLock) (std::cell::RefCell)
MTLock (Lock) (T)
ReadGuard parking_lot::RwLockReadGuard std::cell::Ref
MappedReadGuard parking_lot::MappedRwLockReadGuard std::cell::Ref
WriteGuard parking_lot::RwLockWriteGuard std::cell::RefMut
MappedWriteGuard parking_lot::MappedRwLockWriteGuard std::cell::RefMut
LockGuard parking_lot::MutexGuard std::cell::RefMut

WorkerLocal

WorkerLocal is a special data structure implemented for parallel compilers. It holds worker-locals values for each thread in a thread pool. You can only access the worker local value through the Deref impl on the thread pool it was constructed on. It panics otherwise.

WorkerLocal is used to implement the Arena allocator in the parallel environment, which is critical in parallel queries. Its implementation is located in the rustc_data_structures::sync::worker_local module. However, in the non-parallel compiler, it is implemented as (OneThread<T>), whose Tcan be accessed directly through Deref::deref.

Parallel iterator

The parallel iterators provided by the rayon crate are easy ways to implement parallelism. In the current implementation of the parallel compiler we use a custom fork of rayon to run tasks in parallel.

Some iterator functions are implemented to run loops in parallel when parallel-compiler is true.

Function(Omit Send and Sync) Introduction Owning Module
par_iter<T: IntoParallelIterator>(t: T) -> T::Iter generate a parallel iterator rustc_data_structure::sync
par_for_each_in<T: IntoParallelIterator>(t: T, for_each: impl Fn(T::Item)) generate a parallel iterator and run for_each on each element rustc_data_structure::sync
Map::par_body_owners(self, f: impl Fn(LocalDefId)) run f on all hir owners in the crate rustc_middle::hir::map
Map::par_for_each_module(self, f: impl Fn(LocalDefId)) run f on all modules and sub modules in the crate rustc_middle::hir::map
ModuleItems::par_items(&self, f: impl Fn(ItemId)) run f on all items in the module rustc_middle::hir
ModuleItems::par_trait_items(&self, f: impl Fn(TraitItemId)) run f on all trait items in the module rustc_middle::hir
ModuleItems::par_impl_items(&self, f: impl Fn(ImplItemId)) run f on all impl items in the module rustc_middle::hir
ModuleItems::par_foreign_items(&self, f: impl Fn(ForeignItemId)) run f on all foreign items in the module rustc_middle::hir

There are a lot of loops in the compiler which can possibly be parallelized using these functions. As of August 2022, scenarios where the parallel iterator function has been used are as follows:

caller scenario callee
rustc_metadata::rmeta::encoder::prefetch_mir Prefetch queries which will be needed later by metadata encoding par_iter
rustc_monomorphize::collector::collect_crate_mono_items Collect monomorphized items reachable from non-generic items par_for_each_in
rustc_interface::passes::analysis Check the validity of the match statements Map::par_body_owners
rustc_interface::passes::analysis MIR borrow check Map::par_body_owners
rustc_typeck::check::typeck_item_bodies Type check Map::par_body_owners
rustc_interface::passes::hir_id_validator::check_crate Check the validity of hir Map::par_for_each_module
rustc_interface::passes::analysis Check the validity of loops body, attributes, naked functions, unstable abi, const bodys Map::par_for_each_module
rustc_interface::passes::analysis Liveness and intrinsic checking of MIR Map::par_for_each_module
rustc_interface::passes::analysis Deathness checking Map::par_for_each_module
rustc_interface::passes::analysis Privacy checking Map::par_for_each_module
rustc_lint::late::check_crate Run per-module lints Map::par_for_each_module
rustc_typeck::check_crate Well-formedness checking Map::par_for_each_module

There are still many loops that have the potential to use parallel iterators.

Query system

The query model has some properties that make it actually feasible to evaluate multiple queries in parallel without too much effort:

When a query foo is evaluated, the cache table for foo is locked.

The parallel query feature still has implementation to do, most of which is related to the previous Data Structures and Parallel Iterators. See this open feature tracking issue.

Rustdoc

As of November 2022, there are still a number of steps to complete before rustdoc rendering can be made parallel (see a open discussion of parallel rustdoc).

Resources

Here are some resources that can be used to learn more: