[RFC] Enhancing MLGO Inlining with IR2Vec Embeddings
We plan to upstream support for generating IR2Vec embeddings in Machine Learning Guided Optimization (MLGO) for inlining. Initial training of the code size model on internal binaries, combining the existing features with IR2Vec embeddings, demonstrates additional code size reductions of up to 4.2% over -Os and up to 3.8% over -Os with the current MLGO inliner.
Design of IR2Vec
IR2Vec [1] is a program embedding approach designed specifically for LLVM IR. The IR2Vec embeddings capture syntactic, semantic, and structural properties of the IR through learned representations. The embeddings are trained offline, in an unsupervised manner, to capture the statistical correlations between the entities of the instructions (like opcodes, types, and operands) in the IR. Broadly, the training involves learning to correctly “predict” a missing entity given its context. This process results in a vocabulary consisting of an n-dimensional floating point vector (embedding) for each entity of the IR. The vocabulary is thus a dictionary that maps the entities of the IR to floating point vectors.
Once the dictionary is obtained through offline learning, generating representations in LLVM involves a simple look-up into the learned vocabulary and aggregating the resulting vectors to compute the embeddings for basic blocks and functions.
The computation of IR2Vec embeddings does not introduce external dependencies or require model inference at compile time. The necessary model weights are extracted as a standalone vocabulary of about 64 floating-point vectors (corresponding to the different opcodes, types, and operands in LLVM IR) in JSON format. Currently, this JSON vocabulary is read into memory once and used for the computation. Going forward, this file read can be avoided by auto-generating the vocabulary as maps at build time.
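To make the look-up-and-aggregate step concrete, here is a minimal Python sketch. The vocabulary contents, entity names, and plain summation are illustrative assumptions; the actual implementation operates on LLVM IR entities and may weight the contributions of opcodes, types, and operands differently.

```python
import json

# Hypothetical vocabulary: maps IR entities (opcodes, types, operand
# kinds) to n-dimensional float vectors, mirroring the JSON format
# described above. Real vectors come from offline training.
vocab_json = """
{
  "add":     [0.10, -0.20, 0.05],
  "ret":     [0.00,  0.30, -0.10],
  "integer": [0.25,  0.05, 0.15]
}
"""
vocab = json.loads(vocab_json)

def embed(entities, vocab):
    """Aggregate the vocabulary vectors of a sequence of IR entities.

    A basic block's embedding aggregates its instructions' entity
    vectors; a function's embedding aggregates its basic blocks.
    """
    dim = len(next(iter(vocab.values())))
    acc = [0.0] * dim
    for entity in entities:
        for i, v in enumerate(vocab.get(entity, [0.0] * dim)):
            acc[i] += v
    return acc

# e.g. an instruction like "add i32 ..." contributes its opcode and
# type entities; a "ret" contributes its opcode.
bb_embedding = embed(["add", "integer", "ret"], vocab)
```

Since this is only a dictionary look-up followed by vector additions, no model inference runs at compile time, which is what keeps the analysis cheap.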
Plan
Broadly, we intend to upstream a function analysis pass that computes the embeddings following different strategies, with no additional dependencies. The corresponding patch is available at https://github.com/llvm/llvm-project/pull/134004. In the PR we have identified a number of FIXMEs and TODOs for performance improvements that we will address in incremental patches. Subsequently, we plan to patch the MLInlineAdvisor to make inlining decisions using the embeddings. Going forward, our goal is to replace a subset of the existing features that are “costly” to compute with the embeddings.
We plan to maintain the source code for training the vocabulary outside LLVM; it is currently available at https://github.com/IITH-Compilers/IR2Vec. We can subsequently explore whether it makes sense to include it under (e.g.) llvm/utils/mlgo-utils, once we accrue more experience with real-world use cases.
Experiments
Currently, LLVM supports MLGO for inlining to reduce code size and for eviction decisions in register allocation. These ML models are trained using hand-engineered features tailored to the specific optimizations. IR2Vec, in contrast, learns representations of the IR itself, capturing its syntactic, semantic, and structural properties as described above.
IR2Vec has demonstrated its effectiveness on different ML-driven optimizations like phase ordering [2], loop distribution [3], and register allocation [4] on standard benchmarks like SPEC CPU, TSVC, Polybench, etc. Before proposing this RFC, we wanted to validate its effectiveness and scalability in real-world scenarios.
To that end, following the existing approach, we trained two Reinforcement Learning models (using PPO): one with only the existing MLGO features and the other with the existing MLGO features concatenated with IR2Vec embeddings. Training was done until convergence on about 50K modules from our internal datacenter binary. Initial evaluation of the resulting policies on various size-sensitive binaries internal to Google, as well as on clang and opt (with -Os), yields the following improvements.
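The input to the second policy is simply the existing hand-engineered feature vector extended with the embedding. A minimal sketch (feature names, values, and dimensions here are made up for illustration):

```python
def build_observation(mlgo_features, ir2vec_embedding):
    """Concatenate hand-engineered MLGO features with the learned
    IR2Vec embedding to form the policy network's input vector."""
    return list(mlgo_features) + list(ir2vec_embedding)

# Hypothetical values: a few scalar MLGO features (e.g. callee size,
# call-site count) followed by a small embedding vector.
obs = build_observation([3.0, 1.0, 42.0], [0.12, -0.07, 0.33, 0.90])
```

Because concatenation only widens the observation, the existing training setup can be reused unchanged; only the input dimension of the policy network differs between the two models.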
Improvements in Text Section Size
| Binary | -Os | -Os with Current MLGO (Feature-based) | -Os with MLGO features + IR2Vec embeddings | % Additional Improvement over -Os | % Additional Improvement over -Os + MLGO |
|---|---|---|---|---|---|
| Internal_1 | 121.3M | 117.1M | 116.1M | 4.29% | 0.85% |
| internal_2 | 721.7M | 714.5M | 698.5M | 3.21% | 2.24% |
| clang | 116M | 113.5M | 111.6M | 3.79% | 1.67% |
| opt | 101.9M | 103.04M | 99.1M | 2.75% | 3.82% |
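The percentage columns are relative size reductions of the IR2Vec-augmented policy against each baseline. For example, for the Internal_1 text-section row:

```python
# Sizes in MB from the Internal_1 row of the table above.
base_os = 121.3   # -Os alone
mlgo    = 117.1   # -Os with current MLGO
ir2vec  = 116.1   # -Os with MLGO features + IR2Vec embeddings

improvement_over_os   = (base_os - ir2vec) / base_os * 100
improvement_over_mlgo = (mlgo - ir2vec) / mlgo * 100

print(round(improvement_over_os, 2))    # 4.29
print(round(improvement_over_mlgo, 2))  # 0.85
```

The same formula applied to the other rows reproduces the remaining percentage entries.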
Improvements in Total Binary Size (Stripped)
| Binary | -Os | -Os with Current MLGO (Feature-based) | -Os with MLGO features + IR2Vec embeddings | % Additional Improvement over -Os | % Additional Improvement over -Os + MLGO |
|---|---|---|---|---|---|
| Internal_1 | 139.9M | 135.6M | 135.6M | 3.07% | – |
| internal_2 | 802.6M | 794.2M | 779.5M | 2.88% | 1.85% |
| clang | 123.4M | 121.3M | 119.2M | 3.40% | 1.73% |
| opt | 105.8M | 107.9M | 103.7M | 2.00% | 3.89% |
We used a vocabulary trained on the SPEC CPU benchmarks and the Boost library by following the steps described here. We did not fine-tune this vocabulary, as our goal was to make sure that (i) the embeddings add value, and (ii) the approach is scalable (in terms of compile time and memory utilization). Note that fine-tuning the vocabulary, along with model improvements, might further improve the results.
Acknowledgements
All the contributors of IR2Vec - https://github.com/IITH-Compilers/IR2Vec/graphs/contributors
Thanks,
Venkat
References
[1] S. VenkataKeerthy, R Aggarwal, S Jain, M Desarkar, R Upadrasta and Y. N. Srikant. “IR2Vec: LLVM IR based Scalable Program Embeddings.” ACM Transactions on Architecture and Code Optimization (TACO) 17.4 (2020). https://arxiv.org/abs/1909.06228.
[2] S Jain, Y Andaluri, S. VenkataKeerthy, R Upadrasta. “POSET-RL: Phase ordering for Optimizing Size and Execution Time using Reinforcement Learning”. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 2022. https://arxiv.org/abs/2208.04238.
[3] S Jain, S. VenkataKeerthy, R Aggarwal, T K Dangeti, D Das, R Upadrasta. “Reinforcement Learning assisted Loop Distribution for Locality and Vectorization”. IEEE/ACM Eighth Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC). 2022. https://ieeexplore.ieee.org/document/10026979.
[4] S. VenkataKeerthy, S Jain, A Kundu, R Aggarwal, A Cohen, and R Upadrasta. “RL4ReAl: Reinforcement Learning for Register Allocation”. Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction (CC 2023). https://doi.org/10.1145/3578360.3580273.