[LLVMdev] RFC: Binary format for instrumentation based profiling data

Robinson, Paul Paul_Robinson at playstation.sony.com
Mon Mar 24 10:08:43 PDT 2014


We seem to have some agreement that having two formats for instrumentation based profiling data is worthwhile: the one emitted by compiler-rt in the instrumented program at runtime (format 1), and the one consumed by clang when compiling the program with PGO (format 2).

Format 1
--------

This format should be efficient to write, since the instrumented program should run with as little overhead as possible. This also doesn't need to be stable, and we can assume the same version of LLVM that was used to instrument the program will read the counter data. As such, the file format is versioned (so we can easily reject versions we don't understand) and consists basically of a memory dump of the relevant profiling counters.
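To make the shape of format 1 concrete, here is a rough sketch of "a version number followed by a dump of the counters", with a reader that rejects versions it doesn't understand. Everything in it is an assumption made for illustration: the `MAGIC` tag, the header fields, the 64-bit little-endian counters, and the function names are hypothetical and are not the layout compiler-rt emits.

```python
import struct

# Hypothetical layout, for illustration only: an 8-byte tag, a format
# version, a counter count, then the counters as raw 64-bit integers.
MAGIC = b"PROFRAW1"      # assumed tag, not a real compiler-rt value
FORMAT_VERSION = 1       # bumped only when the layout itself changes
HEADER = struct.Struct("<8sQQ")


def dump_counters(counters):
    """Serialize the counters as a small header plus a flat dump."""
    header = HEADER.pack(MAGIC, FORMAT_VERSION, len(counters))
    body = struct.pack(f"<{len(counters)}Q", *counters)
    return header + body


def load_counters(data, supported_versions=(FORMAT_VERSION,)):
    """Read a dump back, rejecting versions the reader does not understand."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a raw profile")
    if version not in supported_versions:
        raise ValueError(f"unsupported format version {version}")
    if len(data) != HEADER.size + 8 * count:
        raise ValueError("counter payload has the wrong size")
    return list(struct.unpack_from(f"<{count}Q", data, HEADER.size))
```

Because the check is against a set of format versions rather than a compiler release number, a reader written this way could keep accepting older dumps for as long as their layout is still understood.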

The "same version" assertion isn't completely true: at a previous job we had clients who preferred not to regenerate profile data unless they actually had to (because it was a big pain and took a long time). But as long as the versioning is based on actual format changes, not just repurposing the current LLVM version number (making the previous data unusable for no technical reason), that's okay.

As long as I'm bothering to say something, is there some way that the tools will figure out that you're trying to apply old data to new files that have changed in ways that make the old data inapplicable? Sorry if this has been brought up elsewhere and I just missed it.
--paulr

Format 2
--------

This format should be efficient to read and preferably reasonably compact. We'll convert from format 1 to format 2 using llvm-profdata, and clang will use format 2 for PGO. Since the only particularly important operation in this use case is fast lookup, I propose using the on-disk hash table that's currently used in clang for AST serialization/PTH/etc., with a small amount of metadata in a header. The hash table implementation currently lives in include/clang/Basic and consists of a single header. Moving it to llvm and updating the clients in clang should be easy. I'll send a brief RFC separately to see if anyone's opposed to moving it.
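For readers unfamiliar with the idea, the following is a minimal sketch of what "fast lookup in an on-disk hash table" means: the table is a single byte string, and a lookup hashes the key, jumps to one bucket through an offset table, and scans only that bucket. This is not clang's implementation or its file layout. The bucket layout, the use of CRC32 as the hash, and the names `build_table` and `lookup` are all assumptions chosen to keep the example short.

```python
import struct
import zlib


def build_table(entries, num_buckets=8):
    """Serialize {function name: [counters]} into one bytes object."""
    buckets = [[] for _ in range(num_buckets)]
    for name, counters in entries.items():
        key = name.encode("utf-8")
        buckets[zlib.crc32(key) % num_buckets].append((key, counters))

    header_size = 4 + 4 * num_buckets   # bucket count + one offset per bucket
    payload = bytearray()
    offsets = []
    for bucket in buckets:
        if not bucket:
            offsets.append(0)           # 0 marks an empty bucket
            continue
        offsets.append(header_size + len(payload))
        payload += struct.pack("<I", len(bucket))
        for key, counters in bucket:
            payload += struct.pack("<II", len(key), len(counters))
            payload += key
            payload += struct.pack(f"<{len(counters)}Q", *counters)
    header = struct.pack(f"<I{num_buckets}I", num_buckets, *offsets)
    return header + bytes(payload)


def lookup(table, name):
    """Return the counters stored for `name`, or None if it is absent."""
    key = name.encode("utf-8")
    (num_buckets,) = struct.unpack_from("<I", table, 0)
    index = zlib.crc32(key) % num_buckets
    (offset,) = struct.unpack_from("<I", table, 4 + 4 * index)
    if offset == 0:
        return None
    (entry_count,) = struct.unpack_from("<I", table, offset)
    pos = offset + 4
    for _ in range(entry_count):
        key_len, count = struct.unpack_from("<II", table, pos)
        pos += 8
        candidate = table[pos:pos + key_len]
        pos += key_len
        if candidate == key:
            return list(struct.unpack_from(f"<{count}Q", table, pos))
        pos += 8 * count                # skip this entry's counters
    return None
```

The point of the design is that a consumer never has to parse the whole file: it reads the small header, then touches only the bytes of the one bucket it needs.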

Thoughts?


LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


