[LLVMdev] RFC: Binary format for instrumentation based profiling data

Duncan Exon Smith dexonsmith at apple.com
Mon Mar 17 19:00:28 PDT 2014


On Mar 17, 2014, at 17:22, Justin Bogner <mail at justinbogner.com> wrote:

Chandler Carruth <chandlerc at google.com> writes:

The other assumption here is that you want the same file format written by instrumentation and read back by the compiler. While I think that is an unsurprising goal, I think it creates quite a few limitations that I'd like to point out. I think it would be worthwhile to consider the alternative of having the profile library write out data files in a format which is essentially "always" transformed by a post-processing tool before being used during compilation.

Limitations of using the same format in both places:
- High burden on writing the file constrains the format (must be fast, must not use libraries, etc.)
- Have to write an index even though the writer doesn't really need it.
- Have to have the function name passed through the instrumentation, potentially duplicating it with debug info.
- Can't use an extensible file format (like bitcode) to insulate readers of profile data from format changes.

I'm imagining it might be nicer to have something along the lines of the following counter proposal. Define two formats: the format written by instrumentation, and the format read by the compiler. Split the use cases up. Specialize the formats based on the use cases. It does require the user to post-process the results, but it isn't clear that this is really a burden. Historically, post-processing has been needed to merge gcov profiles from different TUs, and it is still required to merge profiles from multiple runs.

This is an interesting idea. The counter data itself without an index is dead simple, so this approach for the instrumentation-written format would certainly be nice for compiler-rt, at the small cost of needing two readers. We'd also need two writers, but that appears inevitable since one needs to live in compiler-rt.

I'm in favour of two formats. Simplifying compiler-rt is a worthwhile goal.

Nevertheless, the current proposal with a naive index is straightforward to produce, especially after the changes I committed today. I think moving to that is a good incremental change.

Moving forward we can split the format in two and evolve them independently. In particular, compiler-rt's writer could be coded as a few memcpy calls plus a header, if there's some freedom around the format.

I think the results could be superior for both the writer and reader:

Instrumentation written format:
- No index, just header and counters
- (optional) Omit function names, use the PC at a known point of the function, and rely on debug info to map back to function names.

This depends a bit on whether or not the conversion tool should depend on the debug info being available. We'd need to weigh the usability cost against the size benefit.

- Use a structure which can be mmap-ed directly by the instrumentation code (at least on LE systems) so that "writing the file on close" is just flushing the memory region to disk

If this is feasible, we could also make the format host endian and force the post-processing to byteswap as it reads. This avoids online work in favour of offline.

- Explicitly version the format, and provide no stability going forward

Profile reading format:
- Use a bitcoded format much like Clang's ASTs do (or some other tagged format which allows extensions)

I'm not entirely convinced a bitcoded format is going to gain us much over a simpler on-disk hash table. The variable bit rate integers might be worthwhile, but will it be efficient to look up the counters for a particular function name?

That said, the ASTs also make use of the on-disk hash that Dmitri mentioned for various indexes, which is definitely worth looking at.

- Leverage the existing partial reading which has been heavily optimized for modules, LLVM IR, etc.
- Use implicit-zero semantics for missing counters within a function where we have some instrumentation results, and remove all zero counters
- Maybe other compression techniques

Thoughts? Specific reasons to avoid this? I'm very much interested in minimizing the space and runtime overhead of instrumentation, as well as getting more advanced features in the format read by Clang itself.

LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

