(original) (raw)

Hi Andrea,

I’ve updated the patch here:

https://reviews.llvm.org/D54603/new/

I have not modified the description in the Phabricator. Of course, if this solution is the way-forward, then I will be happy to update the change description.

I also modified the format of the symbol names (from what I previously describe in my last response below), to save room: “.mca\_code\_region..”. Where “id” is the user’s specified identifier they want associated to the region (cosmetic), and “number” is a module unique number for the region. I have tested this with non-executable objects, executable files, and multiple objects linking into a single executable.

Also, I am trimming this thread up (sorry, but it’s starting to get unwieldy and I think the mailing list bot is grumpy).

-Matt

From: Andrea Di Biagio
Sent: Monday, December 17, 2018 10:12 AM
To: Davis, Matthew
Cc: Simon Pilgrim ; llvm-dev ; cfe-dev@lists.llvm.org
Subject: Re: \[llvm-dev\] \[RFC\]\[llvm-mca\] Adding binary support to llvm-mca.

Hi Matt,

could you please update the associated patch too?

Thanks

-Andrea

On Mon, Dec 17, 2018 at 5:48 PM via llvm-dev <llvm-dev@lists.llvm.org> wrote:

Adding the llvm-dev list, because my email client decides to remove certain lists when I reply-all... including the list that I intend to respond to.

\> -----Original Message-----
\> From: Davis, Matthew <Matthew.Davis@sony.com>
\> Sent: Monday, December 17, 2018 9:47 AM
\> To: Davis, Matthew <Matthew.Davis@sony.com>; llvm-dev@redking.me.uk
\> Cc: cfe-dev@lists.llvm.org
\> Subject: RE: \[llvm-dev\] \[RFC\]\[llvm-mca\] Adding binary support to llvm-mca.
\>
\> Just an update to this RFC. (I thought this was going to be a short email... my
\> apologies).
\>
\> One of the primary limitations described in this RFC (earlier in this thread), is that the
\> patch for this RFC only handles linked executables. This restriction is due to myself
\> wanting to avoid handling relocations in a target specific manner, at least in the
\> initial patch. Especially so, since I want to keep the initial patch simple. However,
\> sometimes simple and practical are at odds with each other. I envision the main use-
\> case of llvm-mca, with binary support, is for analyzing .o files. Probably analyzing .o
\> more commonly than fully linked executables. Without handling relocated objects,
\> this patch seems rather useless.
\>
\> I started exploring an alternative solution this weekend to the aforementioned
\> problem. This alternative solution avoids having to handle relocations, but does give
\> us support for object files (with relocated symbols) and linked executables. The
\> change is quite simple, and seems to be effective. In short, we still generate
\> intrinsics as discussed in the RFC, one to mark the start of a code region, and
\> another to mark the end. These intrinsics get lowered into local symbols. The
\> symbols are already encoded with address information about their position in the
\> object file. What is different is that we ensure that these symbols have unique
\> names and also encode the user provided ID value.
\>
\> Previously the labels were named like: .Lmca\_code\_region\_start\_, and
\> similar for .Lmca\_code\_region\_end. The user id number and region size were
\> encoded in the .mca\_code\_regions object file section. Previously mca never looked
\> at the symbol table. But, In reality we can calculate the region size by using the
\> symbols in the symbol table (look for the mca symbols), instead of relying on the
\> information encoded in .mca\_code\_regions. The alternative approach gets rid of
\> that section entirely but achieves the same functionality by encoding the information
\> in the symbol name. In short the alternative approach just parses the symbol table
\> for MCA symbols, and the symbol names are encoded with the data we need.
\>
\> The newly proposed name is formatted
\> as: .Lmca\_code\_region\_start.... Similar for
\> mca\_code\_region\_end. 'function' is the function that the marker appears in, 'ID' is
\> the user-specified ID (this is a value that users specify for easily identifying the code
\> region under analysis... just cosmetic), and 'number' is a unique number to avoid any
\> duplicate name conflicts. The benefit of this alternative solution is that we can get
\> rid of .mca\_code\_regions, and gather all of the information llvm-mca needs by
\> parsing the symbol table looking for any symbols with the 'mca\_code\_region\_start'
\> and 'mca\_code\_region\_end' format discussed above. Of course, if the string table is
\> stripped, then we will lose this data. The main drawback from this alternative
\> approach is that it relies on encoding symbol names and string processing on those
\> names. I'm somewhat biased against doing string parsing, but the code to perform
\> this is simple and small, and more importantly it allows llvm-mca to handle linked or
\> relocated object files.
\>
\> -Matt
\>
\>
\>
\> > -----Original Message-----
\> > From: llvm-dev <llvm-dev-bounces@lists.llvm.org> On Behalf Of via llvm-dev
\> > Sent: Tuesday, December 11, 2018 11:23 AM
\> > To: llvm-dev@redking.me.uk; llvm-dev@lists.llvm.org
\> > Cc: cfe-dev@lists.llvm.org
\> > Subject: Re: \[llvm-dev\] \[RFC\]\[llvm-mca\] Adding binary support to llvm-mca.
\> >
\> > Thanks for the response Simon. My reply is inline:
\> >
\> > > From: Simon Pilgrim <llvm-dev@redking.me.uk>
\> > > Sent: Monday, December 10, 2018 1:40 PM
\> > >
\> > > Hi Matt,
\> > >
\> > > I can see a near future where perf-analysis tooling uses branch history
\> > > profiler captures to determine how often loops/branches are taken and
\> > > feeds that into llvm-mca, especially for hot/branchy loop analysis
\> > > reports etc. Are you confident that your approach will be easily
\> > > extendable for this?
\> >
\> > That is a very interesting use case. The restriction of a code-region to a single
\> block
\> > is a limitation for any tools that want to analyze branches. However, I believe
\> > that it will be easy to lift this restriction (it's just a check in IR/Verifier). This
\> > limitation is not
\> > expressed in the llvm-mca driver.
\> >
\> > If the information is coming from a profile report, then we'd most
\> > likely need to extend the llvm-mca driver to accept profile reports. Currently,
\> > code regions, from the perspective of the llvm-mca driver, are very simple. They
\> are
\> > just a collection of MCInst. The binary support in this RFC+patch
\> > disassembles just the address range from marker start address for
\> > some specified number of bytes. It might be useful to add another driver argument
\> > so that a user (or tool) can specify, from the command line, a range of instructions
\> > to
\> > analyze. I recently added a class for handling inputs to
\> > llvm::mca::CodeRegionGenerator,
\> > which is just responsible for taking some input and creating a list of MCInst that
\> llvm-
\> > mca uses.
\> > We could subclass this to handle profile reports.
\> >
\> > > Similarly, being able to generally embed the profile markers in object
\> > > libraries for reuse is going to be important for some people - I'd like
\> > > to see more of a plan of how this will be achieved. I understand that it
\> > > might not be easy for some exe formats.
\> >
\> > That is definitely a limitation. This initial patch+RFC only handles linked
\> > executables (i.e., the llvm-mca marker symbol addresses are resolved).
\> > I'm working on a better solution so that this will not be a restriction.
\> > In fact, I'll probably delay trying to land any patches until I solve relocations
\> > (or use a different solution for identifying start/end addresses for llvm-mca code
\> > regions).
\> >
\> > > Sorry if I'm being too critical, but I'm a bit worried that we end up
\> > > with an initial implementation that will take a lot of reworking to meet
\> > > our final aims.
\> > >
\> > > Thanks, Simon.
\> >
\> > I understand your criticisms and value your input. Thanks a ton!
\> >
\> > -Matt,