[LLVMdev] RFC: PerfGuide for frontend authors

Philip Reames listmail at philipreames.com
Sat Feb 28 14:23:02 PST 2015

On 02/28/2015 10:04 AM, Björn Steinbrink wrote:

Hi,

On 2015.02.28 10:53:35 -0600, Hal Finkel wrote:

----- Original Message -----
From: "Philip Reames" <listmail at philipreames.com>

6. Use the lifetime.start/lifetime.end and invariant.start/invariant.end intrinsics where possible

Do you find these help in practice? The few experiments I ran were neutral at best and harmful in one or two cases. Do you have suggestions on how and when to use them?

Good point, we should be more specific here. My, admittedly limited, experience with these is that they're most useful when their properties are not dynamic -- which perhaps means that they post-dominate the entry, and are applied to allocas in the entry block -- and the larger the objects in question, the more the potential stack-space savings, etc.

My experience adding support for the lifetime intrinsics to the Rust compiler is largely positive (because our code is very stack heavy at the moment), but we still suffer from missed memcpy optimizations. That happens because I made the lifetime regions as small as possible, and sometimes an alloca starts its lifetime too late for the optimization to happen. My new (but not yet implemented) approach is to "align" the calls to lifetime.start for allocas with overlapping lifetimes unless there's actually a possibility for stack slot sharing.

For example, we currently translate:

    let a = [0; 1000000]; // Array of 1000000 zeros
    {
        let b = a;
    }
    let c = something;

to roughly this:

    lifetime.start(a)
    memset(a, 0, 1000000)
    lifetime.start(b)
    memcpy(b, a)
    lifetime.end(b)
    lifetime.start(c)
    lifetime.end(c)
    lifetime.end(a)

The lifetime.start call for "b" stops the call-slot (I think) optimization from being applied. So instead this should be translated to something like:

    lifetime.start(a)
    lifetime.start(b)
    memset(a, 0, 1000000)
    memcpy(b, a)
    lifetime.end(b)
    lifetime.start(c)
    lifetime.end(c)
    lifetime.end(a)

extending the lifetime of "b" because it overlaps with that of "a" anyway. The lifetime of "c" still starts after the end of "b"'s lifetime because there's actually a possibility for stack slot sharing.

Björn

I'd be interested in seeing the IR for this that you're currently generating. Unless I'm misreading your example, everything in this is completely dead. We should be able to reduce this to nothing and if we can't, it's clearly a missed optimization. I'm particularly interested in how the difference in placement of the lifetime start for 'b' affects optimization. I really wouldn't expect that.

Philip
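
The marker placement Björn describes in the quoted text can be sketched as a small Rust model. This is only one possible reading of that approach, not the Rust compiler's implementation (the quoted text says it was not yet implemented). The names `LiveRange` and `hoisted_starts` are hypothetical, and the model assumes straight-line code in which each alloca's minimal live range is a pair of inclusive statement indices. Each lifetime.start is hoisted as early as possible, but never before the end of an alloca whose range is already over, since that alloca could share a stack slot with the one being placed.

```rust
/// Minimal live range of one alloca, as inclusive statement indices.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
struct LiveRange {
    name: &'static str,
    first: usize,
    last: usize,
}

/// For each alloca, return the statement index before which its
/// lifetime.start marker would be placed under this reading of the rule.
fn hoisted_starts(ranges: &[LiveRange]) -> Vec<(&'static str, usize)> {
    // Earliest point at which any alloca becomes live.
    let earliest = ranges.iter().map(|r| r.first).min().unwrap_or(0);
    ranges
        .iter()
        .map(|v| {
            // Allocas that are dead before `v` is first used could share
            // a stack slot with `v`, so `v` must not start before they end.
            let start = ranges
                .iter()
                .filter(|u| u.last < v.first)
                .map(|u| u.last + 1)
                .max()
                // Nothing to share with: align with the earliest start.
                .unwrap_or(earliest);
            (v.name, start)
        })
        .collect()
}
```

Modelling the quoted example as three statements (0: initialise "a", 1: copy into "b", 2: initialise "c"), with "a" live over statements 0 to 2, "b" over statement 1 only, and "c" over statement 2 only, the function places the markers for "a" and "b" before statement 0 and the marker for "c" before statement 2. That matches the second listing in the quoted text.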
