[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types (original) (raw)
John McCall via llvm-dev llvm-dev at lists.llvm.org
Fri May 29 17:59:43 PDT 2020
- Previous message: [llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
- Next message: [llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 29 May 2020, at 19:29, Bill Wendling wrote:
On Fri, May 29, 2020 at 4:00 PM Richard Smith <richard at metafoo.co.uk> wrote:
On Fri, 29 May 2020 at 11:06, John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
On 28 May 2020, at 18:42, Bill Wendling wrote:
On Tue, May 26, 2020 at 7:49 PM James Y Knight via llvm-dev <llvm-dev at lists.llvm.org> wrote:
At least in this test-case, the "bitfield" part of this seems to be a distraction. As Eli notes, Clang has lowered the function to LLVM IR containing consistent i16 operations. Despite that being a different choice from GCC, it should still be correct and consistent. I suspect that this is more prevalent with bitfields as they're more likely to have the load / bitwise op / store operations done on them, resulting in an access type that can be shortened. But yes, it's not specific to just bitfields. I'm more interested in consistency, to be honest. If the loads and stores for the bitfields (or other such shorten-able objects) were the same, then we wouldn't run into the store-to-load forwarding issue on x86 (I don't know about other platforms, but suspect that consistency wouldn't hurt). I liked Arthur's idea of accessing the object using the type size the bitfield was defined with (i8, i16, i256). It would help with improving the heuristic. The downside is that it could lead to un-optimal code, but that's the situation we have now, so... Okay, but what concretely are you suggesting here? Clang IRGen is emitting accesses with consistent sizes, and LLVM is making them inconsistent. Are you just asking Clang to emit smaller accesses in the hope that LLVM won’t mess them up? I don't think this has anything to do with bit-fields or Clang's lowering. This seems to "just" be an optimizer issue (one that happens to show up for bit-field access patterns, but also affects other cases). Much-reduced testcase: unsigned short n; void set() { n |= 1; }
To be clear, I agree with you that this is not a frontend problem. I asked because the question seemed to lead towards changing the width that Clang uses for accesses.
We're a bit (heh) tied in what we can do. LLVM's IR doesn't convey that we're working with a bitfield, but as many pointed out it's not bitfield-specific (I didn't mean to imply that it was only bitfields affected by this with my initial post, that was just where it showed up). The front-end generates code that's perfectly valid for the various accesses, and changing what it generates doesn't seem like it would be worth the hassle. (E.g. using i1, i3, etc. for the bitfields (note I'm not suggesting we do this, I'm just using as an example).)
So what we're left with is trying to ensure that we don't generate sub-optimal code via the DAG combiner and other passes. Some ideas off the top of my head: 1. Avoid generating byte-sized bitwise operations, because as Richard pointed out they can lead to pessimizations. 2. Come up with a better heuristic to determine if we're going to generate inconsistent load/store accesses to the same object. 3. Transform the machine instructions before register allocation to ensure that all accesses to the same address are the same size, or at least that the stores are of equal or greater size than the loads. Option (1) may be the simplest and could potentially solve our immediate issue. Option (2) is difficult because the DAG combiner operates on a single BB at a time and doesn't have visibility into past DAGs it modified. Option (3) allows us to retain the current behavior and gives us visibility into the whole function, not just an individual BB. I kind of like option 3, but I'm open to other suggestions.
What’s the concrete suggestion for (3), that we analyze the uses of a single address and try to re-widen the access to make it use a consistent width? Widening accesses is difficult in general, unless the information is still present that it was originally wider before DAG combine.
At any rate, seems like a backend problem. :)
John.
- Previous message: [llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
- Next message: [llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]