[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types

Bill Wendling via llvm-dev llvm-dev at lists.llvm.org
Thu May 28 15:42:13 PDT 2020


On Tue, May 26, 2020 at 7:49 PM James Y Knight via llvm-dev <llvm-dev at lists.llvm.org> wrote:

At least in this test-case, the "bitfield" part of this seems to be a distraction. As Eli notes, Clang has lowered the function to LLVM IR containing consistent i16 operations. Despite that being a different choice from GCC, it should still be correct and consistent.

I suspect that this is more prevalent with bitfields as they're more likely to have the load / bitwise op / store operations done on them, resulting in an access type that can be shortened. But yes, it's not specific to just bitfields.
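To make the shortening concrete, here's a minimal sketch (hypothetical names, not the code from the original report). Clang emits consistent 16-bit operations at the IR level for both the store and the reload, but the x86 backend may narrow the store once it sees that only the low byte changes, which is exactly the orb/movzwl pair quoted below:

    struct pkt {
        unsigned short ready : 1;   /* packed into one 16-bit storage unit */
        unsigned short len   : 15;
    };

    void g(void);

    unsigned short set_and_reload(struct pkt *p) {
        p->ready = 1;   /* IR: load i16 / or i16 1 / store i16; the backend
                           may narrow the store to "orb $0x1, (%rdi)" */
        g();
        return p->len;  /* reloads the same 16-bit unit: "movzwl" */
    }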

I'm more interested in consistency, to be honest. If the loads and stores for the bitfields (or other such shorten-able objects) were the same, then we wouldn't run into the store-to-load forwarding issue on x86 (I don't know about other platforms, but suspect that consistency wouldn't hurt). I liked Arthur's idea of accessing the object using the type size the bitfield was defined with (i8, i16, i256). It would help with improving the heuristic. The downside is that it could lead to suboptimal code, but that's the situation we have now, so...

-bw
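Applied to a pattern like the sketch above, the declared-type rule Arthur describes (a hypothetical illustration, not an implemented clang behavior) would pin both sides of the round trip to the 16-bit width of the declared base type:

    #include <stdint.h>

    struct flags {
        uint16_t dirty : 1;   /* base type is uint16_t: under the proposed
                                 rule, every access is 16 bits wide */
        uint16_t count : 15;
    };

    void g(void);

    uint16_t mark_and_reload(struct flags *f) {
        f->dirty = 1;     /* 16-bit load/or/store - never a byte-sized "orb" */
        g();
        return f->dirty;  /* 16-bit reload: widths match, so x86
                             store-to-load forwarding can kick in */
    }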

Of course that insight does mean it's quite easy to create a test-case with the exact same problematic store->load mismatch which doesn't use bit-fields at all. For example:

    short f2(short *bfs) {
        *bfs &= ~0x1;
        g();
        return *bfs;
    }

creates the same bad sequence:

    movq   %rdi, %rbx
    andb   $-2, (%rdi)
    callq  g()
    movzwl (%rbx), %eax

On Tue, May 26, 2020 at 9:30 PM John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:

On 26 May 2020, at 20:31, Arthur O'Dwyer wrote:

On Tue, May 26, 2020 at 7:32 PM John McCall via cfe-dev <cfe-dev at lists.llvm.org> wrote:

On 26 May 2020, at 18:28, Bill Wendling via llvm-dev wrote:

[...] The store is a byte:

    orb $0x1,0x4a(%rbx)

while the read is a word:

    movzwl 0x4a(%r12),%r15d

The problem is that between the store and the load the value hasn't been retired / placed in the cache. One would expect store-to-load forwarding to kick in, but on x86 that doesn't happen because x86 requires the store to be of equal or greater size than the load. So instead the load takes the slow path, causing unacceptable slowdowns. [...]

Clang used to generate narrower loads and stores for bit-fields, but a long time ago it was intentionally changed to generate wider loads and stores, IIRC by Chandler. There are some cases where I think the "new" code goes overboard, but in this case I don't particularly have an issue with the wider loads and stores. I guess we could make a best-effort attempt to stick to the storage-unit size when the bit-fields break evenly on a boundary. But mostly I think the frontend's responsibility ends with generating same-size accesses in both places; if inconsistent access sizes trigger poor performance, the backend should be more careful about intentionally changing access sizes.

FWIW, when I was at Green Hills, I recall the rule being "Always use the declared type of the bitfield to govern the size of the read or write." (There was a similar rule for the meaning of volatile. I hope I'm not just getting confused between the two. Actually, since, of the compilers on Godbolt, only MSVC follows this rule <https://godbolt.org/z/AqAPH>, I'm probably wrong.) That is, if the bitfield is declared int16_t, then use 16-bit loads and stores for it; if it's declared int32_t, then use 32-bit loads and stores.

I've always liked MSVC's bit-field rules as a coherent whole, but they are quite different from the standard Unix rules. On Windows, T x : 3 literally allocates an entire T in the structure, and successive bit-fields get packed into that T only if their base type is of the same size (and they haven't exhausted the original T). So of course all accesses to that bit-field are basically of the full size of the T; there's no overlap to be concerned with. On Unix, bit-fields will typically get packed together regardless of the base type; the base type does have some influence, but it's target-specific and somewhat odd. I'd prefer if we degraded to a Windows-like access behavior as much as we can, but it's not always possible because of that packing.

John.
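The allocation difference John describes is easy to observe directly. Here is a sketch; the exact sizes assume typical MSVC and Itanium/SysV behavior and are worth double-checking on Godbolt:

    #include <stdint.h>
    #include <stdio.h>

    struct mixed {
        uint8_t  a : 3;   /* MSVC: allocates a whole uint8_t           */
        uint16_t b : 9;   /* MSVC: base size differs -> a new uint16_t */
        uint8_t  c : 2;   /* MSVC: differs again -> another uint8_t    */
    };

    int main(void) {
        /* Unix/Itanium rules pack all 14 bits into a single 16-bit unit,
           so sizeof is typically 2; MSVC's per-base-type allocation gives
           6 (1 byte + 1 pad + 2 bytes + 1 byte + 1 pad). */
        printf("sizeof(struct mixed) = %zu\n", sizeof(struct mixed));
        return 0;
    }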






