[llvm-dev] Reducing the number of ptrtoint/inttoptrs that are generated by LLVM
Chandler Carruth via llvm-dev llvm-dev at lists.llvm.org
Mon Jan 14 16:51:09 PST 2019
On Mon, Jan 14, 2019, 15:59 Mehdi AMINI <joker.eph at gmail.com> wrote:
On Mon, Jan 14, 2019 at 9:36 AM Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:

While I'm very interested in the end result here, I have some questions that don't seem well answered yet around pointer subtraction...
First and foremost - how do you address correctness issues here? Because the subtraction A - B can escape/capture more things. Specifically, if one of A or B is escaped/captured, the subtraction can be used to escape or capture the other pointer.

Isn't escaping supposed to work at the "address ranges" level and not at the pointer value? I mean that if A or B is escaped/captured, then any pointer that is associated with the same memory range should be considered as "escaped", and thus the subtraction does not seem to leak anything more to me.
I believe this is true for subtracting "inbounds" (to borrow the GEP terminology), but just as we support non-inbounds GEP, we support non-inbounds subtracting. There it seems like this does escape the other global. I know that in the past I've discussed this exact case with nlewycky and he believed that to be the case, so I suspect quite a bit of LLVM is written under this model. I have no idea what the impact of changing it would be, beyond the ability to represent code like the example I gave earlier in the thread.
-- Mehdi
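A minimal C++ sketch of the leak described above, assuming a flat address space in which std::uintptr_t round-trips pointers (implementation-defined in C++). If the address of A has already escaped and a non-inbounds difference B - A is also observable, the address of B can be rebuilt from the two. The globals and function names are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical globals: assume &a_global has escaped and &b_global has not.
int a_global = 1;
int b_global = 2;

// A non-inbounds difference between two unrelated objects. It is computed on
// integers because subtracting pointers to different objects is undefined in C++.
std::uintptr_t leaked_difference() {
    return reinterpret_cast<std::uintptr_t>(&b_global) -
           reinterpret_cast<std::uintptr_t>(&a_global);
}

// Whoever holds the escaped &a_global plus the difference can rebuild &b_global:
// modular unsigned arithmetic undoes the subtraction exactly.
int* recover_b(int* escaped_a, std::uintptr_t diff) {
    return reinterpret_cast<int*>(reinterpret_cast<std::uintptr_t>(escaped_a) + diff);
}

void demonstrate_leak() {
    int* b = recover_b(&a_global, leaked_difference());
    assert(b == &b_global);  // the "unescaped" global is now reachable
    assert(*b == 2);
}
```

This is why capture tracking cannot ignore such a subtraction unless it knows both operands point into the same object.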
So some of the conservative treatment is necessary. What is the plan to update all the analyses to remain correct? What correctness testing have you done?

Second - an intrinsic seems a poor fit here given the significance of this operation. We have an instruction that covers most pointer arithmetic (getelementptr), and I can imagine growing pointer subtraction, but it seems like it should be an instruction if we're going to have it. Based on the above, we will need to use it very often in analysis.

Regarding the instcombine, it should be very easy to keep loads and stores of pointers as pointer typed in instcombine. Likely just a missing case in the code I added/touched there.

On Mon, Jan 14, 2019 at 3:23 AM Juneyoung Lee via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Hello all,

This is a proposal for reducing the number of ptrtoint/inttoptr casts that are not written by programmers but are instead generated by LLVM passes.

Currently the majority of ptrtoint/inttoptr casts are generated by LLVM. When compiling SPEC 2017 with LLVM r348082 (Dec 2 2018) at -O3, the output IR contains 22,771 inttoptr instructions. However, when compiling it with -O0, there are only 1048 inttoptrs, meaning that 95.4% of them are generated by LLVM passes.

The trend is similar for ptrtoint instructions. When compiling SPEC 2017 with -O0, there are 23,208 ptrtoint instructions, and 22,016 (94.8%) of them are generated by the Clang frontend to represent pointer subtraction. They aren't effectively optimized out, because there are even more ptrtoints (31,721) after -O3.

This is bad for performance because the existence of a ptrtoint makes analyses return conservative results, since a pointer can escape through the cast. Memory accesses through a pointer that came from an inttoptr are assumed to possibly access anywhere, so they may block store-to-load forwarding, merging two identical loads, etc.
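For reference, a sketch of the source pattern behind the -O0 ptrtoint count: ordinary pointer subtraction, next to a hand-written equivalent that mirrors the shape of the IR Clang typically emits for it on a 64-bit target (two ptrtoint casts, a sub, and an exact sdiv by the element size). The function names and IR value names are hypothetical, and the integer version assumes a flat address space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Source-level pointer subtraction, as a programmer writes it.
std::ptrdiff_t diff_source(const int* p, const int* q) {
    return p - q;  // only defined when p and q point into the same array
}

// Hand-written mirror of the IR shape Clang typically emits for `p - q`:
//   %lhs = ptrtoint i32* %p to i64
//   %rhs = ptrtoint i32* %q to i64
//   %sub = sub i64 %lhs, %rhs
//   %div = sdiv exact i64 %sub, 4
std::ptrdiff_t diff_lowered(const int* p, const int* q) {
    const auto lhs = reinterpret_cast<std::uintptr_t>(p);       // ptrtoint
    const auto rhs = reinterpret_cast<std::uintptr_t>(q);       // ptrtoint
    const auto bytes = static_cast<std::ptrdiff_t>(lhs - rhs);  // sub (wraps like i64)
    const auto size = static_cast<std::ptrdiff_t>(sizeof(int));
    assert(bytes % size == 0);  // `exact`: distance is a multiple of the element size
    return bytes / size;        // sdiv
}
```

Because both operands pass through ptrtoint, an analysis that sees this pattern has to treat p and q as potentially escaping, which is the conservatism the proposal targets.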
I believe this can be addressed by applying two patches. The first represents pointer subtraction with a dedicated intrinsic function, llvm.psub, and the second disables this InstCombine transformation:

    %q = load i8*, i8** %p1
    store i8* %q, i8** %p2
      =>
    %1 = bitcast i8** %p1 to i64*
    %q1 = load i64, i64* %1, align 8
    %2 = bitcast i8** %p2 to i64*
    store i64 %q1, i64* %2, align 8

This transformation can later introduce inttoptrs if the loaded value is also used as a pointer, e.g. by subsequent loads through it ( https://godbolt.org/z/wsZ3II ). Both are discussed in https://bugs.llvm.org/show_bug.cgi?id=39846 as well.

After llvm.psub is used and this transformation is disabled, the number of inttoptrs decreases from 22,771 to 1,565 (6.9% of the original count), and the number of ptrtoints decreases from 31,721 to 7,772 (24.5%).

I'll introduce the llvm.psub patch first.

--- Adding llvm.psub ---

By defining a pointer subtraction intrinsic, we can get a performance gain because it has more undefined behavior than just subtracting two ptrtoints.

Patch https://reviews.llvm.org/D56598 adds the llvm.psub(p1, p2) intrinsic function, which subtracts two pointers and returns the difference. Its semantics are as follows.

If p1 and p2 point to different objects, and neither of them is based on a pointer cast from an integer,
llvm.psub(p1, p2) returns poison. For example:

    %p = alloca
    %q = alloca
    %i = llvm.psub(%p, %q) ; %i is poison

This allows aggressive escape analysis on pointers. Given i = llvm.psub(p1, p2), if neither p1 nor p2 is based on a pointer cast from an integer, the llvm.psub call does not make p1 or p2 escape ( https://reviews.llvm.org/D56601 ).

If either p1 or p2 is based on a pointer cast from an integer, or p1 and p2 point to the same object, it returns the result of the subtraction (in bytes). For example:

    %p = alloca
    %q = inttoptr %x
    %i = llvm.psub(%p, %q) ; %i is equivalent to (ptrtoint %p) - %x

null is regarded as a pointer cast from an integer because it is equivalent to inttoptr 0.

Adding llvm.psub allows LLVM to eliminate a significant portion of ptrtoints and a portion of inttoptrs. After llvm.psub is used, when SPECrate 2017 is compiled with -O3, the number of inttoptrs decreases to ~13,500 (59% of the original count) and the number of ptrtoints decreases to ~14,300 (45%).

To see the performance change, I ran SPECrate 2017 (with 1 thread) with three versions of LLVM: r313797 (Sep 21, 2017), the official LLVM 6.0 release, and r348082 (Dec 2, 2018). Running r313797 shows that 505.mcf_r has a consistent 2.0% speedup on 3 different machines (i3-6100, i5-6600, and i7-7700). For LLVM 6.0 and r348082, there is neither a consistent speedup nor a consistent slowdown, and the average speedup is near 0. I believe there's still room for improvement, because there are passes that are not aware of llvm.psub.

Thank you for reading this, and any comment is welcome.

Best Regards,
Juneyoung Lee
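The proposed llvm.psub semantics can be summarized with a small abstract model. This is a hypothetical sketch, not the code in D56598: each pointer carries an object identity, a concrete address, and a flag for "based on a pointer cast from an integer"; std::nullopt stands in for poison, and null is modeled as inttoptr 0. The examples mirror the two IR snippets in the proposal.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical abstract pointer, for illustration only (not LLVM code).
struct AbstractPtr {
    int object_id;       // allocation the pointer is based on (ignored when from_int)
    std::uint64_t addr;  // concrete address held by the pointer
    bool from_int;       // true if based on a pointer cast from an integer
};

// null is modeled as `inttoptr 0`.
constexpr AbstractPtr null_ptr{-1, 0, true};

// Byte difference p1 - p2; std::nullopt stands in for poison.
std::optional<std::int64_t> psub(const AbstractPtr& p1, const AbstractPtr& p2) {
    if (p1.from_int || p2.from_int || p1.object_id == p2.object_id) {
        // Wraps modulo 2^64 like an i64 `sub` (conversion is modular since C++20,
        // implementation-defined before that).
        return static_cast<std::int64_t>(p1.addr - p2.addr);
    }
    return std::nullopt;  // different objects, neither cast from an integer
}

// Capture rule from the proposal: psub makes neither operand escape unless
// one of them is based on a pointer cast from an integer.
bool psub_may_capture(const AbstractPtr& p1, const AbstractPtr& p2) {
    return p1.from_int || p2.from_int;
}

// Mirrors the IR examples in the proposal.
void examples() {
    const AbstractPtr p{1, 0x1000, false};  // %p = alloca
    const AbstractPtr q{2, 0x2000, false};  // %q = alloca
    assert(!psub(p, q).has_value());        // %i is poison
    assert(!psub_may_capture(p, q));

    const AbstractPtr x{-1, 0x1800, true};  // %q = inttoptr %x, assuming %x == 0x1800
    assert(psub(p, x) == -0x800);           // (ptrtoint %p) - %x
    assert(psub_may_capture(p, x));

    const AbstractPtr p_end{1, 0x1010, false};
    assert(psub(p_end, p) == 16);           // same object: plain byte difference
    assert(psub(p, null_ptr) == 0x1000);    // null behaves like inttoptr 0
}
```

The model makes the trade-off visible: poison for cross-object subtraction is what lets an analysis treat the call as non-capturing, while inttoptr-based operands fall back to ordinary integer subtraction.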
LLVM Developers mailing list llvm-dev at lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev