[llvm-dev] [RFC] Adding CPS call support (original) (raw)
Kavon Farvardin via llvm-dev llvm-dev at lists.llvm.org
Wed Apr 19 04:22:14 PDT 2017
- Previous message: [llvm-dev] [RFC] Adding CPS call support
- Next message: [llvm-dev] [RFC] Adding CPS call support
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
What do you think is more important for GHC: getting good or highly tunable low-level code, or enabling mid-level optimizations such as GVN across calls? Is GHC using LLVM more as an optimizer or as a code generator?
GHC currently uses LLVM more as a code generator; its an alternative backend.
While some optimizations may not be effective on the proposed construct, we already produce rather horrible looking code just to use LLVM. Our function splitting transformation that turns it into this mess is complicated to maintain, so we'd like to remove it.
I see at least one performance benefit of not breaking the program into hundreds of tiny functions: better code placement for I-cache locality.
~kavon
On Apr 18, 2017, at 11:00 PM, Reid Kleckner <rnk at google.com> wrote:
This seems to be solving a problem very similar to C++ coroutines. You might find it helpful to read the past discussion of their design in LLVM. What do you think is more important for GHC: getting good or highly tunable low-level code, or enabling mid-level optimizations such as GVN across calls? Is GHC using LLVM more as an optimizer or as a code generator? If GHC's code generation pipeline leaves opportunities for LLVM mid-level optimization, then I think you want to pursue a design similar to C++ coroutines, where we leave the control flow graph "uninverted" until code generation. I might be making up terminology here, but I think of transforming a program representation to continuation-passing style as "inverting" the CFG. I'm not suggesting that you use the native stack to store VM data, I'm just suggesting that you hide the fact that CPS calls will really be tail calls and returns from LLVM until code generation. On Mon, Apr 17, 2017 at 8:30 AM, Kavon Farvardin via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: Summary ======= There is a need for dedicated continuation-passing style (CPS) calls in LLVM to support functional languages. Herein I describe the problem and propose a solution. Feedback and/or tips are greatly appreciated, as our goal is to implement these changes so they can be merged into LLVM trunk.
Problem ======= Implementations of functional languages like Haskell and ML (e.g., GHC and Manticore) use a continuation-passing style (CPS) transformation in order to manage the call stack explicitly. This is done prior to generating LLVM IR, so the implicit call stack within LLVM is not used for call and return. When making a non-tail call while in CPS, we initialize a stack frame for the return through our own stack pointer, and then pass that stack pointer to the callee when we jump to it. It is here when we run into a problem in LLVM. Consider the following CPS call to @bar and how it will return: ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; define void @foo (i8** %sp, ...) { someBlk: ; ... ; finish stack frame by writing return address %retAddr = blockaddress(@foo, %retpt) store i8* %retAddr, i8** %sp ; jump to @bar tail call void @bar(i8** %sp, ...) retpt: ; <- how can @bar "call" %retpt?_ _%sp2 = ???_ _%val = ???_ _; ..._ _}_ _define void @bar (i8** %sp, ...) {_ _; perform a return_ _%retAddr0 = load i8*, i8** %sp_ _%retAddr1 = bitcast i8* %retAddr0 to void (i8**, i64)*_ _%val = bitcast i64 1 to i64_ _; jump back to %retpt in @foo, passing %sp and %val_ _tail call void %retAddr1(i8** %sp, i64 %val)_ _}_ _;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;_ _There is currently no way to jump back to %retpt from another function, as block_ _addresses have restricted usage in LLVM [1]. Our main difficulty is that we_ _cannot jump to a block address without knowing its calling convention, i.e., the_ _particular machine registers (or memory locations) that the block expects_ _incoming values to be passed in._ _The workaround we have been using in GHC for LLVM is to break apart every_ _function, placing the code for the continuation of each call into a new_ _function. We do this only so that we can store a function pointer instead of a_ _block address to our stack. This particularly gross transformation inhibits_ _optimizations in both GHC and LLVM, and we would like to remove the need for it._ _Proposal_ _========_ _I believe the lowest-impact method of fixing this problem with LLVM is the_ _following:_ _First, we add a special 'cps' call instruction marker to be used on non-tail_ _calls. Then, we use a specialized calling convention for these non-tail calls,_ _which fix the returned values to specific locations in the machine code [2]._ _To help illustrate what's going on, let's rewrite the above example using the_ _proposed 'cps' call:_ _;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;_ _define { ... } @foo (i8** %sp, ...) {_ _someBlk:_ _; ..._ _; finish stack frame by writing return address_ _%retAddr = blockaddress(@foo, %retpt)_ _store i8* %retAddr, i8** %sp_ _; jump to @bar_ _%retVals = cps call ghccc {i8**, i64} @bar (i8** %sp, ...)_ _br label %retpt_ _retpt:_ _%sp2 = extractvalue {i8**, i64} %retVals, 0_ _%val = extractvalue {i8**, i64} %retVals, 1_ _; ..._ _}_ _define {i8**, i64} @bar (i8** %sp, ...) {_ _; perform a return_ _%retAddr0 = load i8*, i8** %sp_ _%retAddr1 = bitcast i8* %retAddr0 to {i8**, i64} (i8**, i64)*_ _%val = bitcast i64 1 to i64_ _; jump back to %retpt in @foo, passing %sp and %val_ _tail call ghccc void %retAddr1(i8** %sp, i64 %val)_ _unreachable ; <- ideally this would be our terminator,_ _; but 'unreachable' breaks TCO, so we will_ _; emit a ret of the struct "returned" by the call._ _}_ _;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;_ _The important point here is that the 'cps' marked call will lower to a jump. The_ _'cps' call marker means that the callee knows how to return using the arguments_ _explicitly passed to it, i.e., the stack pointer %sp. The callee cannot use a_ _'ret' instruction if it is 'cps' called._ _Either before or during 'cps' call lowering, any instructions following the_ _'cps' call to @bar are sunk into the the block %retpt, and the unconditional_ _branch to %retpt is deleted/ignored. We include that branch to preserve_ _control-flow information for LLVM IR optimization passes._ _The 'extractvalue' instructions are what ensure the calling convention of_ _%retpt, since the fields of the struct %retVals are returned in physical_ _registers dictated by the (modified) ghccc convention. Those same physical_ _registers are used by the ghccc tail call in @bar when it jumps back to %retpt._ _So, the call & return convention of ghccc ensures that everything matches up._ _Interaction with LLVM_ _=====================_ _(1) Caller-saved Values_ _One may wonder how this would work if there are caller-saved values of the 'cps'_ _call. But, in our first example, which closely matches what CPS code looks like,_ _the call to @bar was in tail position. Thus, in the second example, there are no_ _caller-saved values for the 'cps' call to @bar, as all live values were passed_ _as arguments in the call._ _This caller-saved part is a bit subtle. It works fine in my experience [2] when_ _@bar is a function not visible to LLVM. My impression is that even if @bar is_ _visible to LLVM, there is still no issue, but if you can think of any corner_ _cases that would be great!_ _(2) Inlining_ _My gut feeling is that we cannot inline a 'cps' marked call-site without more_ _effort. This is because we might end up with something odd like this once the_ _dust settles:_ _%retAddr = blockaddress(@foo, %retpt)_ _%retAddr1 = bitcast i8* %retAddr to {i8**, i64} (i8**, i64)*_ _tail call ghccc %retAddr1 ( %sp, ... )_ _We could add a pass that turns the above sequence into just an unconditional_ _branch to %retpt, using a phi-node to replace each 'extractvalue' instruction in_ _that block._ _I'm not sure whether inlining in LLVM is important for us yet, as we tend to_ _inline quite a lot before generating LLVM IR. I don't think this additional fix-_ _up pass would be too difficult to implement if it's desired._ _Implementation Sketch and Conclusion_ _====================================_ _My current plan is to add this special lowering of 'cps' calls during the_ _translation from LLVM IR to the SelectionDAG. I welcome any suggestions or tips_ _on the best way to approach this. An important goal for us is to merge this into_ _trunk since we do not want to bundle a special version of LLVM with GHC._ _Please let me know soon if you have any objections to this feature._ _Thanks for reading,_ _Kavon_ _References_ _==========_ _[1] http://llvm.org/docs/LangRef.html#blockaddress <http://llvm.org/docs/LangRef.html#blockaddress> [2] http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf <http://kavon.farvard.in/papers/ml16-cwc-llvm.pdf>
LLVM Developers mailing list llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170419/f557071f/attachment.html>
- Previous message: [llvm-dev] [RFC] Adding CPS call support
- Next message: [llvm-dev] [RFC] Adding CPS call support
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]