[llvm-dev] [RFC] Polly Status and Integration (original) (raw)
Tobias Grosser via llvm-dev llvm-dev at lists.llvm.org
Mon Sep 25 23🔞48 PDT 2017
- Previous message: [llvm-dev] [RFC] Polly Status and Integration
- Next message: [llvm-dev] [RFC] Polly Status and Integration
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wed, Sep 20, 2017, at 02:33, Johannes Doerfert wrote:
Hi Hal, Tobias, Michael, and others,
Hi Johannes,
thanks for joining the discussion.
I'd like to add my view (and a proposal) to this discussion and I apologize directly for doing this so late*. I also want to apologize because this email is long, contains various technical details and also argumentations that might need more justification. However, I am happy to provide further information (and/or examples) to explain my views if required.
tldr; We should introduce polyhedral analysis and optimization capabilities into LLVM step by step. Along the way we should revisit design decisions made by Polly and adjust them according to the use cases in the rest of LLVM. Finally, we should keep Polly as a stand-alone tool to allow research into, and prototyping of, complex loop transformations.
I very much agree with you that many design decisions will be changed in the future. Many of the targets you mentioned are indeed goals that have been discussed as part of the Polly development agenda and are areas I very much would like to push forward. You also make a very valid point that we should involve the wider LLVM community in these discussions and at best in the overall development of polyhedral optimizations as a whole. Hence, it seems clear that we want to keep as much of the polyhedral development as possible in core-LLVM. I have always been conservative in pushing prototype stuff into core-LLVM, which is why we spent multiple years to develop very solid foundations in both Polly and isl. While I agree that many design decisions need to be re-evaluated, as we constantly do in LLVM, and especially in the light of advances we pushed into isl, I think doing this jointly with the full LLVM community is the way to go.
I also don't see your ideas to conflict with the integration of Polly. We can always move Polly now and replace parts of it with new and better stuff as it becomes available. In fact, Polly can serve as test-bed to provide test coverage for such new code and me and the other PollyLabs people are certainly happy to help.
* There was a paper deadline end of last week and I simply had to prioritize.
Good luck!
Polly is a deep pipeline of passes that were developed for the sole purpose of applying scheduling transformations at the very end. This "main purpose" of Polly makes it hard to effectively reuse intermediate parts, e.g. the dependence analysis or the code generation framework, at least to the degree we might want to. In a different thread [1] Hal asked for the possibility to reuse Polly for tuning via pragmas e.g., unrolling/interleaving/tiling/interchange. While this is interesting and generally possible it won't take long before the restrictions Polly places on the input code (to enable scheduling optimizations in the end) become a serious problem. The point here is that Polly has too many features and dependences for such an "easy" use case. (Note that I do not question the legality of a user requested transformation here.) If you are not looking for "end-to-end polyhedral loop optimizations" you want a system that answers a very specific question (e.g., isVectorizable) or performs a very specific transformation (e.g, tiling) on a maximal set of programs. In contrast, Polly is (and I strongly believe it should be) developed as a system that performs complex optimizations, based on specialized analysis results, on a constraint type of programs.
While it is right that Polly has been developed as end-to-end polyhedral optimizer, it certainly has not been developed only with end-to-end loop optimizations in mind. In fact, many of the foundations that are today useful to build a polyhedral based range analysis on top of isl have been developed together with Polly and with Polly as use case (e.g., to build the optimistic assumption model we use) to enable a wide range of optimizations and analyses. Seing more of them popping up now is indeed very nice.
One might argue that Polly will become more modular when it is integrated into LLVM and when the intermediate results are used in more and more places. However, I'd say this is the "wrong direction" with regards to: 1) incremental design and development, 2) maintainability of individual parts, and 3) the development of a suitable cost model (which Polly does not ship).
Polly has been developed from the very first patch in the open, was discussed on the LLVM mailing lists from day one (and started from such a discussion), and quickly became publicly visible LLVM project. I would almost claim it is the perfect example of incremental development. When we started due to license reasons and also due to a lack of mature foundations, Polly was not part of core LLVM. Over the last years we replaced and wrote new versions of all essential components for a move to core LLVM to become possible.
Still, I get your point. It makes certainly sense to e.g. extract out components of Polly to build a polyhedral range analysis and provide it to other developers. Thanks for already providing a patch that illustrates your ideas! Siddharth already commented and I would also like to have a look. I also feel that Mehdi would be an excellent person to give you feedback on this, as he has significant expertise with PIPS, which uses abstract interpretation very similar to how one would use it for a more general polyhedral range analysis.
However, I would decouple these two goals. We could move Polly and isl in place, and then move parts of Polly to the range analysis you propose -- or start copying / and redeveloping parts of Polly onto the new foundations you propose.
Later we can replace more parts. In general, there has never been an issue with temporarily having some redundancy in LLVM and I don't think as part of this discussion we need to lay out the full future of polyhedral analysis and transformations in LLVM. From my perspective, we should answer mainly the question:
"Do we want polyhedral analysis as part of core LLVM?"
"Do we want to keep significant parts of the polyhedral development outside of core LLVM?"
My feeling is that there are several groups interested in 1), which gives me confidence that we will -- eventually -- have polyhedral stuff in LLVM. Hal gave pretty good arguments for 2) and I also believe strongly that we should break the boundaries between polyhedral and core LLVM people.
Let me give two examples to make my point:
-- Example 1 -- In addition to the applicability issues there are other problems that arise from the full pipeline design. Let's assume we want to apply a pragma driven optimization to a loop that can be represented (and optimized) by Polly. Depending on the pragma we only need to perform a very specific task, e.g., adjust the iteration variable (loop inversion) or introduce new loops and adjust existing iteration variables (tiling). Polly will however represent (and actually analyze) the whole loop nest including the exact control flow and all accesses. In addition it will compute alias checks, constraints under which the representation is not truthful (e.g., an access function might overflow) and a lot of other things that are not needed for the task at hand.
Sure. Incrementally building dependences makes in certain cases sense!
-- Example 2 -- We can also imagine we want to determine if a loop can be vectorized, thus if there are (short) loop carried dependences. No matter which technique is used, the complexity of a dependence analysis will certainly increase with the number (and complexity) of the analyzed accesses. Since Polly is designed with fine-grained scheduling optimizations in mind, the dependences it will compute are disproportionate with regards to the question asked. Consequently, it will either take more time to determine if there are loop carried dependences or a time-out in the computation will cause a conservatively negative result.
Good idea!
While it is generally possible to disable some "features" of Polly or to "work around them", it requires new independent code paths in an already highly interleaved project. The explicit (and implicit) interactions between the different parts of Polly are plentiful and subtle. I don't think there are 5 people that know about more than half of them. (I also doubt there is no one that knows them all.)
I want to propose an alternative plan for which I would appreciate feedback. The feedback is especially useful since I will present this, or more accurately the concrete polyhedral analyses described below, in the student research competition at LLVM Dev meeting in October.
Nice. Looking forward to your talk!
The idea is simple. We rewrite parts of Polly with two goals in mind: 1) to provide analysis results and transformation options to LLVM, and 2) allow polyhedral loop transformation.
Sure. Improved granularity is certainly a good thing to have. As stated in the initial proposal, I am glad to see Polly being ripped apart. You already started!
A rewrite allows to revisit various design decisions that are buried deep in the Polly code and that hinder its application (in any context). The interested reader can find a short list at the end. I think, we should incrementally introduce polyhedral analysis capabilities in order to provide immediate benefits and to allow developers to get involved with the design and usage from an early point in time. Decoupled analyses also allow easier integration, interfaces that come "more natural" to non-polyhedral people, and better use of the capabilities if polyhedral scheduling is not the goal.
Many of these ideas sound very nice. I am certain you will provide some good interface ideas.
To demonstrate this incremental process I started to write a polyhedral value analysis**. Think of it as Scalar Evolution but with piece-wise defined polyhedral values as the abstract domain. This analysis is demand driven, works on the granularity of a LLVM value, is flow sensitive, and especially designed to be applicable on partially affine programs. It already supports most of what the polyhedral model does allow and includes a lot of things we prototyped in Polly. It shall be used when Scalar Evolution is not able to provide a sufficient result.
Nice! Clearly a way in the right direction.
On top I implemented a polyhedral memory analysis*** that is (currently) capable of summarizing the memory effects of a code region, e.g., a function. The next step is a demand-driven dependence analysis**** that utilizes the (often) very structured nature of a CFG. Afterwards, I would like to see a transformation framework that allows classical but also scheduling based transformations.
While these analyses still depend on isl as a back-end, they never directly use it. Instead there is a C++ interface in-between that encapsulates the features needed. Once our use cases are clear we can re-implement this interface.
Interesting. This seems to complement our efforts in providing a C++ interface to isl, but with slightly different goals. I feel as if we still should use isl++ instead of plain isl within your interface.
These are a few design decisions that I think should be revisited when we think about polyhedral-based analysis and optimizations in LLVM:
- We should not rely on Scalar Evolution. Instead we want to build polyhedral values straight from the underlying IR. While Scalar Evolution did certainly simplify Polly's development it induced various problems: - The lack of piece-wise affine values (other than min/max) limits applicability and precision. (Though conditional SCEVs are a small step in this direction.) - The flow insensitive value analysis (Scalar Evolution) causes a lot of "assumptions", thus versioning, for the otherwise flow sensitive Polly. - There are caching issues when parts of the program are transformed or when Polly is used for inter-procedural analysis/optimization.
I agree, this is certainly a good direction.
- We should not depend on single-entry-single-exit (SESE) regions. Instead we analyze the code optimistically to the extend possible (or required by the user). Afterwards we can easily specialize the results to the code regions we are interested in. While SESE regions always caused compile time issues (due to the use of the post dominance tree) they also limit the analysis/transformations that can be performed. As an example think of accesses that are non-affine with regards to the smallest surrounding SESE region but not in a smaller code region (e.g., a conditional). If the goal is not loop scheduling but, for example, instruction reordering, it is still valuable to determine the dependences between these accesses in the smaller code region.
SESE regions have (to my observation) never caused compile time issues. However, several people including Chandler have raised concerns several times and they indeed do not fit well when e.g. mixed with the vectorizer. Hence, I agree that our ultimate polyhedral optimizer in LLVM should likely not use SESE regions. In practice, Polly works reasonably pretty well with regions for now.
- We should iteratively and selective construct polyhedral representations. The same holds for analysis results, e.g. dependences. Let's reconsider example 2 from above. We want to visit one memory access at a time in order to build the polyhedral representation and compute the new dependences. We also want to immediately stop if we identify a loop-carried dependence. Finally, we do not want to compute dependences in outer loop iterations or dependences that are completely contained in one loop iteration.
Interesting. For certain use cases this is clearly the right approach. For full-fledged scheduling this may not work.
- We should encapsulate the code generation part completely from the transformation part. Additionally we might want to start with classical loop transformations instead of full polyhedral scheduling. For a lot of classical loop transformations code generation only needs to understand/change loop induction variables and exit conditions. Duplication + rewriting of the loop body is often not necessary. Pragma driven optimizations could be easily done and we could also use heuristics similar to the ones we have today (e.g., for loop distribution). The transformations we allow should than grow with the scheduling decisions we allow and these should be limited by a yet to be determined cost model.
Sure, better encapsulation is always good. How to do this exactly will be an interesting discussion!
I am aware that not all Polly developers will agree with my views and that some are simply trade-offs that do not offer a "correct way". Nevertheless, I hope that the simple examples I used to point out various problems are enough to consider the route I proposed.
I briefly looked at your patch for the polyhedral range analysis. While I would have preferred an initial design discussion much earlier, and your patch needs more comments and especially a design description, it certainly is a move in the right direction! We may also split it in smaller pieces to make review easier. In fact, for some reason I think here at the RISC-V upstreaming. Not that we should go at it's speed, but it has been a perfect example of how to upstream a new backend -- with a lot of feedback from the community. This seems to be very much aligned with your goals. Overall, I clearly agree with your proposal of a polyhedral range analysis and will be glad to provide more review comments.
Your development plans for the future are overall very much aligned with what I would like to see for polyhedral transformations in LLVM. We already discussed some of them over the last years, but you clearly also bring some nice "revolutionary" ideas into the game. I am really excited to see how these will evolve! I am also glad to see you again more active in the mailing list!
Best, Tobias
- Previous message: [llvm-dev] [RFC] Polly Status and Integration
- Next message: [llvm-dev] [RFC] Polly Status and Integration
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]