[llvm-dev] RFC: Representing unions in TBAA (original) (raw)
Krzysztof Parzyszek via llvm-dev llvm-dev at lists.llvm.org
Fri Apr 7 12:09:02 PDT 2017
- Previous message: [llvm-dev] Phabricator stopped sending email to llvm-commits
- Next message: [llvm-dev] RFC: Representing unions in TBAA
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Can we turn off TBAA info for union member accesses in clang before this gets fixed?
-Krzysztof
On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote: > So, https://bugs.llvm.org/showbug.cgi?id=32056 is an example showing > our current TBAA tree for union generation is definitely irretrievably > broken. > I'll be honest here. I'm pretty sure your proposal doesn't go far enough. > But truthfully, I would rather see us come closer to a representation > we know works, which is GCC's. > Let me try to simplify what you are suggesting, and what we have. > Our current representation is basically inverted from GCC, but we don't > allow things that would enable it to work. >> Given > union {int a, short b}; >> GCC's will be: >> union > _/ _ > short int >>> Everything is implicitly a subset of alias set 0 (which for C/C++ is > char) just to avoid representing it. >> Our metadata has no child links, and only a single parent link. >> You can't represent the above form because you can't make a single short > node a child of every union/struct it needs to be (lack of multiple > parents means you can't just frob them all to offset zero). >> Your suggestion is to invert this as a struct >> short int > | / > union >> We don't allow multiple parents, however. > Because of that, you suggest we follow all nodes for unions, special > casing union-type nodes somehow >> Let me suggest something different: >> We know the current structure fails us in a number of ways. > Instead of trying to shoehorn this into our current structure, I > suggest: we stop messing around and just have a GCC style tree, and > represent the children instead of the parents. > We make contained types descendants instead of ancestors. > We allow multiple children at each offset and for scalar type nodes.x` >> We could do that without changing to children, but honestly, i strongly > believe nobody who ever looks at this really understands it right now. > (If someone really does like the ancestor form, i'd love to understand > why :P) >> So if we are going to change it, we should change it to something that > is easier to understand. >> something like: >> scalar type node -> {name, children nodes} > struct node -> {name, array of {offset, child node} } >> Paths are separate from the tbaa tree itself, and are just: > path node -> {struct node, offset} or whatever. >> unions are just scalar type nodes with multiple children, not struct > nodes with special-cased offset zero. >> This also has a well-defined upgrade path. >> For "old-style" DAGs, like exist now, we know we have to regen the > metadata, and that it is wrong (So we could, for example, simply disable > it for correctness reasons) > . > Anything with a "new-style" DAG, we know is usable. >> In this representation, the most generic tbaa node is just the one that > contains the other. > If neither contains the other, you can just make one that has both as > children and use that. > (instead of now, where you'd have to have multiple parents again). >> (The above also may be better for other languages) >> --Dan >>>>> _On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com_ > <mailto:perrons at ca.ibm.com>> wrote: >> Seems like the comments have stopped. I'll try to get a patch > together. Then we can continue the discussion from there. >> Hubert, as for the issue with the llvm optimizations losing the TBAA > information, it should be the responsibility to make sure the > aliasing is changed in the correct way. One function related to > this has already been mentioned: getMostGenericTBAA. >> Later, > Steven Perron >>>> _From: Hubert Tong <hubert.reinterpretcast at gmail.com_ > <mailto:hubert.reinterpretcast at gmail.com>> > To: Steven Perron/Toronto/IBM at IBMCA > _Cc: Daniel Berlin <dberlin at dberlin.org_ > _<mailto:dberlin at dberlin.org>>, llvm-dev <llvm-dev at lists.llvm.org_ > <mailto:llvm-dev at lists.llvm.org>>, Sanjoy Das > <sanjoy at playingwithpointers.com <mailto:sanjoy at playingwithpointers.com>> > Date: 2017/02/15 07:44 AM > Subject: Re: [llvm-dev] RFC: Representing unions in TBAA > ------------------------------------------------------------------------ >>>> On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron > <perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>> wrote: > 3) How should we handle a reference directly through a union, and a > reference that is not through the union? >> My solution was to look for each member of the union overlaps the > given offset, and see if any of those members aliased the other > reference. If no member aliases the other reference, then the > answer is no alias. Otherwise the answer is may alias. The run > time for this would be proportional to "distance to the root" * > "number of overlapping members". This could be slow if there are > unions with many members or many unions of unions. >> Another option is to say that they do not alias. This would mean > that all references to unions must be explicitly through the union. > From what I gather from the thread so far, the access through the > union may be lost because of LLVM transformations. I am not sure > how, in the face of that, TBAA could indicate NoAlias safely > (without the risk of functional-correctness issues in correct > programs) between types which overlap within any union (within some > portion of the program). >> As for the standards, it is definitely not true that all references > to unions must be explicitly through the union. However, if you are > trying to perform union-based type punning (under C11), then it > appears that it is intended that the read must be through the union. >> This would be the least restrictive aliasing allowing the most > optimization. The implementation would be simple. I believe we > make the parent of the TBAA node for the union to be "omnipotent > char". This might be similar to treating the union TBAA node more > like a scalar node instead of a struct-path. Then the traversal of > the TBAA nodes will be quick. I'll have to work this out a bit > more, but, if this is good enough to meet the requirements of the > standard, I can try to think this through a little more. I'll need > Hubert and Daniel to comment on that since I am no expert on the C > and C++ standards. >> The third option is to be pessimistic and say "may alias" all of the > time (conservatively correct), and rely on other alias analysis to > improve it. This will have good compile time, but could hinder > optimization. Personally I do not like this option. Most of the > time it will not have a negative effect, but there will be a > reasonable number of programs where this will hurt optimization more > that it needs to. >>>>>>> _________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
- Previous message: [llvm-dev] Phabricator stopped sending email to llvm-commits
- Next message: [llvm-dev] RFC: Representing unions in TBAA
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]