[LLVMdev] debugloc metadata variation
Duncan P. N. Exon Smith dexonsmith at apple.com
Thu Oct 23 09:43:00 PDT 2014
On 2014-Oct-23, at 09:19, David Blaikie <dblaikie at gmail.com> wrote:
> (sorry for the duplicate Fred, I failed at reply-all the first time)
>
> On Wed, Oct 22, 2014 at 6:33 PM, Frédéric Riss <friss at apple.com> wrote:
> > On Oct 22, 2014, at 4:57 PM, David Blaikie <dblaikie at gmail.com> wrote:
> > >
> > > Just working on some of the gmlt+fission debug info stuff and I came
> > > across a comment that might be relevant to reducing the number of
> > > distinct debugloc metadata nodes:
> > >
> > > "or some sub-optimal metadata that
> > > // isn't structurally identical (see: file path/name info from clang, which
> > > // includes the directory of the cpp file being built, even when the file name
> > > // is absolute (such as an <> lookup header)))"
> > >
> > > Seems that the file path/name isn't well canonicalized so as to allow
> > > metadata level merging when linking. Might be helpful to figure out
> > > that issue at some point.
> >
> > Incidentally I worked on an issue last week where the line table would
> > get entries representing the same file, but where the file/dir split
> > wasn’t done at the same place. I have a patch that remerges them at
> > emission, but I was planning on investigating more the source of the
> > duplication before I submit anything. The cases I’ve seen have one
> > duplicated entry though, nothing that could have a visible impact on
> > memory consumption.
>
> So the particular case where I think this arises in a way that might be
> measurable is if you have a build system that changes directories to
> build subprojects (like our make build system, if I understand
> correctly - but not our cmake build system, again, if I understand
> correctly):
>
> imagine a simple directory layout:
>
>   include/
>     foo.h
>   lib/
>     a/
>       a.cpp // includes foo.h and calls one inline function from it (or
>             // uses a type, etc) from some external function a()
>     b/
>       b.cpp // does the same thing as a.cpp, but with its own external
>             // function, b()
>
> if you run
>
>   "clang++ -emit-llvm -S -Iinclude -c lib/a/a.cpp lib/b/b.cpp -g"
>
> you get two .ll files both with the obvious:
>
>   !9 = metadata !{metadata !"include/foo.h", metadata !"/tmp/dbginfo/pathtest"}
>
> But if you do this instead:
>
>   "cd lib/a; clang++ -emit-llvm -S -I../../include -c a.cpp -g;
>    cd ../../lib/b; clang++ -emit-llvm -S -I../../include -c b.cpp -g"
>
> you get two different nodes:
>
>   !9 = metadata !{metadata !"../../include/foo.h", metadata !"/tmp/dbginfo/pathtest/lib/b"}
>   !9 = metadata !{metadata !"../../include/foo.h", metadata !"/tmp/dbginfo/pathtest/lib/a"}
>
> and now you're left with a situation in which almost all the metadata is
> different and any place you were relying on the standard metadata
> uniquing you won't get it :(
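To make the quoted example concrete, here is a minimal standalone sketch (not LLVM code) that models each file node as a (filename, directory) string pair. It compares uniquing on exact string equality with uniquing after the pair is resolved to one path. The names `FilePair`, `countUniquedNodes`, `countDistinctFiles` and `ExampleNodes` are made up for illustration, and the normalization is purely lexical (C++17 `std::filesystem`), which ignores symlinks.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <set>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// (filename, directory), mirroring the two strings in the metadata node.
using FilePair = std::pair<std::string, std::string>;

// Structural uniquing: two nodes merge only if both strings match exactly.
std::size_t countUniquedNodes(const std::vector<FilePair> &Nodes) {
  return std::set<FilePair>(Nodes.begin(), Nodes.end()).size();
}

// Join directory and filename, collapse "." and ".." lexically, and count
// how many distinct paths the nodes actually name.
std::size_t countDistinctFiles(const std::vector<FilePair> &Nodes) {
  std::set<std::string> Files;
  for (const FilePair &N : Nodes)
    Files.insert(
        (fs::path(N.second) / N.first).lexically_normal().generic_string());
  return Files.size();
}

// The three nodes from the example above.
const std::vector<FilePair> ExampleNodes = {
    {"include/foo.h", "/tmp/dbginfo/pathtest"},
    {"../../include/foo.h", "/tmp/dbginfo/pathtest/lib/a"},
    {"../../include/foo.h", "/tmp/dbginfo/pathtest/lib/b"},
};
```

With these inputs, exact-match uniquing keeps all three nodes apart, while the resolved paths collapse to a single file.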
This might be fixed by making MDFile (or DIFile) first-class. We
just need to canonicalize on creation.

    class MDFile {
    public:
      // Split the path at the right place.
      static MDFile *get(LLVMContext &C, StringRef Path);

      // Convenience for callers, but the path gets canonicalized.
      static MDFile *get(LLVMContext &C, StringRef File, StringRef Dir);

      StringRef getFilename() const;
      StringRef getDirectory() const;
    };

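As a rough illustration of what "canonicalize on creation" could look like, the following self-contained sketch mimics the interface above with stand-in types. `Context` and `FileNode` are hypothetical substitutes for `LLVMContext` and `MDFile`, `std::string` replaces `StringRef`, and the split point (basename versus parent directory) is only one possible choice of "the right place". Canonicalization here is lexical only, so it is an assumption-laden toy and not the proposed implementation.

```cpp
#include <cassert>
#include <filesystem>
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace fs = std::filesystem;

struct Context; // stand-in for LLVMContext (hypothetical)

class FileNode { // stand-in for MDFile (hypothetical)
  std::string Filename, Directory;
  FileNode(std::string F, std::string D)
      : Filename(std::move(F)), Directory(std::move(D)) {}

public:
  // Canonicalize the path, split it, and return the uniqued node.
  static FileNode *get(Context &C, const std::string &Path);
  // Convenience overload: joins Dir and File, then canonicalizes as above.
  static FileNode *get(Context &C, const std::string &File,
                       const std::string &Dir);

  const std::string &getFilename() const { return Filename; }
  const std::string &getDirectory() const { return Directory; }
};

struct Context {
  // Uniquing table keyed by the canonical full path.
  std::map<std::string, std::unique_ptr<FileNode>> Files;
};

inline FileNode *FileNode::get(Context &C, const std::string &Path) {
  // Lexical normalization only: symlinks are not resolved in this sketch.
  fs::path Canon = fs::path(Path).lexically_normal();
  std::unique_ptr<FileNode> &Slot = C.Files[Canon.generic_string()];
  if (!Slot)
    Slot.reset(new FileNode(Canon.filename().generic_string(),
                            Canon.parent_path().generic_string()));
  return Slot.get();
}

inline FileNode *FileNode::get(Context &C, const std::string &File,
                               const std::string &Dir) {
  return get(C, (fs::path(Dir) / File).generic_string());
}
```

Because every caller goes through `get`, two requests that spell the same file differently end up with the same pointer, which is the property pointer-based uniquing of the dependent metadata would need.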
Note that whether we continue to use MDString under the hood is an
implementation detail.

However, path canonicalization (in particular, eating "..") requires
a stat() to do correctly on *NIX, so the implementation would have
to cache lookups. Doesn't seem hard though.
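One way the cached lookup could be sketched, assuming C++17 `std::filesystem` is an acceptable stand-in for raw stat() calls: `fs::weakly_canonical` consults the filesystem for the part of the path that exists (resolving symlinks, so ".." is handled correctly) and falls back to lexical handling for the rest. The class name `PathCanonicalizer`, the per-directory cache key, and the lookup counter are all illustrative assumptions. Only the directory is canonicalized here, so a filename that is itself a symlink is left alone.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <unordered_map>

namespace fs = std::filesystem;

class PathCanonicalizer { // hypothetical helper, not an LLVM class
  // Maps a directory as spelled by the caller to its canonical form.
  std::unordered_map<std::string, std::string> Cache;
  unsigned Lookups = 0; // number of times the filesystem was consulted

public:
  std::string canonicalize(const std::string &File, const std::string &Dir) {
    fs::path Full = fs::path(Dir) / File;
    fs::path Parent = Full.parent_path();
    std::string Key = Parent.generic_string();

    auto It = Cache.find(Key);
    if (It == Cache.end()) {
      ++Lookups;
      std::error_code EC;
      // Touches the filesystem for the existing prefix of the path.
      fs::path Canon = fs::weakly_canonical(Parent, EC);
      if (EC) // fall back to a purely lexical result on error
        Canon = Parent.lexically_normal();
      It = Cache.emplace(Key, Canon.generic_string()).first;
    }
    return (fs::path(It->second) / Full.filename()).generic_string();
  }

  unsigned filesystemLookups() const { return Lookups; }
};
```

Keying the cache on the directory rather than the full path means that the many headers living in one include directory cost a single filesystem query per distinct spelling of that directory.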