Chris Lattner - Re: LLVM/GCC Integration Proposal

This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

From: Chris Lattner
To: Kenneth Zadeck
Cc: gcc at gcc dot gnu dot org
Date: Sat, 19 Nov 2005 16:22:31 -0600 (CST)
Subject: Re: LLVM/GCC Integration Proposal

Kenneth Zadeck writes:

This quickly becomes difficult and messy, which is presumably why the link-time proposal allows the linker to "give up" linking two translation units.

The reason for the complexity of the type system handling in our proposal was motivated primarily by two concerns:

We need to keep track of the types so that they are available for debugging.

...

losing all the type information before you start did not seem the correct plan.

This is exactly my point. The problem here is NOT the fact that the optimization representation can't represent everything that the debug information does. The problem is that your approach conflates two completely separate pieces of information. Consider:

1. Debug information must represent the source-level program with 100% fidelity. At link time, the debug information must be merged (and perhaps optimized for size), but you do not need to merge declarations or types across language boundaries, or in non-trivial cases.
2. An optimization representation need not (and in fact does not want to) represent the program at the source level. However, it must be able to link declarations and types across modules and across languages without exception (otherwise, you will miscompile the program). Designing a representation where this is not practically possible requires a back-off mechanism as your proposal has outlined.

To me, the correct solution to this problem is to not try to combine the representations. Instead, allow the debug information to capture the important information that it does well (e.g. types and declarations in a language-specific way) and allow the optimization representation to capture the semantics of the program in a way that is as useful for optimization and codegen purposes as possible.

This approach is the one we have always taken with LLVM (except of course that we have been missing debug info, because no one got around to implementing it), which might explain some of the confusion around "lacking high-level information".
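
As an illustration of keeping the two representations separate, here is a minimal sketch in Python. It is only a toy model under stated assumptions: `LowType`, `Module`, and `link` are hypothetical names invented for this example and are not LLVM's or GCC's actual data structures or linker logic. Symbols carry language-neutral structural types and are merged by name without ever rejecting a pair of modules, while the source-level debug records are simply carried along untouched.

```python
from dataclasses import dataclass, field


# Hypothetical toy model; not LLVM's or GCC's real representation.
@dataclass(frozen=True)
class LowType:
    """Language-neutral structural type seen by the optimizer."""
    kind: str            # e.g. "i32", "f64", "struct"
    fields: tuple = ()   # member types when kind == "struct"


@dataclass
class Module:
    name: str
    symbols: dict = field(default_factory=dict)     # symbol name -> LowType
    debug_info: list = field(default_factory=list)  # opaque source-level records


def link(a, b):
    """Merge two modules by symbol name.

    Linking never gives up. If two modules disagree on a symbol's type,
    the first type is kept and the symbol is reported so that uses can
    be cast (an assumption made for this sketch).
    """
    merged = Module(name=a.name + "+" + b.name)
    needs_cast = []
    for mod in (a, b):
        for sym, ty in mod.symbols.items():
            if sym in merged.symbols and merged.symbols[sym] != ty:
                needs_cast.append(sym)
            else:
                merged.symbols.setdefault(sym, ty)
        # Debug records are concatenated, never unified across languages.
        merged.debug_info.extend(mod.debug_info)
    return merged, needs_cast


i32 = LowType("i32")
point = LowType("struct", (i32, i32))

# A C unit and a Fortran unit that describe the same data layout.
c_unit = Module("geom.c", {"origin": point},
                ["C: struct point { int x, y; } origin"])
f_unit = Module("grid.f90", {"origin": point, "n": i32},
                ["Fortran: type(point) :: origin"])

merged, needs_cast = link(c_unit, f_unit)
```

In this sketch the C and Fortran declarations of `origin` lower to the same structural type, so they merge without any cross-language type reconciliation, and both language-specific debug records survive side by side.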

I personally cannot guarantee that GCC (or for that matter any optimizing compiler) can correctly cross inline and compile a program if the types in one module are not consistent with the types in another module. Just because the program happens to work correctly when separately compiled is not enough.

This is a direct result of the representation that you are proposing to use for IPA. LLVM is *always* capable of merging two translation units correctly, no matter where they came from. We do this today. If you look back to my 2003 GCC summit paper (Sec. 4.4), I mention the fact that this is not a trivial problem. :)

When Mark and I started working on this proposal (and later the rest of the volunteers) we decided that this was not going to be either an academic exercise or just something to run benchmarks.

I'm glad. While IMA is an interesting step in the right direction, it has not seen widespread adoption for this reason. I'm glad that your goal is to design something like LLVM, which always works.

What that means to me is that the link time optimizer needs to be able to either generate correct code or give up in some predictable manner. Having the compiler push forward and hope everything turns out OK is not enough. Discretion is the better part of valor.

I prefer to design the compiler so that neither 'giving up' nor 'hope' is required. This is an easily solvable problem, one that LLVM has had right for several years now.

I think that taking advantage of mixed C, C++ or C and Fortran programs is going to be hard.

I don't agree.

But it is what the GCC customers want and there is a desire to accommodate them if possible.

Outside benchmarks, many programs are made up of different language components. There are of course the trivial cases (such as optimizing across JNI/CNI/Java and C/C++ code), but many programs, particularly large ones, have pieces written in multiple languages. I believe Toon was recently talking about his large weather program written in Fortran and C (though I could be confusing Toon's program with another one).

-Chris

--
http://nondot.org/sabre/
http://llvm.org/

Follow-Ups:
- Re: LLVM/GCC Integration Proposal
  * From: Joseph S. Myers
