Why does compiling two files in one command use posix_spawn to launch two processes?

December 2, 2024, 9:09am 1

I want to compile two files with the ExecuteCC1Tool function in the main function, but the second file produces no output. Why do we have to launch two processes to compile two files? Can we use just the one main process?

I also found that when I compile one file, execution goes through the ExecuteCC1Tool function, but when I compile two files, it goes through the ExecuteCommand function. What is the difference between these two functions?

cc1 is historically a separate binary but in clang’s case it is contained in the same binary as the compiler that users run (“clang”).

Driving Compilers goes into this in a lot of depth, and its section "Driving a multi-file project" in particular has an example like the one I think you are talking about.

(llvm-project/flang/docs/FlangDriver.md in llvm/llvm-project on GitHub also talks about the concept. It covers Flang, LLVM’s Fortran compiler, but the same principles apply.)

IIRC the compiler builds a set of jobs to carry out the command: compile object 1, compile object 2, then link objects 1 and 2.

$ ./bin/clang /tmp/a.c /tmp/b.c -o /dev/null -###
clang version 20.0.0git (https://github.com/llvm/llvm-project.git 7a7a426188eea0b181738436205759e13ad6fd7b)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/david.spickett/build-llvm-aarch64/bin
Build config: +assertions
 "/home/david.spickett/build-llvm-aarch64/bin/clang-20" "-cc1" <...> "-o" "/tmp/a-cc62cc.o" "-x" "c" "/tmp/a.c"
 "/home/david.spickett/build-llvm-aarch64/bin/clang-20" "-cc1" <...> "-o" "/tmp/b-5a560e.o" "-x" "c" "/tmp/b.c"
 "/usr/bin/ld" <...> "/tmp/a-cc62cc.o" "/tmp/b-5a560e.o" "<...>"

Not sure whether clang refers to the compiler command line as a “command” or a “job” (as in, is a job made of commands or a command made of jobs). But you get the idea, it’s spawning processes to do these tasks.

Clang does have a mode that re-uses its compiler process as the cc1 process but this can only be done if running cc1 is the only thing clang is going to do.

$ strace ./bin/clang /tmp/a.c -o /dev/null -c 2>&1 | grep clone
$ <didn't fork/clone into a new process>
$ strace ./bin/clang /tmp/a.c -o /dev/null -fno-integrated-cc1 -c 2>&1 | grep clone
clone(child_stack=0xffff9a9b7000, flags=CLONE_VM|CLONE_VFORK|SIGCHLD) = 1811620

I think this is because when the compiler has to compile A and B and then link the two, something needs to wait for A and B to finish compiling, and that thing is the first clang process.

clang a.c b.c -o /dev/null --+--> clang -cc1 a.c -o a.o --+--> ld a.o b.o
                             \--> clang -cc1 b.c -o b.o --/

My best ASCII diagram of what I mean by that. Something has to make sure that a and b get built before ld is run.
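To make the "something has to wait" point concrete, here is a minimal C++ sketch of a parent process that launches each compile step with posix_spawn, waits for it, and only then launches the link step. This is not clang's code: spawnJob, waitForJob and runCompileThenLink are names made up for illustration, the jobs run one at a time, and each job is assumed to be a list of strings whose first element is an absolute path to the program.

```cpp
#include <cassert>
#include <spawn.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <vector>

extern "C" char **environ;

// Hypothetical helper: start Args[0] with the given arguments.
// Returns the child's pid, or -1 if the process could not be started.
pid_t spawnJob(const std::vector<std::string> &Args) {
  assert(!Args.empty() && "a job needs at least a program path");
  std::vector<char *> Argv;
  for (const std::string &A : Args)
    Argv.push_back(const_cast<char *>(A.c_str()));
  Argv.push_back(nullptr);

  pid_t Pid = -1;
  if (posix_spawn(&Pid, Argv[0], nullptr, nullptr, Argv.data(), environ) != 0)
    return -1;
  return Pid;
}

// Hypothetical helper: block until the child exits.
// Returns its exit code, or -1 if it did not exit normally.
int waitForJob(pid_t Pid) {
  if (Pid < 0)
    return -1;
  int Status = 0;
  if (waitpid(Pid, &Status, 0) < 0)
    return -1;
  return WIFEXITED(Status) ? WEXITSTATUS(Status) : -1;
}

// Run both "compile" jobs to completion, then run the "link" job only if
// both of its inputs were produced. Returns 0 on overall success.
int runCompileThenLink(const std::vector<std::string> &CompileA,
                       const std::vector<std::string> &CompileB,
                       const std::vector<std::string> &Link) {
  int ResA = waitForJob(spawnJob(CompileA));
  int ResB = waitForJob(spawnJob(CompileB)); // still attempted if A failed
  if (ResA != 0 || ResB != 0)
    return 1; // an object file is missing, so linking makes no sense
  return waitForJob(spawnJob(Link));
}
```

The parent never does any compiling itself here. Its only job is to sequence the children and collect their exit codes, which is the role the first clang process plays in the diagram.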

Would it be technically possible to do all this in process? Almost certainly (you would still have to start a new process for the linker), but the difficulty of doing so is high. Many build systems compile object files individually anyway, which reduces the return on investment for such a feature.

https://clang.llvm.org/docs/DriverInternals.html might be a good reference. I at least learned from it that -ccc-print-phases exists:

$ ./bin/clang /tmp/a.c /tmp/b.c -o /dev/null -ccc-print-phases
            +- 0: input, "/tmp/a.c", c
         +- 1: preprocessor, {0}, cpp-output
      +- 2: compiler, {1}, ir
   +- 3: backend, {2}, assembler
+- 4: assembler, {3}, object
|           +- 5: input, "/tmp/b.c", c
|        +- 6: preprocessor, {5}, cpp-output
|     +- 7: compiler, {6}, ir
|  +- 8: backend, {7}, assembler
|- 9: assembler, {8}, object
10: linker, {4, 9}, image

Which is a bit better than my diagram (~~not sure how to interpret the indentation though~~). The numbers in {} are results of previous steps.

Edit: actions that are further to the right are earlier in the sequence, so that the bottom left of the output is always the final “sink” point where the overall command finishes.
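One way to check that reading of the output is to transcribe the graph by hand and walk it. The sketch below is my own model, not clang's Action classes: Action, examplePhases, executionOrder and depthFromSink are hypothetical names. It visits every input before the action that consumes it, starting from the linker step, and also reports how many steps an action is away from the final one.

```cpp
#include <string>
#include <vector>

// Simplified model of one line of -ccc-print-phases output.
struct Action {
  std::string Kind;
  std::vector<int> Inputs; // indices of the actions whose results it uses
};

// The graph from the output above, transcribed by hand.
std::vector<Action> examplePhases() {
  return {
      {"input", {}},         // 0: /tmp/a.c
      {"preprocessor", {0}}, // 1
      {"compiler", {1}},     // 2
      {"backend", {2}},      // 3
      {"assembler", {3}},    // 4
      {"input", {}},         // 5: /tmp/b.c
      {"preprocessor", {5}}, // 6
      {"compiler", {6}},     // 7
      {"backend", {7}},      // 8
      {"assembler", {8}},    // 9
      {"linker", {4, 9}},    // 10: the final "sink"
  };
}

// Post-order walk: all inputs of an action are emitted before the action.
static void visit(const std::vector<Action> &Actions, int Index,
                  std::vector<bool> &Done, std::vector<int> &Order) {
  if (Done[Index])
    return;
  Done[Index] = true;
  for (int In : Actions[Index].Inputs)
    visit(Actions, In, Done, Order);
  Order.push_back(Index);
}

// One valid order in which the actions could be carried out.
std::vector<int> executionOrder(const std::vector<Action> &Actions, int Sink) {
  std::vector<bool> Done(Actions.size(), false);
  std::vector<int> Order;
  visit(Actions, Sink, Done, Order);
  return Order;
}

// Number of steps between Target and Sink, or -1 if Sink does not depend on it.
int depthFromSink(const std::vector<Action> &Actions, int Sink, int Target) {
  if (Sink == Target)
    return 0;
  for (int In : Actions[Sink].Inputs) {
    int Depth = depthFromSink(Actions, In, Target);
    if (Depth >= 0)
      return Depth + 1;
  }
  return -1;
}
```

In this model the two input actions are the furthest from the linker and the two assembler actions are one step away, which matches the inputs being printed furthest to the right.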

For a single file all clang needs to do is run clang -cc1 which is what ExecuteCC1Tool does.

When you have two files it’s got to manage all these tasks which I assume is called the “Command”, hence ExecuteCommand.

From reading Driver Design & Internals — Clang 20.0.0git documentation, I think a “job” is a small part of a “command”:

Once the arguments are parsed, the tree of subprocess jobs needed for the desired compilation sequence are constructed.

Cecilia December 2, 2024, 10:43am 4

Thank you so much. Learned a lot from your reply.
In clang/lib/Driver/Compilation.cpp, ExecuteCommand deals with each job, and I found that the type of each Job is Command. Maybe a Command includes all the information needed to compile a file.

void Compilation::ExecuteJobs(const JobList &Jobs,
                              FailingCommandList &FailingCommands,
                              bool LogOnly) const {
  // According to UNIX standard, driver need to continue compiling all the
  // inputs on the command line even one of them failed.
  // In all but CLMode, execute all the jobs unless the necessary inputs for the
  // job is missing due to previous failures.
  for (const auto &Job : Jobs) {
    if (!InputsOk(Job, FailingCommands))
      continue;
    const Command *FailingCommand = nullptr;
    if (int Res = ExecuteCommand(Job, FailingCommand, LogOnly)) {
      FailingCommands.push_back(std::make_pair(Res, FailingCommand));
      // Bail as soon as one command fails in cl driver mode.
      if (TheDriver.IsCLMode())
        return;
    }
  }
}
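To see what that loop does when one compile fails, here is a small standalone model of it. It is only a sketch under my own assumptions: Job, executeJobs and demoJobsThatRan are hypothetical names, a job's inputs are identified by the names of earlier jobs, and the StopAtFirstFailure flag stands in for the IsCLMode() check. The real InputsOk and Command types in clang are more involved.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a driver job.
struct Job {
  std::string Name;
  std::vector<std::string> Inputs; // names of earlier jobs whose output it reads
  std::function<int()> Run;        // returns 0 on success
};

using FailingList = std::vector<std::pair<int, std::string>>;

// Run every job in order. A job is skipped when one of its inputs was never
// produced. Returns the (exit code, name) pairs of the jobs that failed.
FailingList executeJobs(const std::vector<Job> &Jobs, bool StopAtFirstFailure) {
  FailingList Failing;
  std::set<std::string> Unavailable; // jobs that failed or were skipped
  for (const Job &J : Jobs) {
    bool InputsOk = true;
    for (const std::string &In : J.Inputs)
      if (Unavailable.count(In))
        InputsOk = false;
    if (!InputsOk) {
      Unavailable.insert(J.Name);
      continue;
    }
    if (int Res = J.Run()) {
      Failing.emplace_back(Res, J.Name);
      Unavailable.insert(J.Name);
      if (StopAtFirstFailure) // models the cl driver mode bail-out
        return Failing;
    }
  }
  return Failing;
}

// Example scenario: producing a.o fails, producing b.o succeeds.
// Returns the names of the jobs that were actually run.
std::vector<std::string> demoJobsThatRan(bool StopAtFirstFailure) {
  std::vector<std::string> Ran;
  std::vector<Job> Jobs = {
      {"a.o", {}, [&Ran] { Ran.push_back("a.o"); return 1; }},
      {"b.o", {}, [&Ran] { Ran.push_back("b.o"); return 0; }},
      {"link", {"a.o", "b.o"}, [&Ran] { Ran.push_back("link"); return 0; }},
  };
  executeJobs(Jobs, StopAtFirstFailure);
  return Ran;
}
```

With StopAtFirstFailure set to false, the second compile still runs after the first one fails, and only the link step is skipped because one of its inputs is missing. With it set to true, the loop returns right after the first failure.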