Improving documentation of what is considered a security issue in LLVM (original) (raw)

The LLVM security response group would like to improve the wording of what is considered a security issue by the LLVM project.

We observe that for some users of LLVM, it is too unclear what the LLVM community considers security-sensitive and what it does not. We’d like to improve the documentation to help those users.

With this RFC, we’d like to gather further feedback from the wider LLVM community on the proposed changes outlined below.

The current wording is as follows:

The security-sensitive parts of the LLVM Project currently are the following. Note that this list can change over time.

None are currently defined. Please don’t let this stop you from reporting issues to the LLVM Security Response Group that you believe are security-sensitive.

The parts of the LLVM Project which are currently treated as non-security sensitive are the following. Note that this list can change over time.

Language front-ends, such as clang, for which a malicious input file can cause undesirable behavior. For example, a maliciously crafted C or Rust source file can cause arbitrary code to execute in LLVM. These parts of LLVM haven’t been hardened, and compiling untrusted code usually also includes running utilities such as make which can more readily perform malicious things.

We propose to change the wording to the following. The changed parts are in bold.

The security-sensitive parts of the LLVM Project currently are the following. Note that this list can change over time.

Run-time libraries linked into compiled code. This covers the following top-level directories in the LLVM repository: libc, libc++, compiler-rt, flang-rt, libcxx, libcxxabi, libunwind, openmp/runtime, pstl, libclc.

The parts of the LLVM Project which are currently treated as non-security sensitive are the following. Note that this list can change over time.

LLVM’s language frontends, analyzers, optimizers, and code generators for which a malicious input can cause undesirable behavior. For example, a maliciously crafted C, Rust or bitcode input file can cause arbitrary code to execute in LLVM. These parts of LLVM haven’t been hardened, and handling untrusted code usually also includes running utilities such as make which can more readily perform malicious things. For example, vulnerabilities in clang, clangd, or the LLVM optimizer in a JIT caused by untrusted inputs are outside of the scope. We recommend the use of an external sandbox in those cases.

philnik June 5, 2025, 8:36am 2

Nit: libc++ isn’t a directory

More importantly: Where do I find a definition of what “security-sensitive” means?

nikic June 5, 2025, 8:55am 3

I’m on board with your clarifications of what is out of scope.

Your proposed changes to what is in scope, however, are very concerning. While I agree in principle that runtimes are security-relevant areas of LLVM, I think it will be highly damaging to open the CVE floodgates on those areas without much more detailed, per-project guidance and maintainer buy-in.

Here are just some random thoughts on your list:

pstl: At the last maintainer meeting the consensus was that this is basically dead code and should be deleted from the monorepo entirely. Let’s not make security promises about unmaintained code!
libunwind: Unwinding accesses a bunch of random memory based on various metadata. It’s probably not even possible to prevent crashes with arbitrarily malformed unwind data? What exactly would count as a security issue here?
compiler-rt: A lot of the runtimes are explicitly non-production. E.g. do you really want to make security promises about the memory sanitizer runtime?
etc.

I could go on and on here, but my general point is that just saying all our runtimes code is now security-sensitive is not a good idea. If you want to extend this policy, you need to do it carefully, with detailed project-by-project guidance.

kbeyls June 6, 2025, 9:22am 4

That’s a great question! AFAIK, we’ve never had a concrete definition of “security-sensitive”. One way to define it could be based on how bugs in security-sensitive areas that could lead to exploits are handled. For example: “Bugs and issues found in security-sensitive parts can be reported to the security response group. That group will analyze whether this bug could plausibly lead to an exploit. If so, the group will initiate a coordinated disclosure process. If not, a regular bug report will be raised in LLVM’s regular issue tracker.”

kbeyls June 6, 2025, 9:35am 5

Thanks for sharing your thought, @nikic. I think you make great points. Related to the reply to @philnik above of what “security-sensitive” could mean, it seems to make sense to me to try and describe the areas in the run-time libraries where bugs could plausibly lead to exploits of in-production binaries if they include some parts of the run-times that are reasonably expected to be part of in-production binaries. (For example, not sanitizers, but probably most of libc and libc++?)

I fully agree this does need buy-in from the maintainers, and that is one of the reasons I started this RFC. I’ll try to tag the maintainers from the run-time libraries that have maintainers to see if they have thoughts or suggestions here: @michaelrj-google, @ldionne, @compnerd, @pcc, @petrhosek, @Ferris, @lhames, @void, @sscalpone, @frasercrmck . I might’ve guessed some of the discourse handles of maintainers incorrectly and might’ve missed some…

Out of curiosity, does this cover cases where a non-malicious input gets miscompiled into exploit-enabling code (for example an optimization miscompiles away a buffer length check or something like that)? Should this be considered a security issue too?

If it is supposed to be considered a security issue, it does not seem obvious to me from reading either the old wording or the new wording.

kbeyls June 6, 2025, 1:45pm 7

Thanks for asking the question.

I believe this issue reported in 2021 seems somewhat similar to what you’re thinking off. It may be the only such issue the security response group has received since it was setup about 5 years ago. (There’s a chance there might be one or two more I’m forgetting about).

We don’t consider most mis-compilations as security issues. But a mis-compilation where there are indications that it can result in the binary becoming significantly easier to exploit could be considered “needing coordinated disclosure”.

If we can find the right wording, I’m fine with adding a bullet point covering this.

MaskRay June 6, 2025, 4:57pm 8

+1. I vaguely recall that nongnu libunwind has a build mode to protect reads during the unwinding process to prevent crashes due to invalid .eh_frame data (issues in compiler-generated code or inline assembly without CFI annotations). The data could definitely be made malformed to guide libunwind to read what was not supposed to.
However, the mode is expensive (checking for every read, if your target audience is well-controled code introduces overhead).

The exception handling part of libc++abi is similar.

Agreed that we cannot make security promises. Many sanitizers features are considered debug features and not for production.

Hi, LLVM-libc maintainer here.

I’m generally in favor of labelling libc as “security-sensitive” but I think it’s important to clarify what’s the responsibility of the library and what’s the responsibility of the caller. As an example, a malformed printf format string can cause out of bounds memory reads. This isn’t something the implementation can really solve, it’s fundamental to the design of the function. It’s the user’s responsibility to ensure the number of format specifiers matches the number of arguments, because if it doesn’t bad things can happen.

I’ll pass this post along to other LLVM-libc maintainers to get their thoughts. Thanks for bringing this up!

As the maintainer for the compiler builtins, I agree that these should be considered security sensitive. They are often linked into the applications statically which makes repairs challenging. I would like to understand what promises you expect to be made though. I think that at least following CVE practices makes sense.

ldionne June 9, 2025, 4:07pm 11

Thanks for the RFC! I think I’ll mostly echo what other maintainers have said for libc++. While I do think that many areas of libc++ should be treated as security sensitive, I would like to see clearer guidelines about what kind of bug/request we consider security sensitive and what is the expected handling of such issues (for whoever gets the sensitive bug report “assigned” to them). Apologies if that’s already discussed somewhere – I didn’t see anything on this page.

For example, let’s say someone reports a bug saying that accessing a std::valarray object out-of-bounds allows reading/writing arbitrary memory. That is technically true, and we don’t even provide a hardening assertion that can be turned on to catch this misuse. However, pedantically speaking, accessing a valarray out of bounds is a precondition violation in the Standard, which means that it is UB and hence it’s a bug in the user code, not libc++. Today, I’d explain that the user code is UB and I’d essentially treat it as a feature request for more hardening in std::valarray::operator[], not as a security critical bug. Is there an expectation that we’d start treating these kinds of bugs any differently than we do today?

Different people have different opinions about what constitutes a security issue, and I would like to make sure that we don’t open the floodgates to considering arbitrary issues as security critical. But apart from these clarifications, I think it makes sense to consider (most of) the runtimes as security sensitive.

kbeyls June 12, 2025, 12:28pm 12

Thank you everyone for the great feedback so far! Based on that feedback, allow me to iterate on the proposed wording changes. The new proposed wording changes below are by no means final, it’s an iteration towards a wording that at some point is hopefully good enough to publish.
I tried to make sure that the updated wording takes into account all feedback received, but please do let me know if I’ve missed something. Changes to current published wording are in bold:

Below, with “security-sensitive”, we mean that a discovered bug or vulnerability that might enable an exploit, may require coordinated disclosure, and therefore should be reported to the LLVM Security Response group rather than publishing in the public bug tracker.
The security-sensitive parts of the LLVM Project currently are the following. Note that this list can change over time. If you’re not sure whether an issue is in-scope for this security process or not, err towards assuming that it is. The Security Response Group might agree or disagree and will explain its rationale in the report, as well as update this document through the above process.

Code generation: most mis-compilations are not security sensitive. However, a mis-compilation where there are clear indications that it can result in the produced binary becoming significantly easier to exploit could be considered security sensitive, and should be reported to security response group.

Run-time libraries: parts of the run-time libraries are considered security-sensitive. Parts that are explicitly not considered security sensitive include:

parts of the run-time libraries that are not meant to be included in production binaries. For example, most sanitizers are not considered security-sensitive as they are meant to be used during development only, not in production.

for libc and libc++: if a user calls library functionality in an undefined or otherwise incorrect way, this will most likely not be considered a security issue, unless the libc/libc++ documentation explicitly promises to harden or catch that specific undefined behaviour or incorrect usage.

unwinding and exception handling: the implementations are not hardened against malformed or malicious unwind or exception handling data. This is not considered security sensitive.

The parts of the LLVM Project which are currently treated as non-security sensitive are the following. Note that this list can change over time.

LLVM’s language frontends, analyzers, optimizers, and code generators for which a malicious input can cause undesirable behavior. For example, a maliciously crafted C, Rust or bitcode input file can cause arbitrary code to execute in LLVM. These parts of LLVM haven’t been hardened, and handling untrusted code usually also includes running utilities such as make which can more readily perform malicious things. For example, vulnerabilities in clang, clangd, or the LLVM optimizer in a JIT caused by untrusted inputs are outside of the scope. We recommend the use of an external sandbox in those cases.

There were a few other questions or remarks in the feedback that the following bullet points hopefully will help to answer:

When a report comes in, the LLVM security response group makes a judgment call on whether a coordinated disclosure is needed.
So far, the LLVM security response group have not created any CVEs. I consider whether it should start doing so to be an orthogonal discussion.
There was a question of what is expected from whoever gets assigned a reported security issue. So far, one or more members of the security response group have been the ones driving an issue to conclusion. To be able to make the right judgement calls, depending on the issue, one or a few experts/maintainers not part of the security group are sometimes invited to offer their analysis/opinion. In a very small number of reported issues, we’ve seen the need for a fix to be created in a coordinated, non-public, way. There have been so few such cases that no “typical” way of doing so has been established, and the best way to do this has been defined case-by-case.
I’m expecting that issues will continue to be handled in roughly the same way.

ldionne June 12, 2025, 2:45pm 13

I think I am fine with the refined text, understanding that there are probably many cases not covered here that might come up and we’ll figure out what to do on a case by case basis. I expect some case by case handling regardless of this wording anyway. As long as we don’t give the impression that we’re treating just about any issue tagged as “security critical” as a P1 without proper consideration, I’m personally satisfied.

Overall I agree with Louis. This wording looks good, we can refine it as necessary. One thing to potentially mention is if we plan to backport security fixes and how far. I’d say for something security critical it’s probably worthwhile to try and fix it in the most recent release, but at least for LLVM-libc I don’t think it’s useful to go farther.

jyknight June 12, 2025, 10:49pm 15

However, a mis-compilation where there are clear indications that it can result in the produced binary becoming significantly easier to exploit could be considered security sensitive, and should be reported to security response group.

Maybe we should say something about likelihood of real-world impact here? E.g. you could probably create an artificial example for any miscompile, where it would cause some specially-crafted code to become exploitable. But we don’t want to treat all miscompile bugs as a security issue.

Probably also ought to generalize the exclusion of uses which are “undefined or otherwise incorrect” to also apply to codegen “bugs”, instead of only library functions. Of course, if your program invokes UB, the compiler can do whatever it wants, so “undesired” compiler outputs for such a program is not even a bug in the compiler – never mind a security bug. But…folks who aren’t compiler writers don’t necessarily feel the same way.

riking June 13, 2025, 7:21am 16

This needs a few more rounds of copy editing, and some of it should be split to the “not security sensitive” bullet points.

First, clearly list the libraries we have agreed to take responsibility on. I think a good first pass is: compiler-rt, libc, and parts of libc++.

Then below, specify that libc and libc++ only applies to security issues that can be triggered despite the call sites being correct according to the documentation.

For example, is a call to qsort with a comparison function of randu16() < 32000 correct? No.

Then talk about handling as security issues miscompiles in code generation triggered by correct source code. Incorrect or malicious source code should still be out of scope for now, as well as a judgement call for whether the source pattern actually occurs in existing software. If we KNOW the miscompile trigger exists in a lot of software, that clearly needs disclosure work.

The other three bullet points should be moved below the “not covered by security policy” heading.