[llvm-dev] [RFC] Pagerando: Page-granularity code randomization
Stephen Crane via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 10 19:37:50 PDT 2018
Thanks for the feedback, Kostya! Sorry I haven’t replied sooner, I’m currently on vacation but wanted to write up a general summary of the project status before getting into the specifics you brought up.
On Wed, Oct 3, 2018 at 4:12 PM Kostya Serebryany <kcc at google.com> wrote:
* Huge complexity. This is not just the compiler, but also the rest of the toolchain and run-times (linkers, debuggers, unwinders, symbolizers).
Agreed, pagerando adds complexity to the toolchain. Changes will be required to debuggers, unwinders, etc., but in our experience these tools need fairly small changes to work with pagerando. For example, the DSBT ABI uses a table similar to the POT to store the dynamic address of each segment, and gdb already supports this ABI with a target-specific handler for DSOs; handling that ABI has minimal impact on the rest of the debugger.
The decision to take on this toolchain complexity will rest with the platform deploying pagerando, and depends on how amenable the target platform is to a pervasive change in shared-library layout handling.
I'd like to hear from some offensive security experts here their comparison of PageRando vs CFI-like schemes (that are much cheaper, and are already available)
Pagerando is an improvement over ASLR; it is certainly not intended as a replacement for CFI. Pagerando instead complements CFI as a defense in depth by making it harder to reliably exploit unconstrained branches (legacy code without CFI) and weakly-constrained branches (e.g., branches that still admit many valid targets even with CFI).
* Spilling the POT register may reveal the secret and make all this pointless. If we want to mix instrumented and non-instrumented code (and still have the protection) we'll need to at least recompile all the non-instrumented code with x18 reserved so that we don't need to spill it. That's the same problem we have with the ShadowCallStack though.
Agreed. We’ve bounced around a few ways to mitigate this leakage, but we don’t have a great solution yet. It’s not trivial to exploit this weakness, but it is a concern in mixed code. We intend to focus, at least initially, on privileged processes where the only non-pagerando code is the main binary which can reserve the necessary register. We must still preserve compatibility with other heterogeneous processes, and allowing the callee-saved register to spill is the simplest way to do this. Ideally, we would integrate ShadowCallStack and pagerando register usage so we only need a single register rather than two for the combination. Any solution we use for one would benefit the other.
* 20-30% code size overhead is a no-go for the majority of large apps (speaking from my Chrome experience) and thus this will remain a niche tool.
Pagerando is only applicable to system-wide shared libraries. These are mostly small so I’m not as concerned about code size overhead as I would be for large binaries like Chrome. However, this is still a valid overhead concern. We’ve been working to reduce it by lowering the number of entry wrappers needed. I suspect we can shave off a bit more code size by optimizing the inter-bin calls, but I’m not counting on that.
After initially enabling pagerando for the subset of system libraries used in privileged processes, we plan to expand that set to larger libraries as we constrain external APIs to reduce the number of entry wrappers. For the limited subset, we have a significantly smaller impact on disk and memory usage.
* 3%-6% CPU overhead is also a concern for this kind of benefit, and I'm afraid that the overhead grows super-linearly with the binary size (more cache lines are touched by POT)
We think it is better to let users make decisions on a case-by-case basis since the averages hide substantial variance. On most Android workloads, we see no runtime performance overheads and we have made progress on the outlier cases as noted earlier in this thread.
If POT-induced cache pressure is indeed a problem on particular workloads, we can bound the size of the POT by increasing the bin granularity (e.g. 8K pages vs. 4K pages). In fact, for the unified POT optimization I touched on in the summary, we bound the POT for each library to a single 4K page for simplicity, which requires a handful of large libraries to use larger bin sizes.
* adding so many new indirect calls in the post-Spectre world sounds scary, and so far I haven't seen any evaluation from Spectre experts on this thread.
I’m not a Spectre expert either but I think that randomizing the entire victim process’s address space may make it more difficult to perform a branch target injection attack since it requires a known target address to train the predictor. Additionally, many of those indirect calls are at randomized addresses, which makes the attack more difficult since the BTB uses these addresses in its lookup algorithm.
Moreover, it appears that Spectre-V2 is being addressed through OS and firmware updates. ARM is committed to making its future Cortex processors resilient at the hardware level (Cortex-A76 is already resilient to variants 2 and 3). The Linux kernel now has support for invalidating the ARM and AArch64 BTB on context switch, which also mitigates variant 2.
Overall, I agree with the concerns you raise. At the same time, I think that the cost/benefit decision must be made on a case-by-case basis according to the users’ operational constraints. I hope that we can have a shot at maturing pagerando in tree leading to eventual deployment.
Thanks, stephen