Hardware-assisted AddressSanitizer Design Documentation (Clang 21.0.0git documentation)

This page is a design document for hardware-assisted AddressSanitizer (or HWASAN), a tool similar to AddressSanitizer, but based on partial hardware assistance.

Introduction

AddressSanitizer tags every 8 bytes of the application memory with a 1-byte tag (using shadow memory), uses redzones to find buffer overflows, and uses quarantine to find use-after-free. The redzones, the quarantine, and, to a lesser extent, the shadow are the sources of AddressSanitizer’s memory overhead. See the AddressSanitizer paper for details.

AArch64 has Address Tagging (or top-byte-ignore, TBI), a hardware feature that allows software to use the 8 most significant bits of a 64-bit pointer as a tag. HWASAN uses Address Tagging to implement a memory safety tool, similar to AddressSanitizer, but with smaller memory overhead and slightly different (mostly better) accuracy guarantees.
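To make the bit layout concrete, here is a minimal sketch in C of placing a tag in the 8 most significant bits of a 64-bit pointer value and removing it again. The helper names (set_tag, get_tag, strip_tag) are hypothetical and are not part of the HWASAN runtime; pointers are modeled as plain uint64_t values so the example does not depend on TBI hardware.

```c
#include <stdint.h>

/* Hypothetical helpers; only the layout (tag in bits 56-63) comes from the text. */
enum { TAG_SHIFT = 56 };
static const uint64_t ADDR_MASK = (UINT64_C(1) << TAG_SHIFT) - 1;

/* Return the pointer value with its top byte replaced by `tag`. */
static uint64_t set_tag(uint64_t ptr, uint8_t tag) {
    return (ptr & ADDR_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Extract the tag stored in the top byte. */
static uint8_t get_tag(uint64_t ptr) {
    return (uint8_t)(ptr >> TAG_SHIFT);
}

/* Clear the top byte, leaving only the address bits. */
static uint64_t strip_tag(uint64_t ptr) {
    return ptr & ADDR_MASK;
}
```

With top-byte-ignore the hardware disregards the tag byte during address translation, so a tagged pointer can be dereferenced directly. On a target without such a feature, something like strip_tag would have to be applied before every access.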

Intel’s Linear Address Masking (LAM) also provides address tagging for x86_64, though it is not widely available in hardware yet. For x86_64, HWASAN has a limited implementation using page aliasing instead.

Algorithm

For a more detailed discussion of this approach see https://arxiv.org/pdf/1802.09517.pdf

Short granules

A short granule is a granule of size between 1 and TG-1 bytes. The size of a short granule is stored at the location in shadow memory where the granule’s tag is normally stored, while the granule’s actual tag is stored in the last byte of the granule. This means that in order to verify that a pointer tag matches a memory tag, HWASAN must check for two possibilities:

Pointer tags between 1 and TG-1 are possible and are as likely as any other tag. This means that these tags in memory have two interpretations: the full tag interpretation (where the pointer tag is between 1 and TG-1 and the last byte of the granule is ordinary data) and the short tag interpretation (where the pointer tag is stored in the granule).

When HWASAN detects an error near a memory tag between 1 and TG-1, it will show both the memory tag and the last byte of the granule. Currently, it is up to the user to disambiguate the two possibilities.
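The short granule encoding described in this section can be sketched as follows. This is a toy model, not the runtime's code: the arrays mem and shadow, the function tag_object, and the choice TG = 16 are assumptions made for illustration (16 matches the constants in the AArch64 listing later in this document).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TG = 16 };          /* assumed granule size */
enum { MEM_SIZE = 256 };   /* toy address space */

static uint8_t mem[MEM_SIZE];           /* application memory */
static uint8_t shadow[MEM_SIZE / TG];   /* one shadow byte per granule */

/* Tag `size` bytes starting at granule-aligned offset `start` with `tag`. */
static void tag_object(size_t start, size_t size, uint8_t tag) {
    assert(start % TG == 0 && size > 0 && start + size <= MEM_SIZE);
    size_t full = size / TG;   /* completely used granules */
    size_t rest = size % TG;   /* bytes used in the trailing short granule */
    for (size_t i = 0; i < full; i++)
        shadow[start / TG + i] = tag;
    if (rest != 0) {
        size_t g = start / TG + full;
        shadow[g] = (uint8_t)rest;    /* shadow holds the short granule's size */
        mem[g * TG + TG - 1] = tag;   /* the real tag lives in the last byte */
    }
}
```

For a 20-byte object at offset 32 with tag 0x2d, the first granule gets shadow value 0x2d, the second gets shadow value 4 (its size), and byte 63 holds 0x2d. A reader of shadow alone cannot tell that 4 apart from a full granule whose tag happens to be 4, which is the ambiguity noted above.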

Instrumentation

Memory Accesses

In the majority of cases, memory accesses are prefixed with a call to an outlined instruction sequence that verifies the tags. The code size and performance overhead of the call is reduced by using a custom calling convention that

Currently, the following sequence is used:

// int foo(int *a) { return *a; }
// clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - load.c

[...]
foo:
        stp     x30, x20, [sp, #-16]!
        adrp    x20, :got:__hwasan_shadow                 // load shadow address from GOT into x20
        ldr     x20, [x20, :got_lo12:__hwasan_shadow]
        bl      __hwasan_check_x0_2_short_v2              // call outlined tag check
                                                          // (arguments: x0 = address, x20 = shadow base;
                                                          // "2" encodes the access type and size)
        ldr     w0, [x0]                                  // inline load
        ldp     x30, x20, [sp], #16
        ret

[...]
__hwasan_check_x0_2_short_v2:
        sbfx    x16, x0, #4, #52                          // shadow offset
        ldrb    w16, [x20, x16]                           // load shadow tag
        cmp     x16, x0, lsr #56                          // extract address tag, compare with shadow tag
        b.ne    .Ltmp0                                    // jump to short tag handler on mismatch
.Ltmp1:
        ret
.Ltmp0:
        cmp     w16, #15                                  // is this a short tag?
        b.hi    .Ltmp2                                    // if not, error
        and     x17, x0, #0xf                             // find the address's position in the short granule
        add     x17, x17, #3                              // adjust to the position of the last byte loaded
        cmp     w16, w17                                  // check that position is in bounds
        b.ls    .Ltmp2                                    // if not, error
        orr     x16, x0, #0xf                             // compute address of last byte of granule
        ldrb    w16, [x16]                                // load tag from it
        cmp     x16, x0, lsr #56                          // compare with pointer tag
        b.eq    .Ltmp1                                    // if matches, continue
.Ltmp2:
        stp     x0, x1, [sp, #-256]!                      // save original x0, x1 on stack (they will be overwritten)
        stp     x29, x30, [sp, #232]                      // create frame record
        mov     x1, #2                                    // set x1 to a constant indicating the type of failure
        adrp    x16, :got:__hwasan_tag_mismatch_v2        // call runtime function to save remaining registers and report error
        ldr     x16, [x16, :got_lo12:__hwasan_tag_mismatch_v2] // (load address from GOT to avoid potential register clobbers in delay load handler)
        br      x16
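The control flow of the outlined check may be easier to follow as C. The sketch below mirrors the branches of the listing above over a toy memory model. The arrays mem and shadow and the function check_access are hypothetical names, TG = 16 is taken from the constants in the listing, and the signed shadow offset computed by sbfx is simplified to a plain division because the toy address space starts at zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { TG = 16, MEM_SIZE = 256 };

static uint8_t mem[MEM_SIZE];           /* toy application memory */
static uint8_t shadow[MEM_SIZE / TG];   /* one shadow byte per granule */

/* Returns true if an access of `size` bytes through tagged pointer `ptr` passes. */
static bool check_access(uint64_t ptr, unsigned size) {
    uint8_t ptr_tag = (uint8_t)(ptr >> 56);
    uint64_t addr = ptr & ((UINT64_C(1) << 56) - 1);
    assert(size > 0 && addr + size <= MEM_SIZE);

    uint8_t mem_tag = shadow[addr / TG];
    if (mem_tag == ptr_tag)
        return true;                    /* tags match: fast path */
    if (mem_tag > TG - 1)
        return false;                   /* not a short granule: mismatch */

    /* Short granule: mem_tag is the number of valid bytes in it. */
    unsigned last = (unsigned)(addr % TG) + size - 1;   /* last byte accessed */
    if (mem_tag <= last)
        return false;                   /* access runs past the valid bytes */

    /* The real tag is stored in the last byte of the granule. */
    return mem[addr | (TG - 1)] == ptr_tag;
}
```

In the assembly, the access size is baked into each specialized routine (the constant 3 in the add instruction is size minus one for a 4-byte load), whereas this model takes it as a parameter. A false result corresponds to the branch to .Ltmp2.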

Heap

Tagging the heap memory/pointers is done by malloc. This can be based on any malloc that forces all objects to be TG-aligned. free tags the memory with a different tag.
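As an illustration of how retagging on free exposes a use-after-free, here is a toy bump allocator over a shadow array. Everything in it is an assumption made for the sketch: the names toy_malloc, toy_free and tags_match, the counter used instead of a random tag source, and TG = 16. It also rounds every object up to whole granules and ignores short granules to stay small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TG = 16, HEAP_SIZE = 256 };

static uint8_t shadow[HEAP_SIZE / TG];  /* one shadow byte per granule */
static size_t next_free;                /* bump pointer, always TG-aligned */
static uint8_t next_tag = 1;            /* stand-in for a random tag source */

static uint8_t fresh_tag(void) { return next_tag++; }  /* wraps modulo 256 */

static size_t round_up(size_t size) { return (size + TG - 1) / TG * TG; }

static void set_shadow(size_t start, size_t size, uint8_t tag) {
    for (size_t g = start / TG; g < (start + size) / TG; g++)
        shadow[g] = tag;
}

/* Allocate TG-aligned memory, tag it, and return a tagged pointer value. */
static uint64_t toy_malloc(size_t size) {
    size_t rounded = round_up(size);
    assert(size > 0 && next_free + rounded <= HEAP_SIZE);
    size_t start = next_free;
    next_free += rounded;
    uint8_t tag = fresh_tag();
    set_shadow(start, rounded, tag);
    return ((uint64_t)tag << 56) | start;
}

/* Retag the memory with a tag that differs from the pointer's tag. */
static void toy_free(uint64_t ptr, size_t size) {
    size_t start = (size_t)(ptr & ((UINT64_C(1) << 56) - 1));
    uint8_t old = (uint8_t)(ptr >> 56);
    uint8_t tag;
    do { tag = fresh_tag(); } while (tag == old);
    set_shadow(start, round_up(size), tag);
}

/* Does the pointer's tag match the shadow tag of the granule it points into? */
static int tags_match(uint64_t ptr) {
    uint64_t addr = ptr & ((UINT64_C(1) << 56) - 1);
    return shadow[addr / TG] == (uint8_t)(ptr >> 56);
}
```

After toy_free, a stale pointer still carries the old tag while the shadow holds the new one, so tags_match fails for it. The toy never reuses memory; a real allocator would, and the stale pointer would then be caught as long as the new owner's tag differs from the old one.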

Stack

Stack frames are instrumented by aligning all non-promotable allocas by TG and tagging stack memory in function prologue and epilogue.

Tags for different allocas in one function are not generated independently; doing that in a function with M allocas would require maintaining M live stack pointers, significantly increasing register pressure. Instead we generate a single base tag value in the prologue, and build the tag for alloca number i as ReTag(BaseTag, i), where ReTag can be as simple as exclusive-or with the constant i.
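A minimal sketch of the exclusive-or variant of ReTag, assuming 8-bit tags. The function name retag and the index bound are illustrative choices, not taken from the implementation.

```c
#include <assert.h>
#include <stdint.h>

/* ReTag as exclusive-or of the base tag with the alloca's index. */
static uint8_t retag(uint8_t base_tag, unsigned index) {
    assert(index < 256);   /* the index must fit in the 8-bit tag */
    return (uint8_t)(base_tag ^ index);
}
```

Because exclusive-or with a fixed base tag is a bijection on 8-bit values, distinct indices always yield distinct tags, and each tag is a constant offset from the one base value kept live in the function.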

Stack instrumentation is expected to be a major source of overhead, but could be optional.

Globals

Most globals in HWASAN instrumented code are tagged. This is accomplished using the following mechanisms:

A complete example is given below:

// int x = 1; int *f() { return &x; }
// clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - global.c

[...]
f:
        adrp    x0, :pg_hi21_nc:x            // set bits 12-63 to upper bits of untagged address
        movk    x0, #:prel_g3:x+0x100000000  // set bits 48-63 to tag
        add     x0, x0, :lo12:x              // set bits 0-11 to lower bits of address
        ret

[...]
        .data
.Lx.hwasan:
        .word   1

        .globl  x
        .set x, .Lx.hwasan+0x2d00000000000000

[...]
        .section        .note.hwasan.globals,"aG",@note,hwasan.module_ctor,comdat
.Lhwasan.note:
        .word   8                            // namesz
        .word   8                            // descsz
        .word   3                            // NT_LLVM_HWASAN_GLOBALS
        .asciz  "LLVM\000\000\000"
        .word   __start_hwasan_globals-.Lhwasan.note
        .word   __stop_hwasan_globals-.Lhwasan.note

[...]
        .section        hwasan_globals,"ao",@progbits,.Lx.hwasan,unique,2
.Lx.hwasan.descriptor:
        .word   .Lx.hwasan-.Lx.hwasan.descriptor
        .word   0x2d000004                   // tag = 0x2d, size = 4
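The second descriptor word in the listing packs the tag and the size together. A small C sketch of decoding it follows. The split into an 8-bit tag in the top byte and a 24-bit size below it is inferred from the comment "tag = 0x2d, size = 4" and should be read as an assumption, as should the names global_info, decode_descriptor_word and tagged_symbol.

```c
#include <assert.h>
#include <stdint.h>

struct global_info {
    uint32_t size;   /* size of the global in bytes */
    uint8_t tag;     /* tag assigned to the global */
};

/* Split a descriptor word such as 0x2d000004 into its tag and size. */
static struct global_info decode_descriptor_word(uint32_t word) {
    struct global_info g;
    g.size = word & 0x00ffffffu;     /* assumed: low 24 bits hold the size */
    g.tag = (uint8_t)(word >> 24);   /* assumed: top 8 bits hold the tag */
    return g;
}

/* Mirror of ".set x, .Lx.hwasan+0x2d00000000000000": add the tag in bits 56-63. */
static uint64_t tagged_symbol(uint64_t untagged, uint8_t tag) {
    assert((untagged >> 56) == 0);   /* the input must not already carry a tag */
    return untagged + ((uint64_t)tag << 56);
}
```

For the example global, decoding 0x2d000004 gives tag 0x2d and size 4, and the same tag appears in the top byte of the symbol value assigned to x.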

Error reporting

Errors are generated by the HLT instruction and are handled by a signal handler.

Attribute

HWASAN uses its own LLVM IR Attribute sanitize_hwaddress and a matching C function attribute. An alternative would be to re-use ASAN’s attribute sanitize_address. The reasons to use a separate attribute are:

This does mean that users of HWASAN may need to add the new attribute to the code that already uses the old attribute.

Comparison with AddressSanitizer

HWASAN:

The memory overhead of HWASAN is expected to be much smaller than that of AddressSanitizer: 1/TG extra memory for the shadow and some overhead due to TG-aligning all objects.
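The two sources of overhead named here reduce to simple arithmetic, sketched below. The function names are hypothetical, and the granule size is passed as a parameter rather than fixed.

```c
#include <assert.h>
#include <stddef.h>

/* Shadow memory needed for `app_bytes` of application memory: 1/TG of it. */
static size_t shadow_bytes(size_t app_bytes, size_t tg) {
    assert(tg > 0 && app_bytes % tg == 0);
    return app_bytes / tg;
}

/* Size an object occupies once it is rounded up to a multiple of TG. */
static size_t aligned_size(size_t size, size_t tg) {
    assert(tg > 0);
    return (size + tg - 1) / tg * tg;
}
```

With TG = 16, 1 MiB of application memory needs 64 KiB of shadow, and a 20-byte object occupies 32 bytes. The alignment cost is therefore at most TG-1 bytes per object and matters most for programs with many small allocations.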

Security Considerations

HWASAN is a bug detection tool and its runtime is not meant to be linked against production executables. While it may be useful for testing, HWASAN’s runtime was not developed with security-sensitive constraints in mind and may compromise the security of the resulting executable.

Supported architectures

HWASAN relies on Address Tagging which is only available on AArch64. For other 64-bit architectures it is possible to remove the address tags before every load and store by compiler instrumentation, but this variant will have limited deployability since not all of the code is typically instrumented.

On x86_64, HWASAN utilizes page aliasing to place tags in userspace address bits. Currently only heap tagging is supported. The page aliases rely on shared memory, which will cause heap memory to be shared between processes if the application calls fork(). Therefore x86_64 is really only safe for applications that do not fork.

HWASAN does not currently support 32-bit architectures since they do not support Address Tagging and the address space is too constrained to easily implement page aliasing.