[lldb][debugserver] Read/write SME registers on arm64 by jasonmolenda · Pull Request #119171 · llvm/llvm-project

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

The Apple M4 line of cores includes the Scalable Matrix Extension (SME) feature. The M4s do not implement the Scalable Vector Extension (SVE), although the processor is in Streaming SVE (SSVE) Mode when SME is being used. The most obvious side effects of being in SSVE Mode are that, on the M4 cores, NEON instructions cannot be used, and watchpoints may report false positives because address comparisons are done at a coarser granularity.

When SSVE mode is enabled, the kernel provides the Streaming Vector Length (SVL) register, which is at most 64 bytes on the M4, along with SVCR (whose bits indicate whether SSVE mode and SME mode are enabled) and TPIDR2. It also provides the SVE registers Z0..Z31 (SVL bytes each), P0..P15 (SVL/8 bytes each), the ZA matrix register (SVL*SVL bytes), and, because the M4 supports SME2, the ZT0 register (64 bytes).
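The sizes above all follow from the SVL. A minimal sketch of that arithmetic, using hypothetical helper names (`SMERegisterSizes`, `sme_register_sizes`, `sme_total_bytes`) that are not part of debugserver:

```cpp
#include <cassert>
#include <cstddef>

// Byte sizes of the SSVE/SME registers as a function of SVL (in bytes).
// Hypothetical names, for illustration only.
struct SMERegisterSizes {
  std::size_t z;    // each of Z0..Z31: SVL bytes
  std::size_t p;    // each of P0..P15: SVL/8 bytes (one bit per vector byte)
  std::size_t za;   // ZA matrix: SVL x SVL bytes
  std::size_t zt0;  // ZT0 (SME2): fixed 512 bits = 64 bytes
};

SMERegisterSizes sme_register_sizes(std::size_t svl) {
  // SVL is a multiple of 16 bytes (128 bits), up to 256 bytes.
  assert(svl >= 16 && svl <= 256 && svl % 16 == 0);
  return {svl, svl / 8, svl * svl, 64};
}

// Bytes needed to hold every one of these registers for a single thread.
std::size_t sme_total_bytes(std::size_t svl) {
  SMERegisterSizes s = sme_register_sizes(svl);
  return 32 * s.z + 16 * s.p + s.za + s.zt0;
}
```

With the M4's 64-byte SVL, each Z register is 64 bytes, each P register is 8 bytes, and ZA is 4096 bytes; ZA dominates because it grows with the square of the SVL.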

When SSVE/SME are disabled, the kernel provides none of these registers, and reads and writes of them fail.

Unlike on Linux, lldb cannot modify the SVL through a thread_set_state call, or change the processor's SSVE/SME state. There is also currently no way for a process to request a smaller SVL, so the work David did to handle VL/SVL changes while stepping through a process is not needed on Darwin today. But debugserver should provide everything necessary so that we can reuse all of David's work on resizing the register contexts in lldb if this changes in the future. When a thread stops, debugserver sends svl, svcr, and tpidr2 in the expedited registers if SSVE or SME mode is enabled (that is, if the kernel allows it to read the ARM_SME_STATE register set).
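The SVCR bits referred to here are architecturally defined: bit 0 is SM (Streaming SVE mode) and bit 1 is ZA (ZA storage enabled). A minimal sketch of decoding them, plus a hypothetical model of the stop-time expedite decision; `expedited_sme_registers` is an illustrative name, not debugserver code:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// SVCR bit positions from the Arm architecture:
// bit 0 = SM (Streaming SVE mode), bit 1 = ZA (ZA storage enabled).
constexpr uint64_t kSVCR_SM = 1ULL << 0;
constexpr uint64_t kSVCR_ZA = 1ULL << 1;

bool svcr_streaming_mode(uint64_t svcr) { return (svcr & kSVCR_SM) != 0; }
bool svcr_za_enabled(uint64_t svcr) { return (svcr & kSVCR_ZA) != 0; }

// Hypothetical model: the SME registers are expedited only when the
// ARM_SME_STATE register set could be read from the kernel, which happens
// only while SSVE or SME mode is enabled.
std::vector<std::string> expedited_sme_registers(bool sme_state_readable) {
  if (!sme_state_readable)
    return {};
  return {"svl", "svcr", "tpidr2"};
}
```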

While the maximum SVL is 64 bytes on the M4, the largest SVL the AArch64 architecture allows is 256 bytes, which would give us a 64 KiB (65,536-byte) ZA register. If debugserver sized all of its register contexts for the largest possible SVL, we could easily use 2MB more memory for the register contexts of all threads in a process. On iOS and related platforms, processes must run within a small memory allotment, and this would push us over it.
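The memory argument is simple arithmetic on ZA alone. A minimal sketch, with hypothetical constant and function names; it only counts ZA, not the Z/P registers:

```cpp
#include <cassert>
#include <cstddef>

// ZA is SVL x SVL bytes.
constexpr std::size_t za_bytes(std::size_t svl) { return svl * svl; }

constexpr std::size_t kArchMaxSVL = 256;  // AArch64 architectural maximum
constexpr std::size_t kM4MaxSVL = 64;     // M4 maximum, per the description

static_assert(za_bytes(kArchMaxSVL) == 65536, "64 KiB per thread at SVL=256");
static_assert(za_bytes(kM4MaxSVL) == 4096, "4 KiB per thread at SVL=64");

// Extra ZA storage across `threads` threads if each register context were
// sized for the architectural maximum instead of the machine's actual SVL.
constexpr std::size_t extra_za_bytes(std::size_t threads, std::size_t actual_svl) {
  return threads * (za_bytes(kArchMaxSVL) - za_bytes(actual_svl));
}
```

On an M4 this wastes 61,440 bytes per thread, so roughly 34 threads already account for about 2 MB.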

Much of the work in debugserver was changing the arm64 register context from a static, compile-time array of register sets to one that is initialized at runtime if debugserver is running on a machine with SME. The ZA register is allocated only at the machine's actual maximum SVL. The 32 SVE Z registers matter less for memory use, so I statically allocate them for the architecturally largest possible SVL.
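A minimal sketch of that split between static and runtime sizing, using a hypothetical `SMEThreadRegs` type rather than debugserver's actual `DNBArchMachARM64` structures; the P registers are omitted for brevity:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kMaxSVL = 256;  // architectural maximum, in bytes

// Hypothetical per-thread SME register storage (illustration only).
struct SMEThreadRegs {
  uint8_t z[32][kMaxSVL] = {};  // statically sized: 32 * 256 = 8 KiB
  std::vector<uint8_t> za;      // sized at runtime: SVL * SVL bytes
  uint8_t zt0[64] = {};         // ZT0 is a fixed 64 bytes
  uint64_t svcr = 0, tpidr2 = 0, svl = 0;

  // machine_svl is the CPU's actual maximum SVL, discovered at runtime.
  explicit SMEThreadRegs(std::size_t machine_svl)
      : za(machine_svl * machine_svl, 0), svl(machine_svl) {
    assert(machine_svl > 0 && machine_svl <= kMaxSVL);
  }
};
```

Z costs at most 8 KiB per thread even at the maximum SVL, while ZA at the maximum would be 64 KiB, which is why only ZA is worth sizing dynamically.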

Debugserver also includes information about registers that share the same part of the register file. For example, S0 and D0 are the lower parts of the 128-bit NEON V0 register, and when running on an SME machine, V0 is the lower 128 bits of the SVE Z0 register. So the register maps used when defining the VFP registers must differ depending on the runtime state of the CPU, that is, whether it has SME.
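A minimal sketch of choosing the containment map at runtime for register 0. `RegAlias` and `vfp_aliases_for_reg0` are hypothetical; debugserver's real register tables carry more fields than this:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of which larger register contains this one
// (empty if it is the outermost register).
struct RegAlias {
  std::string name;
  std::string contained_in;
};

// Without SME, V0 is the outermost register; with SME, V0 is itself the
// low 128 bits of Z0, so the map gains one more level.
std::vector<RegAlias> vfp_aliases_for_reg0(bool cpu_has_sme) {
  std::vector<RegAlias> regs = {
      {"s0", "v0"},
      {"d0", "v0"},
      {"v0", cpu_has_sme ? "z0" : ""},
  };
  if (cpu_has_sme)
    regs.push_back({"z0", ""});
  return regs;
}
```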

I also changed how debugserver reads registers. Formerly, when debugserver was asked to read a register and the thread_get_state read of that register failed, it returned all zeros. That is still necessary when constructing a g packet, which returns all registers: because there is no separator between register values, the offsets are fixed. But when we ask for a single register (e.g. Z0) while not in SSVE/SME mode, debugserver should return an error.
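The two read paths can be contrasted in a small sketch. `RegReader`, `read_one`, and `read_all` are hypothetical stand-ins: the reader models a thread_get_state-style call that may fail, `read_one` models a single-register request (such as a gdb-remote `p` packet), and `read_all` models building a `g` packet:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical reader: returns the register bytes, or std::nullopt if the
// kernel refused the read (e.g. Z0 while SSVE/SME mode is off).
using RegBytes = std::vector<uint8_t>;
using RegReader = std::optional<RegBytes> (*)(int regno);

// Single-register read: propagate the failure so the client gets an error
// instead of fabricated zeros.
std::optional<RegBytes> read_one(RegReader reader, int regno) {
  return reader(regno);
}

// All-registers read: each register must sit at its fixed offset, so a
// failed read is filled with zeros of the expected size.
RegBytes read_all(RegReader reader, const std::vector<std::size_t>& sizes) {
  RegBytes out;
  for (std::size_t regno = 0; regno < sizes.size(); ++regno) {
    std::optional<RegBytes> v = reader(static_cast<int>(regno));
    if (v) {
      assert(v->size() == sizes[regno]);
      out.insert(out.end(), v->begin(), v->end());
    } else {
      out.insert(out.end(), sizes[regno], static_cast<uint8_t>(0));
    }
  }
  return out;
}
```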

This does mean that when you're running on an SME-capable machine, but not in SME mode, and do register read -a, lldb will report that 48 SVE registers and 5 SME registers are unavailable. That only happens when -a is used.

The register reading and writing depends on new register flavor support in thread_get_state/thread_set_state in the kernel, which is not yet in a release, so the test case I wrote is skipped on current OSes. I pilfered the SME register setup from some of David's existing SME test files; those tests had a few Linux-specific details that made them hard to reuse directly on Darwin.

rdar://121608074


Patch is 67.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119171.diff

9 Files Affected:

```diff
diff --git a/lldb/source/Plugins/Architecture/AArch64/ArchitectureAArch64.cpp b/lldb/source/Plugins/Architecture/AArch64/ArchitectureAArch64.cpp
index 181ba4e7d87721..6a072354972acd 100644
--- a/lldb/source/Plugins/Architecture/AArch64/ArchitectureAArch64.cpp
+++ b/lldb/source/Plugins/Architecture/AArch64/ArchitectureAArch64.cpp
@@ -100,6 +100,25 @@ bool ArchitectureAArch64::ReconfigureRegisterInfo(DynamicRegisterInfo &reg_info,
     if (reg_value != fail_value && reg_value <= 32)
       svg_reg_value = reg_value;
   }
```
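The `reg_value <= 32` guard in the ArchitectureAArch64.cpp hunk appears to bound `svg`, which, assuming it counts 64-bit granules the way lldb's `vg` register does, caps the streaming vector length at 32 × 8 = 256 bytes, the architectural maximum SVL. A minimal sketch of that interpretation, with a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumption: svg counts 8-byte (64-bit) granules, like vg.
constexpr uint64_t kGranuleBytes = 8;
constexpr uint64_t kMaxGranules = 32;  // 32 * 8 = 256 bytes

// Hypothetical helper: accept an svg value only if the read succeeded and
// it is within the architectural limit, then convert it to SVL in bytes.
std::optional<uint64_t> svl_bytes_from_svg(uint64_t reg_value, uint64_t fail_value) {
  if (reg_value == fail_value || reg_value > kMaxGranules)
    return std::nullopt;
  return reg_value * kGranuleBytes;
}
```

Under this reading, an M4 with a 64-byte SVL would report an `svg` of 8.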

```diff
diff --git a/lldb/test/API/macosx/sme-registers/Makefile b/lldb/test/API/macosx/sme-registers/Makefile
new file mode 100644
index 00000000000000..d4173d262ed270
--- /dev/null
+++ b/lldb/test/API/macosx/sme-registers/Makefile
@@ -0,0 +1,5 @@
+C_SOURCES := main.c
+
+CFLAGS_EXTRAS := -mcpu=apple-m4
+
+include Makefile.rules
diff --git a/lldb/test/API/macosx/sme-registers/TestSMERegistersDarwin.py b/lldb/test/API/macosx/sme-registers/TestSMERegistersDarwin.py
new file mode 100644
index 00000000000000..82a5eb0dc81a6b
--- /dev/null
+++ b/lldb/test/API/macosx/sme-registers/TestSMERegistersDarwin.py
@@ -0,0 +1,164 @@
+import lldb
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test.decorators import *
+import lldbsuite.test.lldbutil as lldbutil
+import os
+
+
+class TestSMERegistersDarwin(TestBase):
+
```

```diff
diff --git a/lldb/test/API/macosx/sme-registers/main.c b/lldb/test/API/macosx/sme-registers/main.c
new file mode 100644
index 00000000000000..00bbb4a5551622
--- /dev/null
+++ b/lldb/test/API/macosx/sme-registers/main.c
@@ -0,0 +1,123 @@
+/// BUILT with
+/// xcrun -sdk macosx.internal clang -mcpu=apple-m4 -g sme.c -o sme
+
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+
+void write_sve_regs() {

+void set_za_register(int svl, int value_offset) {

+}
+
+
+// lldb/test/API/commands/register/register/aarch64_sme_z_registers/save_restore/main.c
+void
+arm_sme2_set_zt0() {
+#define ZTO_LEN (512 / 8)

+#undef ZT0_LEN
+}
+
+int main()
+{
+
```

```diff
 uint32_t DNBArchMachARM64::GetCPUType() { return CPU_TYPE_ARM64; }

+static std::once_flag g_cpu_has_sme_once;
+bool DNBArchMachARM64::CPUHasSME() {

+kern_return_t DNBArchMachARM64::GetSVEState(bool force) {

+kern_return_t DNBArchMachARM64::SetSVEState() {
```
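The `std::once_flag g_cpu_has_sme_once` in the `CPUHasSME()` fragment suggests the SME check is computed once and cached. The truncated diff does not show the real hardware query, so the sketch below substitutes a hypothetical `probe_cpu_for_sme()` that can run anywhere; only the `std::call_once` caching pattern is the point:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the real (elided) hardware query; it counts
// calls so the caching behavior can be observed.
static int g_probe_calls = 0;
static bool probe_cpu_for_sme() {
  ++g_probe_calls;
  return true;
}

// Run the probe exactly once, thread-safely, and reuse the cached answer.
static std::once_flag g_cpu_has_sme_once;
bool cpu_has_sme() {
  static bool g_has_sme = false;
  std::call_once(g_cpu_has_sme_once, [] { g_has_sme = probe_cpu_for_sme(); });
  return g_has_sme;
}
```

Caching matters here because, as described above, the answer decides how the register context and the VFP register maps are built, and it cannot change while debugserver runs.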