A Survey of Operating System Kernel Fuzzing

He Sun (qldxtest@gmail.com), Institute for Network Science and Cyberspace, Tsinghua University, Beijing, China; Shihao Jiang (sh.jiang@zju.edu.cn), Zhejiang University, Hangzhou, China; Qinying Wang (wangqinying@zju.edu.cn), Zhejiang University, Hangzhou, China; Mingming Zhang (zhangmm@mail.zgclab.edu.cn), Zhongguancun Laboratory, Beijing, China; Xiang Li (lixiang@nankai.edu.cn), Nankai University, Tianjin, China; Kaiwen Shen (kaiwenshen17@gmail.com), Institute for Network Science and Cyberspace, Tsinghua University, Beijing, China; Charles Zhang (charles@vul337.team), Tsinghua University, Hangzhou, China; Shouling Ji (sji@zju.edu.cn), Zhejiang University, Hangzhou, China; Peng Cheng (lunarheart@zju.edu.cn), Zhejiang University, Hangzhou, China; and Jiming Chen (cjm@zju.edu.cn), Zhejiang University, Hangzhou, China

(2018)

Abstract.

The Operating System (OS) kernel is foundational in modern computing, especially with the proliferation of diverse computing devices. However, its development also comes with vulnerabilities that can lead to severe security breaches. Kernel fuzzing, a technique used to uncover these vulnerabilities, poses distinct challenges when compared to user-space fuzzing. These include the complexity of configuring the testing environment and addressing the statefulness inherent to both the kernel and the fuzzing process. Despite the significant interest from the community, a comprehensive understanding of kernel fuzzing remains lacking, hindering further progress in the field. In this paper, we present the first systematic study focused specifically on OS kernel fuzzing. We begin by outlining the unique challenges of kernel fuzzing, which distinguish it from those in user space. Following this, we summarize the progress of 107 academic studies from top-tier venues between 2017 and 2025. To structure this analysis, we introduce a stage-based fuzzing model and a novel fuzzing taxonomy that highlights nine core functionalities unique to kernel fuzzing. Each of these functionalities is examined in conjunction with the methodological approaches employed to address them. Finally, we identify remaining gaps in addressing challenges and outline promising directions to guide forthcoming research in kernel security.

Keywords: Fuzzing, Operating System Kernel

Copyright: acmlicensed. Journal year: 2018. DOI: XXXXXXX.XXXXXXX. Journal: JACM, volume 37, number 4, article 111, publication month 8. CCS Concepts: Security and privacy → Operating systems security; Software and its engineering → Software testing and debugging.

1. Introduction

OS kernels are central to modern computing systems, mediating communication between software and hardware components. Given this central role, kernel vulnerabilities can lead to serious security breaches, including privilege escalation, sensitive data leakage, and remote code execution. For example, the Dirty COW vulnerability (dir, 2016) in the Linux kernel is infamous for enabling unauthorized privilege escalation, allowing attackers to modify protected data and execute code with root privileges. The risk posed by such vulnerabilities is amplified by increasingly large and complex mobile environments, making it critical to defend against these threats. Meanwhile, fuzzing has proven to be an effective and practical approach to vulnerability discovery. Consequently, OS kernel fuzzing techniques have attracted significant attention from the research community (Chen et al., 2022; Google, 2015; Pandey et al., 2019; Chen et al., 2021; Sun et al., 2021).

Compared to user-space fuzzing, OS kernel fuzzing presents significant and complex challenges for the following reasons. First, unlike applications operating in controlled and uniform environments, kernel code interacts with a broad array of hardware components, each with its own drivers and peculiarities (Wu et al., 2023b; Ma et al., 2022). This intricate interplay increases the risk of system-wide crashes or instability when kernel faults occur, which makes testing precarious (Muench et al., 2024; Maier et al., 2019). As a result, creating a consistent and reliable testing environment is particularly difficult. Second, synthesizing test cases for kernels is usually harder than for applications, because the fuzzer must handle a wide variety of complex, structured inputs, e.g., system calls (syscalls) (Pailoor et al., 2018; Sun et al., 2021; Hao et al., 2023) and peripheral inputs (Feng et al., 2020; Jiang et al., 2021), whose specifications are often deeply embedded within the kernel codebase. Finally, the kernel's inherent complexity and low-level nature introduce additional obstacles: it is difficult to precisely control kernel actions, monitor internal kernel state, and accurately interpret the kernel's responses to inputs during fuzzing (Liu et al., 2020). These difficulties are compounded by the need for the fuzzer to remain scalable and lightweight, a requirement that becomes more pressing as the fuzzing campaign grows or evolves.

Owing to these inherent complexities, OS kernel fuzzing has become a major research focus. The rapidly expanding collection of OS kernel fuzzing techniques varies widely in goals and methods across different stages of the fuzzing pipeline. It is therefore essential to investigate their shared characteristics and the specific challenges they aim to address. Additionally, assessing performance trade-offs and uncovering untapped opportunities are vital to furthering progress in this field. However, no systematic review of OS kernel fuzzing has been conducted thus far. Existing surveys mainly focus on general fuzzing techniques and evaluation criteria (Klees et al., 2018; Wang et al., 2020; Yun et al., 2022; Schloegel et al., 2024), while fuzzing for embedded systems, a related yet distinct line of work, is discussed separately (Zhu et al., 2022).

To achieve this, we conduct an extensive review of 107 OS kernel fuzzing papers published in top-tier conferences between 2017 and August 2025, providing insights into the three research questions:

Although coverage- and crash-based metrics can provide insight into a kernel fuzzer’s effectiveness, they are difficult to compare across different studies because of variations in definitions, experimental setups, and research objectives. Instead of relying solely on these metrics, we focus on the specific functionalities each technique implements, its applicability, and its methodological contributions. This functionality-oriented perspective offers a more practical understanding of a fuzzer’s impact and utility. With this in mind, we first present the background of OS kernel fuzzing in Section 2, followed by our surveying methodology and results in Section 3. In Section 4, we outline the unique challenges of fuzzing in the OS kernel domain. In this regard, we propose a stage-based fuzzing model that decomposes the fuzzing process into discrete steps. For each stage, we describe the essential functionalities and key techniques employed by kernel fuzzers, addressing RQ1. Afterward, we provide a discussion of the existing proposals targeting each stage, i.e., environment preparation (Section 5), input model (Section 6) and fuzzing loop (Section 7). Drawing on qualitative assessment criteria, we emphasize the implications learned from existing approaches and suggest promising technical solutions, responding to RQ2. Additionally, Section 8 addresses the challenges and future directions of kernel fuzzing research, partially informed by a case study. This analysis contributes to answering RQ3. Finally, we provide a conclusion in Section 9.

In summary, we make the following key contributions:

2. Preliminaries

2.1. Role of OS Kernel


Figure 1. Typical role and architecture of OS kernel in modern computer systems.

In modern computer systems, the OS kernel serves as the fundamental component responsible for managing hardware resources and providing essential abstractions to user-space applications. At the highest level, user space hosts applications and runtime libraries, which interact with the kernel through system calls (syscalls) to request services. At the lowest level lies the hardware layer, comprising physical components such as the CPU, memory, and peripheral devices. Situated between these layers, the kernel operates with elevated privileges and acts as a critical mediator between software and hardware. It performs core tasks such as process scheduling, memory management, file system operations, inter-process communication, and device control (Hahm et al., 2016). These functions allow multiple applications to execute concurrently, securely, and efficiently while abstracting away the complexity of direct hardware management. Figure 1 illustrates the typical role and position of the OS kernel within this layered architecture. As the backbone of virtually all computing platforms, from traditional servers and desktops to mobile and embedded systems, the kernel is foundational software infrastructure on which higher-level services and applications depend. Its reliability is therefore critical not only to system stability but also to performance and security, and kernels are consequently subject to rigorous testing. Among testing methods, fuzzing has emerged as one of the most widely adopted approaches. For example, syzbot (Vyukov, [n. d.]), which is integrated into the upstream Linux development process, has reported over 5,000 bugs.

2.2. Categories of OS Kernels

With the rapid expansion of computing devices, computing has extended beyond traditional PCs and servers to a broad spectrum of mobile terminals, embedded systems, and Internet-of-Things (IoT) environments. This diversification has led to a corresponding increase in the variety of OS kernels tailored to different application domains. General-purpose kernels are designed to provide comprehensive functionality and flexibility across multiple platforms, powering millions of devices in diverse domains, particularly desktop environments. Prominent examples include Linux (Torvalds and the Linux Kernel Community, 1991), Windows NT (Corporation, 1993), and XNU (Inc., 2001), all of which emphasize user experience and broad compatibility. In contrast, real-time operating system (RTOS) kernels, such as FreeRTOS (fre, 2003), prioritize deterministic scheduling and minimal latency to satisfy strict timing requirements in embedded and IoT settings. These kernels are often tightly coupled with heterogeneous hardware, which makes them difficult to emulate and test. Mobile OS kernels, exemplified by the Android kernel, share certain traits with RTOS kernels but are specifically optimized for energy efficiency, fine-grained access control, and the delivery of rich system services. While these characteristics enhance functionality, they also introduce additional security risks through expanded interfaces. Beyond these categories, there are domain-specific kernels designed for resource-constrained IoT devices (Angelakopoulos et al., 2023, 2024; Levis, 2012) and for confidential computing (opt, 2014; Busch et al., 2023). The pervasive and foundational nature of OS kernels highlights the critical importance of ensuring their security, reliability, and resilience.

3. Paper Collection

3.1. Survey Methodology

We first outline the methodology we follow to collect papers. In this paper, the scope of OS kernels encompasses architectures that provide the essential services necessary for system functionality, including the hardware abstraction layer, driver model, memory management, and scheduling (Hahm et al., 2016). We find that these kernels have evolved beyond traditional server or desktop models, taking on more diverse and specialized forms. Thus, our analysis includes kernels from general-purpose OSes and their customizations (Torvalds and the Linux Kernel Community, 1991; Inc., 2001; Corporation, 1993; Google, 2008; Henkel, 2006), RTOS (vxw, 1987; Barabanov, 1997; Mera et al., 2021), TEE-OS (Drozdovskyi and Moliavko, 2019; opt, 2014), ROS (Kim and Kim, 2022; Bai et al., 2024a; Shen et al., 2024), and lightweight (nano) kernels (fre, 2003; ARM, 2013), covering a range of environments from desktops to IoT devices. To ensure a comprehensive survey, we followed these steps:

  1. Venue selection. Our study primarily focuses on papers published between 2017 and 2025 at A* software engineering, cyber security, and computer systems venues ranked by CORE2023 (ICORE, [n. d.]). The selected publication venues are listed in Table 1.
     Table 1. The selected top-tier venues from software engineering, cyber security and computer systems.
  2. Keyword match. We then conducted a preliminary review of the literature and identified a set of keywords relevant to OS kernel fuzzing. These keywords were used to construct the final search query: ("operating system" OR "OS kernel" OR "Android services" OR "rtos" OR "firmware") AND ("fuzzing" OR "fuzzer" OR "testing"). The search was performed using DBLP, a widely recognized bibliographic database in computer science, and the results were restricted to papers published in the previously selected venues. This process yielded an initial collection of 134 papers.
  3. Inclusion / exclusion criteria. We manually inspected the papers collected in the previous step and retained those that met our inclusion criteria. Specifically, a paper was included in the survey if it proposed a novel fuzzing method explicitly designed for OS kernels or their subsystems. In contrast, studies that focused solely on bare-metal firmware or addressed fuzzing techniques not specific to kernels, such as those targeting general network protocols, were excluded. After applying these criteria, 99 papers were selected for further analysis.
  4. Snowballing. Finally, we performed both forward and backward snowballing to broaden coverage and capture relevant studies that may have been missed in the initial search. As a result, 107 papers were identified for inclusion in the survey.
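The boolean query in step (2) can be read as a simple predicate over candidate papers. The sketch below is a minimal illustration that assumes case-insensitive substring matching on titles only; DBLP's actual search semantics may differ, and the example titles are hypothetical. Substring matching is deliberately loose (for example, "rtos" could match inside a longer word), which is one reason the manual inclusion/exclusion step is still needed.

```python
# Sketch of the step (2) query, assuming case-insensitive substring matching on titles:
# ("operating system" OR "OS kernel" OR "Android services" OR "rtos" OR "firmware")
# AND ("fuzzing" OR "fuzzer" OR "testing")

TARGET_TERMS = ["operating system", "os kernel", "android services", "rtos", "firmware"]
METHOD_TERMS = ["fuzzing", "fuzzer", "testing"]


def matches_query(title: str) -> bool:
    """Return True if the title contains any target term AND any method term."""
    text = title.lower()
    has_target = any(term in text for term in TARGET_TERMS)
    has_method = any(term in text for term in METHOD_TERMS)
    return has_target and has_method


if __name__ == "__main__":
    # Hypothetical titles, used only to exercise the predicate.
    for t in [
        "Fuzzing the OS Kernel via Syscall Specifications",
        "Rehosting RTOS Firmware for Testing",
        "A Study of Network Protocol Fuzzing",
    ]:
        print(matches_query(t), t)
```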

3.2. Survey Result

At the time of submission, our dataset included papers published between 2017 and August 2025, covering publicly available works (including early-access and preprint versions) from top-tier venues. Based on these papers, we analyze the current research landscape and trends in OS kernel fuzzing from two perspectives: publication venues and publication years. The statistical results are presented in Figure 2.

Publication venues. The surveyed literature is distributed across 20 different venues. Most OS kernel fuzzing papers (67% of the total, as illustrated in Figure 2(a)) are published in cyber security venues, reflecting the high security risks associated with kernel vulnerabilities. Software engineering venues follow with 17%, while systems conferences account for 9%. Notably, 28% of the papers were published at the USENIX Security Symposium, making it the most prominent venue in this research area.

Publication years. Figure 2(b) shows the distribution of papers by publication year. There is a clear upward trend in the number of kernel fuzzing publications, indicating the increasing importance of this research area and the growing attention it has received from the academic community. The apparent decline in 2025 is because the year was not yet complete at the time of this survey and many papers had not yet been published.


(a) Distribution of papers by venues.


(b) Distribution of papers by years.

Figure 2. Statistics of the collected papers.

4. Overview of Kernel Fuzzing

4.1. Challenges in Kernel Fuzzing

As previously discussed, the OS kernel occupies a unique position in the system architecture, bridging user space and the hardware layer. The complexity and privileged nature of OS kernels therefore introduce distinct challenges in the design and implementation of fuzzing components, distinguishing kernel fuzzing from traditional fuzzing campaigns. To address RQ1, we categorize these challenges as follows, with a side-by-side comparison provided in Table 2:

Table 2. Comparison of challenges in user-space and kernel fuzzing.

4.2. Key Functionalities

Based on the survey results, we observe that OS kernel fuzzing largely follows the standard fuzzing methodology but is adapted to address the aforementioned challenges. It involves generating or mutating test cases that interact with kernel interfaces (e.g., syscalls) and executing them in a controlled environment (Google, 2015). During execution, the fuzzer uses runtime feedback (typically branch coverage) to steer exploration toward unexplored code and increase the likelihood of discovering bugs, while continuously monitoring the target kernel for anomalous behavior such as crashes or hangs (Shi et al., 2019, 2024a). As depicted in Figure 3, this workflow is typically structured around three core stages: environment preparation, input model, and fuzzing loop. Each plays a critical role in enabling effective fuzzing of OS kernels.
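The generate/mutate, execute, collect-feedback, and monitor cycle described above can be sketched as a toy coverage-guided loop. Everything here is hypothetical: `toy_kernel` stands in for a kernel entry point, integer branch IDs stand in for branch coverage, and a Python exception stands in for a kernel crash caught by the bug oracle. Real kernel fuzzers run test cases inside VMs or on devices and obtain coverage through mechanisms such as those discussed in Section 5.2.

```python
import random


class KernelCrash(Exception):
    """Stands in for a kernel oops or panic detected by the bug oracle."""


def toy_kernel(data: bytes, cov: set) -> None:
    """Hypothetical kernel entry point; records reached branch IDs in `cov`."""
    cov.add(0)
    if len(data) > 0 and data[0] == ord("K"):
        cov.add(1)
        if len(data) > 1 and data[1] == ord("E"):
            cov.add(2)
            if len(data) > 2 and data[2] == ord("R"):
                cov.add(3)
                raise KernelCrash("simulated crash")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: overwrite, insert, or delete."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif choice == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif len(buf) > 1:  # never shrink below one byte
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(iterations: int, seed: int = 0):
    """Coverage-guided loop: mutate, execute, keep inputs that add coverage."""
    rng = random.Random(seed)
    corpus = [b"\x00"]                  # seed input
    global_cov = set()
    crashes = []
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        cov = set()
        try:
            toy_kernel(child, cov)      # execute the test case
        except KernelCrash:
            crashes.append(child)       # bug oracle: keep the crashing input
        if not cov <= global_cov:       # feedback: did this input reach new branches?
            global_cov |= cov
            corpus.append(child)
    return corpus, global_cov, crashes
```

Because each partial match of the `K`, `E`, `R` prefix is rewarded with new coverage, the loop can reach the crash one byte at a time instead of guessing all three bytes at once, which is the intuition behind coverage guidance.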


Figure 3. Key stages and functionalities in kernel fuzzing.

To address the constrained environment challenge, environment preparation establishes a stable and isolated execution context for the kernel, either through virtualization, which enables scalable testing across multiple kernel instances (Maier and Toepfer, 2021; Schumilo et al., 2017; Johnson et al., 2021), or on physical devices, which ensures faithful representation of real-world behavior (Qinying et al., 2024; Mera et al., 2024; Song et al., 2019). It also enables coverage tracking and supports effective detection of potential vulnerabilities (SimonKagstrom, 2010; Li et al., 2024c, 2022). Accordingly, environment preparation comprises three functionalities: execution environment (F1.1), coverage collection (F1.2), and bug oracle (F1.3).

To handle the complexity of kernel input interfaces, an input model must first identify the relevant entry points through which the kernel is exercised (interface identification, F2.1), and then capture the structural and semantic rules governing these interfaces (specification awareness, F2.2), including expected data formats and constraints (Hu et al., 2021; Chen et al., 2020; Corina et al., 2017; Weiteng et al., 2024). Such a model further incorporates the key functionality of dependency recognition (F2.3), enabling fuzzers to generate test cases that are not only syntactically valid but also semantically meaningful.
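To illustrate dependency recognition (F2.3), the sketch below generates syscall sequences from a small specification in which some calls produce resources (for example, a file descriptor returned by `open`) that later calls consume. The specification format, the call set, and the `generate` helper are hypothetical and only loosely inspired by syzkaller-style descriptions; real specifications also describe argument types, flags, and structures, and would typically track when a resource is closed.

```python
import random
from dataclasses import dataclass

# Hypothetical specification: resource types each call consumes and produces.
SPEC = {
    "open":    {"consumes": [],       "produces": "fd"},
    "read":    {"consumes": ["fd"],   "produces": None},
    "write":   {"consumes": ["fd"],   "produces": None},
    "close":   {"consumes": ["fd"],   "produces": None},
    "socket":  {"consumes": [],       "produces": "sock"},
    "connect": {"consumes": ["sock"], "produces": None},
}


@dataclass
class Call:
    name: str
    args: list   # indices of earlier calls whose results this call consumes
    index: int   # position of this call in the program


def generate(length: int, rng: random.Random) -> list:
    """Generate a call sequence in which every consumed resource was produced earlier."""
    program = []
    available = {}  # resource type -> indices of calls that produced it
    for i in range(length):
        # Only calls whose inputs are already satisfiable are eligible.
        eligible = [name for name, s in SPEC.items()
                    if all(r in available for r in s["consumes"])]
        name = rng.choice(eligible)
        args = [rng.choice(available[r]) for r in SPEC[name]["consumes"]]
        program.append(Call(name, args, i))
        produced = SPEC[name]["produces"]
        if produced is not None:
            available.setdefault(produced, []).append(i)
    return program
```

For example, `for c in generate(5, random.Random(1)): print(c.index, c.name, c.args)` prints a short program in which every `read`, `write`, `close`, or `connect` refers back to an earlier call that produced a matching resource.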

Building on the prepared environment and the defined input model, the fuzzing loop continuously feeds generated or mutated test cases into the target kernel and leverages diverse feedback metrics tailored to kernel fuzzing. In particular, to address the challenges posed by the kernel's high degree of statefulness, the fuzzing loop aims to reduce the overhead of state re-initialization and improve execution throughput (Song et al., 2020; Jung et al., 2025; Zheng et al., 2019; Lan et al., 2023). It also incorporates state-oriented feedback to strengthen state exploration (Zhao et al., 2022a; Qinying et al., 2024; Liu et al., 2024b). Overall, an effective fuzzing loop centers on three functionalities: mutation intelligence (F3.1), execution throughput (F3.2), and feedback mechanism (F3.3).
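One common way to cut state re-initialization cost is to restore a snapshot instead of rebooting before each test case. The sketch below models this with a hypothetical stateful `ToyKernel` whose entire state is a Python dictionary; real snapshot-based fuzzers capture VM memory and device state, which is far more involved. It shows the two properties that matter: every test case starts from the same state, and the expensive boot runs only once.

```python
import copy


class ToyKernel:
    """Hypothetical stateful target whose state persists across test cases."""

    def __init__(self):
        self.boot_count = 0
        self.state = {}
        self.boot()

    def boot(self):
        # Stand-in for an expensive full (re)initialization.
        self.boot_count += 1
        self.state = {"open_files": 0, "mode": "normal"}

    def run(self, calls):
        for call in calls:
            if call == "open":
                self.state["open_files"] += 1
            elif call == "set_debug":
                self.state["mode"] = "debug"


def run_with_snapshot(kernel, test_cases):
    """Execute each test case from the same post-boot state without rebooting."""
    snapshot = copy.deepcopy(kernel.state)       # captured once, after boot
    results = []
    for calls in test_cases:
        kernel.state = copy.deepcopy(snapshot)   # restore instead of calling boot()
        kernel.run(calls)
        results.append(dict(kernel.state))
    return results
```

Without the restore step, the `open` calls of one test case would leak into the next, which is exactly the cross-test-case state pollution that makes kernel results hard to reproduce.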


Figure 4. Distribution of papers by target stages and functionalities.

We also classify the surveyed papers by their primary targeted functionalities, with the results presented in Figure 4. In terms of stages, approximately 32% of the research concentrates on environment preparation for customized kernels, 40% focuses on optimizing the fuzzing loop, and 28% aims to improve the correctness and completeness of the input space. As for functionalities, environment preparation (18%) and feedback mechanisms (19%) emerge as prominent areas of interest, both of which still require further research and practical advancement. The distribution of targeted functionalities also closely aligns with the key challenges identified in Section 4.1. Furthermore, 59% of the papers specifically target Linux kernel fuzzing. This trend is largely driven by the availability of Syzkaller (Google, 2015), which provides a mature and widely adopted infrastructure that facilitates subsequent fuzzing research and optimization.

Based on the above study, we answer RQ1 as follows. We emphasize that C1, C2 and C3 distinguish OS kernel fuzzing from conventional fuzzing approaches. To address these challenges, we categorize nine key functionalities across three stages, among which execution environment (F1.1), specification awareness (F2.2) and feedback mechanism (F3.3) emerge as the most extensively studied aspects.

4.3. Key Techniques

In response to RQ2, we examine how existing techniques implement and advance the key functionalities identified above. The aforementioned challenges have inspired a diverse range of technical innovations, which can be organized into three categories aligned with the stages of our model:

Together, these techniques provide complementary solutions to improve efficiency, coverage, and bug-finding capability. In the following sections, we analyze each functionality in depth and derive a series of implications that summarize the strengths, limitations, and directions of these techniques. Each implication is grounded in the corresponding functionality discussed above (addressing RQ2) and extends the analysis toward unresolved challenges and future research opportunities (addressing RQ3).

5. Environment Preparation

5.1. Execution Environment

Two primary approaches provide the execution environment for OS kernel fuzzing: on-device fuzzing (Song et al., 2019; Li et al., 2022; Qinying et al., 2024; Eisele et al., 2023) and emulation-based fuzzing (Song et al., 2020; Keil and Kolbitsch, 2007; Talebi et al., 2018; Pan et al., 2017; Schumilo et al., 2014; Renzelmann et al., 2012; Maier et al., 2019). We present a summary of the literature and the corresponding solutions employed within the environment preparation in Table 4.

5.1.1. On-device Fuzzing

To cope with C1, an on-device fuzzer executes the target kernel on actual devices, ensuring its continuous and stable operation. It relies on a user-space application or a debugging system that connects directly to the kernel for coverage collection and bug detection. This type of fuzzer is effective at identifying defects related to unique hardware properties or configurations.

Local fuzzer. This type of fuzzer runs in the user space on a local machine and utilizes the exposed kernel interface to fuzz the kernel (Schumilo et al., 2021), as illustrated in Figure 5. It has limited ability to control and monitor the target kernel because of its low privilege. Even worse, it loses all execution information when the kernel crashes.
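A local fuzzer can be as simple as a user-space program that issues syscalls with randomized arguments and records only what comes back. The sketch below assumes a POSIX system and uses Python's `os.lseek` (a thin wrapper over the `lseek` syscall) on a temporary file; the argument ranges are arbitrary choices for illustration. It also shows the limitation noted above: the fuzzer observes only return values and `errno` codes, not what happened inside the kernel.

```python
import os
import random
import tempfile


def fuzz_lseek(iterations: int, seed: int = 0) -> dict:
    """Call lseek() with randomized arguments and tally the observable outcomes."""
    rng = random.Random(seed)
    outcomes = {}
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * 64)
        for _ in range(iterations):
            offset = rng.randint(-(1 << 20), 1 << 20)
            whence = rng.randrange(8)          # deliberately includes unsupported values
            try:
                os.lseek(fd, offset, whence)
                key = "ok"
            except OSError as exc:
                key = f"errno {exc.errno}"     # all a local fuzzer learns on failure
            outcomes[key] = outcomes.get(key, 0) + 1
    finally:
        os.close(fd)
        os.remove(path)
    return outcomes
```

The returned dictionary maps outcome labels to counts; if the kernel itself crashed during such a run, this process would simply disappear along with any record of the triggering input.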

Remote fuzzer. A remote fuzzer connects to a target machine running the kernel via serial ports (Song et al., 2019; Qinying et al., 2024; Eisele et al., 2023; Li et al., 2022) or the network (Schiller et al., 2023). It requires a debugging system or probing module to be deployed on the target and uses the debug feature to control the target firmware. For instance, SyzTrust (Qinying et al., 2024) and μAFL (Li et al., 2022) use the ARM CoreSight architecture to control the execution of embedded OSes, and PeriFuzz (Song et al., 2019) designs its own probing framework to manage the hardware-OS boundary of a kernel.

Most of these approaches are open source and capable of supporting closed-source OSes. However, they often require intrusive control over OS execution. While the native execution environment of on-device fuzzers provides high stability and fidelity, it is limited in capacity and input execution speed when applied to RTOS on resource-constrained devices, such as ARM Cortex-M chips with clock speeds between 10 MHz and 600 MHz. Furthermore, on-device fuzzing is typically OS-specific and necessitates real, debug-enabled devices or elevated privileges for the fuzzer, which results in increased costs.


(a) Local fuzzer


(b) Remote fuzzer

Figure 5. Two typical on-device fuzzers.

5.1.2. Emulation-based Fuzzing

Loading the kernel into a virtual environment offers a more scalable approach with complete control for fuzzing OS kernels, providing a low-cost and effective solution for kernel introspection. The challenge, however, lies in maintaining stability and fidelity so that the emulated kernel operates consistently and without interruption. Following prior work (Fasano et al., 2021), there are two principal ways to construct the virtual environment: a full hardware emulation system and a rehosted embedded system. We further categorize rehosting techniques into hardware-in-the-loop, high-level emulation (hooking), and MMIO modeling.

Full emulation-based fuzzing. Full hardware emulation accurately replicates the functionality of specific hardware, allowing unmodified kernels to be executed and fuzzed when peripherals are adequately emulated (Fasano et al., 2021). Full emulation aims to implement as many peripherals as possible, providing relatively high stability and fidelity compared to rehosting. When the emulator is open source, developers have full control over both the emulator and the kernel running inside it, which maximizes the capability for introspection and analysis. Although several major emulators can run OS kernels, including VMware (Walters, 1999), VirtualBox (Khan et al., 2022), and QEMU (Bellard, 2005), full emulation-based fuzzing predominantly uses QEMU because it is open source, handles general-purpose OSes and embedded Linux environments effectively, and is compatible with a range of fuzzing tools (Google, 2015; Pandey et al., 2019; Schumilo et al., 2017). For RTOS and TEE, however, QEMU offers only limited support. Since these kernels interact closely with hardware or specialized architectures, achieving full emulation requires significant effort, so existing fuzzers for these specialized kernels often rely on rehosting techniques instead. The speed of kernel operation and fuzzing depends on the host machine running the emulator and on whether acceleration techniques are deployed. For instance, emulating a kernel on a host with similar performance may be slower than native execution due to instruction translation overhead, whereas emulating kernels for low-performance devices on high-performance hosts can improve speed.
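For QEMU-based full emulation, each fuzzing VM is typically launched with a command line along the lines of the sketch below. The helper name, the image paths, and the kernel command line are assumptions for illustration; the flags themselves (`-kernel`, `-append`, `-drive`, `-snapshot`, `-nographic`, `-enable-kvm`, `-s`) are standard QEMU options. `-enable-kvm` corresponds to the acceleration point above: it only works when host and guest share an architecture, and without it QEMU uses its TCG instruction translation.

```python
def qemu_fuzz_vm_argv(kernel_image: str, disk_image: str, mem_mb: int = 2048,
                      cpus: int = 2, use_kvm: bool = True, gdb: bool = False) -> list:
    """Build (but do not run) a QEMU command line for one hypothetical fuzzing VM."""
    argv = [
        "qemu-system-x86_64",
        "-m", str(mem_mb),
        "-smp", str(cpus),
        "-kernel", kernel_image,                 # boot this kernel image directly
        "-append", "console=ttyS0 root=/dev/sda",  # assumed kernel command line
        "-drive", f"file={disk_image},format=raw",
        "-snapshot",                             # discard disk writes when the VM exits
        "-nographic",                            # serial console on stdio for crash logs
    ]
    if use_kvm:
        argv.append("-enable-kvm")               # hardware-assisted virtualization
    if gdb:
        argv.append("-s")                        # gdbstub on tcp::1234 for introspection
    return argv
```

A harness would pass this list to `subprocess.Popen` and watch the serial output on the process's stdout for kernel crash reports; keeping `-snapshot` on lets many VMs share one read-only disk image.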

Rehosting-based fuzzing. While full emulation-based fuzzing primarily targets general-purpose OSes and their variants, rehosting-based fuzzing offers a complementary approach for RTOSes such as Amazon FreeRTOS (fre, 2003), ARM Mbed (ARM, 2013), Zephyr (zep, 2016), and LiteOS (Cao et al., 2008). Unlike full emulation, a rehosted embedded system models only the features of the target kernel that are essential for fuzzing or dynamic analysis. This approach also provides full control over the target kernel and comprehensive introspection. Based on our survey of state-of-the-art rehosting techniques, we identified three primary strategies: hardware-in-the-loop (Koscher et al., 2015; Talebi et al., 2018), high-level hooking (Clements et al., 2020; Feng et al., 2020; Jiang et al., 2021; Li et al., 2021), and MMIO modeling (Gustafson et al., 2019; Harrison et al., 2020a; Zhou et al., 2021; Cao et al., 2020; Johnson et al., 2021). Hardware-in-the-loop, while useful, suffers from lower stability due to potential delays in forwarding hardware data, which can cause crashes while the RTOS is running and being fuzzed. In addition, its speed is bottlenecked by the execution speed of the hardware itself. The other two strategies, MMIO modeling and high-level hooking, face significant fidelity and stability challenges, particularly in accurately simulating hardware behavior such as DMA and interrupts and in handling complex peripherals (Mera et al., 2021). When a kernel is fuzzed within a rehosted system, crashes may occur because certain features are not modeled. As with full emulation, the speed of kernel operation and fuzzing depends on the host machine running the emulator and the acceleration techniques used.
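MMIO modeling can be illustrated with a driver that busy-waits on a peripheral status register. The sketch below is hypothetical throughout (register addresses, the `READY_BIT` layout, and both model classes are invented for illustration). A naive model that returns 0 for every read leaves the driver stuck polling, mirroring the fidelity problem described above, while a model that serves MMIO reads from the fuzz input, in the spirit of input-driven MMIO modeling, lets the fuzzer steer the driver past the loop.

```python
STATUS_REG = 0x4000_0000   # hypothetical peripheral register addresses
DATA_REG = 0x4000_0004
READY_BIT = 0x1


class ZeroMMIO:
    """Naive model: every register reads as 0, i.e., no peripheral behavior."""

    def read32(self, addr: int) -> int:
        return 0


class FuzzInputMMIO:
    """Serve every MMIO read from consecutive 4-byte chunks of the fuzz input."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read32(self, addr: int) -> int:
        if self.pos + 4 > len(self.data):
            raise EOFError("fuzz input exhausted")
        value = int.from_bytes(self.data[self.pos:self.pos + 4], "little")
        self.pos += 4
        return value


def firmware_recv(mmio, max_polls: int = 100) -> int:
    """Toy driver code: poll STATUS_REG until READY is set, then read one data word."""
    for _ in range(max_polls):
        if mmio.read32(STATUS_REG) & READY_BIT:
            return mmio.read32(DATA_REG)
    raise TimeoutError("stuck polling STATUS_REG")
```

With `FuzzInputMMIO((1).to_bytes(4, "little") + (0xDEAD).to_bytes(4, "little"))`, the first read satisfies the status check and the second supplies the data word, so the fuzzer controls both the control flow and the data the driver sees.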

Implication 1: OS-Agnostic Rehosting. Current kernel fuzzing environments struggle to balance stability, overhead, and introspection, and they face significant challenges in achieving OS-agnostic execution, particularly for RTOS and TEE. Fidelity remains a critical limitation, as fuzzers often fail or get stuck due to incomplete or inaccurate emulation. This highlights the need for more faithful rehosting techniques that generalize across diverse OS platforms. Promising directions include lightweight full-system emulation tailored to specialized environments and improved rehosting frameworks that enhance stability and fidelity while minimizing engineering effort.

5.2. Coverage Collection

Coverage is a key indicator of fuzzing effectiveness. There are two principal ways to collect it: invasive instrumentation and non-invasive tracing.

Table 3. Coverage collection methodologies and their applications

5.2.1. Invasive Instrumentation

In essence, invasive instrumentation modifies target code during compilation or runtime, and exposes interfaces for tracking executed portions. Table 3 presents a detailed comparison of coverage collection techniques and their use cases.

Source-based instrumentation. Instrumenting the kernel during compilation is the most effective and intuitive method for coverage collection. For example, KCOV has the compiler insert a coverage callback into basic blocks and exposes the program counters reached by the current thread to user space; coverage-guided fuzzers use this feedback to significantly improve bug discovery (SimonKagstrom, 2010; Google, 2015; Pailoor et al., 2018; Wang et al., 2021b; Sun et al., 2021; Jeong et al., 2019; Xu et al., 2024). However, this approach primarily targets bugs reachable via syscall inputs and has limitations in non-deterministic code and non-syscall handlers. Solutions such as PeriScope (Song et al., 2019) and USBFuzz (Peng and Payer, 2020) extend coverage through fine-grained and remote collection methods, enabling more effective fuzzing of areas such as driver operations and interrupts.
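On Linux, KCOV shares coverage with user space through a buffer set up via `/sys/kernel/debug/kcov`, `ioctl`, and `mmap`; in PC-tracing mode, the first word holds the number of recorded program counters and the following words hold the PCs. The sketch below decodes such a buffer from a `bytes` object so it runs without a KCOV-enabled kernel, assuming 64-bit little-endian words, and then turns PCs into AFL-style edge identifiers. The hashing constant is an arbitrary choice, and real fuzzers differ in how they derive feedback from raw PCs.

```python
import struct


def parse_kcov_buffer(raw: bytes, word_size: int = 8) -> list:
    """Decode a PC-tracing KCOV-style buffer: word 0 is the PC count, then the PCs."""
    fmt = "<Q" if word_size == 8 else "<I"
    (count,) = struct.unpack_from(fmt, raw, 0)
    return [struct.unpack_from(fmt, raw, word_size * (i + 1))[0] for i in range(count)]


def edges_from_pcs(pcs: list, map_size: int = 1 << 16) -> set:
    """AFL-style edges: cur_loc XOR (prev_loc >> 1), folded into map_size slots."""
    edges = set()
    prev = 0
    for pc in pcs:
        cur = (pc * 0x9E3779B1) % map_size   # arbitrary PC-to-location hash
        edges.add(cur ^ prev)
        prev = cur >> 1                      # shift so A->B and B->A differ
    return edges
```

Any bytes after the recorded PCs (the rest of the mapped buffer) are ignored, since only the first `count` entries are valid for the current run.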

Dynamic instrumentation. When source code is unavailable, dynamic instrumentation can modify a kernel at runtime instead. This approach typically relies on an emulator that translates raw machine instructions into an intermediate representation, inserting instrumentation code for coverage collection during translation (Bellard, 2005). A key prerequisite is that the kernel must boot within the emulator. Hence, dynamic instrumentation is commonly integrated with emulation tools such as AFL-QEMU (Zalewski, 2013), enabling analysis of binary-only kernels for Linux (Clements et al., 2020; Maier et al., 2019; Tay et al., 2023; Angelakopoulos et al., 2024), RTOS (Mera et al., 2021), and TEE (Harrison et al., 2020b).
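The key idea of dynamic instrumentation, inserting coverage hooks while translating guest code rather than modifying the binary on disk, can be shown with a toy translator. The guest instruction format, the block layout, and the `translate` and `run` helpers below are hypothetical; QEMU performs the analogous step when it translates guest basic blocks into host code and caches the result.

```python
# Toy guest code: basic blocks of (op, arg) tuples, treated as an opaque binary.
GUEST_BLOCKS = {
    "entry": [("load", None), ("jz", ("zero", "nonzero"))],
    "zero": [("ret", None)],
    "nonzero": [("dec", None), ("jz", ("zero", "nonzero"))],
}


def translate(label, blocks, coverage):
    """Translate one guest block into a host closure with a coverage hook prepended."""
    code = blocks[label]

    def host_block(regs):
        coverage.add(label)              # hook inserted at translation time
        for op, arg in code:
            if op == "load":
                regs["r0"] = regs["input"]
            elif op == "dec":
                regs["r0"] -= 1
            elif op == "jz":             # arg = (target if r0 == 0, target otherwise)
                return arg[0] if regs["r0"] == 0 else arg[1]
            elif op == "ret":
                return None
        return None

    return host_block


def run(blocks, input_value, max_steps=10_000):
    """Execute guest code, translating each block once and caching the result."""
    coverage, cache = set(), {}
    regs = {"input": input_value, "r0": 0}
    label = "entry"
    for _ in range(max_steps):
        if label is None:
            return coverage
        if label not in cache:
            cache[label] = translate(label, blocks, coverage)
        label = cache[label](regs)
    raise TimeoutError("guest did not terminate within max_steps")
```

Because instrumentation happens once per translated block, the guest code itself is never modified, which is what makes this approach usable for binary-only kernels as long as they boot inside the emulator.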

Binary rewriting. For binary-only kernels, binary rewriting (Nagy et al., 2021; Zhang et al., 2021; Dinesh et al., 2020) provides a viable alternative. However, adapting binary rewriting to kernel fuzzing presents additional hurdles, such as significant overhead that reduces fuzzing efficiency (Maier and Toepfer, 2021) and the complexity and ambiguity associated with static methods (Dinesh et al., 2020; Zhang et al., 2021). Nonetheless, recent work on macOS kernel extensions (Yin et al., 2023) shows promise, achieving cost-effective static binary rewriting by leveraging macOS-specific features to inject coverage instrumentation efficiently.
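The central complication of static rewriting is that inserting probes shifts code, so every control-flow target must be relocated. The toy sketch below uses a hypothetical instruction list with direct jumps to absolute indices: it inserts a coverage probe at each basic-block leader and then patches jump targets so that program behavior is preserved. Real kernels add the hard parts this sketch omits, such as indirect jumps and data embedded in code, which is where the ambiguities of static methods arise.

```python
JUMPS = ("jmp", "jz")


def find_leaders(prog):
    """Basic-block leaders: entry, every jump target, and every instruction after a jump."""
    leaders = {0}
    for i, (op, arg) in enumerate(prog):
        if op in JUMPS:
            leaders.add(arg)
            if i + 1 < len(prog):
                leaders.add(i + 1)
    return leaders


def rewrite(prog):
    """Insert a ("cov", block_id) probe at each leader, then relocate jump targets."""
    leaders = find_leaders(prog)
    new_prog, new_index = [], {}
    for i, ins in enumerate(prog):
        new_index[i] = len(new_prog)        # jumps to i must land on its probe
        if i in leaders:
            new_prog.append(("cov", i))
        new_prog.append(ins)
    # Fix-up pass: without it, jump targets would point at shifted instructions.
    return [(op, new_index[arg]) if op in JUMPS else (op, arg) for op, arg in new_prog]


def execute(prog, x, max_steps=1000):
    """Interpret the toy instruction set; returns (final x, list of probe IDs hit)."""
    pc, hits = 0, []
    for _ in range(max_steps):
        op, arg = prog[pc]
        if op == "cov":
            hits.append(arg)
        elif op == "dec":
            x -= 1
        elif op == "jz" and x == 0:
            pc = arg
            continue
        elif op == "jmp":
            pc = arg
            continue
        elif op == "ret":
            return x, hits
        pc += 1
    raise TimeoutError("step limit reached")


# Hypothetical countdown loop: decrement x until it reaches zero.
PROG = [
    ("jz", 3),      # index 0: if x == 0, jump to index 3
    ("dec", None),  # index 1: x -= 1
    ("jmp", 0),     # index 2: loop back
    ("ret", None),  # index 3: return x
]
```

Here `rewrite(PROG)` yields `[("cov", 0), ("jz", 5), ("cov", 1), ("dec", None), ("jmp", 0), ("cov", 3), ("ret", None)]`: the conditional jump originally targeting index 3 now targets index 5, the probe in front of the old `ret`.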

5.2.2. Non-invasive Tracing

As a result of C1, instrumentation is often restricted because the kernel runs in privileged mode. The kernel’s strong coupling with hardware also leaves little middleware support, which limits observability. In such cases, fuzzers rely on limited interfaces, typically debug checkpoints or hardware assistance.

Debug checkpoint. Similar to binary rewriting, tracing coverage through debug checkpoints involves setting checkpoints within the target and collecting feedback when they are hit during execution. The key differences lie in scalability and in whether built-in kernel features are leveraged. Tracing coverage via debug checkpoints is a target-specific approach that depends heavily on the target kernel’s support. For example, SyzGen (Chen et al., 2021) uses macOS debugging tools to address the challenges posed by closed-source kernels. Another example is on-device fuzzers that use hardware-based debugging, where feedback collection is closely tied to hardware features such as the debug units of embedded systems (Eisele et al., 2023).

Hardware assistance. Hardware components such as the CPU have direct access to kernel space and observe every executed instruction, which facilitates the acquisition of detailed feedback. Techniques like Intel PT and ARM ETM are widely used for capturing execution information and offer two key advantages. First, they ensure greater completeness and robustness in coverage collection. In contrast, user-space solutions are inherently limited: they can only start after the kernel or drivers have initialized and therefore miss the coverage generated during the bootstrap phase (Zhao et al., 2022b; Li et al., 2022; Hao et al., 2022). Second, these tools provide execution information from arbitrary OS code, including closed-source code, serving as OS-agnostic feedback mechanisms that support kernel fuzzing across multiple platforms (Schumilo et al., 2017; Aschermann et al., 2019; bug, 2020; Qinying et al., 2024). However, like debug checkpoints, hardware-assisted approaches are tied to specific architectures.

Implication 2: Coverage Disparity. Source-based instrumentation has become the dominant approach in kernel fuzzing, adopted by approximately 77% of greybox fuzzers, owing to its flexibility and ability to collect fine-grained, customizable feedback (as detailed in Section 7.3). In contrast, binary-only kernels predominantly rely on OS- or architecture-specific techniques, 95% of which yield only coarse coverage metrics. This underscores the need for techniques that can deliver richer coverage feedback without requiring kernel source access.

Table 4. Summary of kernel fuzzers and their solutions used in environment preparation.

5.3. Bug Oracle

Before starting fuzzing, it is necessary to design a bug oracle to detect bugs. Generally, these bug oracles target two primary types of issues: memory corruption and non-crash bugs. In addition, we discuss solutions for bug triage, focusing on strategies to effectively manage crashes.

5.3.1. Oracle for Memory Corruption

We categorize bug oracles for memory corruption into fatal signals and sanitizers.

Fatal Signal. Fatal signals are among the most widely used and intuitive bug oracles for detecting critical errors, such as illegal memory access. These signals manifest in various forms, including segmentation faults (Schumilo et al., 2017; Pustogarov et al., 2020; Angelakopoulos et al., 2023; Busch et al., 2023), general protection faults (Google, 2015; Gong et al., 2025; Harrison et al., 2020b), kernel panics (Liu et al., 2021b; Yin et al., 2023; Tay et al., 2023; Aafer et al., 2021), and task hangs (Zheng et al., 2022). Additionally, kernel-level exception handling mechanisms (Hao et al., 2022; Qinying et al., 2024) are also used to indicate abnormal behavior. In on-device or hardware-in-the-loop testing scenarios, where direct inspection of execution output or heavyweight analysis is challenging, timeouts are often employed as a practical proxy (Li et al., 2022; Zheng et al., 2022; Qinying et al., 2024). However, relying solely on fatal signals is insufficient, as they may fail to detect logical or silent bugs that do not immediately disrupt system execution.
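In practice, many of these fatal signals are recognized by scanning the kernel console output for well-known banners. The following sketch matches a few standard Linux messages (the KASAN report header, general protection faults, panics, and hung-task warnings) and falls back to a timeout when nothing is printed. The function name, the label format, and the 60-second default are assumptions for illustration; production fuzzers such as Syzkaller maintain far more extensive pattern lists.

```python
import re
from typing import Optional

# A few common Linux console crash banners (illustrative, not exhaustive).
CRASH_PATTERNS = [
    ("kasan", re.compile(r"BUG: KASAN: ([\w-]+)")),
    ("gpf", re.compile(r"general protection fault")),
    ("panic", re.compile(r"Kernel panic - not syncing")),
    ("hang", re.compile(r"INFO: task \S+ blocked for more than \d+ seconds")),
]


def classify_console(log: str, elapsed_s: float, timeout_s: float = 60.0) -> Optional[str]:
    """Return a crash label for one execution, or None if it looks clean.

    The timeout acts as a fallback oracle when no banner appears, as in
    on-device targets whose output cannot be inspected directly.
    """
    for label, pattern in CRASH_PATTERNS:
        match = pattern.search(log)
        if match:
            # Append the bug class when the pattern captures one (KASAN only here).
            return f"{label}:{match.group(1)}" if match.groups() else label
    if elapsed_s >= timeout_s:
        return "timeout"
    return None


print(classify_console("BUG: KASAN: slab-out-of-bounds in foo_ioctl", 2.0))  # kasan:slab-out-of-bounds
print(classify_console("", 75.0))  # timeout
```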

Sanitizer. Sanitizers have become the de facto bug oracles, used by almost all kernel fuzzers. They detect bugs by instrumenting code and monitoring runtime behavior, with each bug class requiring a dedicated sanitizer. The research community has developed several kernel sanitizers, addressing vulnerabilities such as use-after-free and out-of-bounds accesses (kas, 2015), data races (kcs, 2015), and undefined behavior (ubs, 2015). While effective, these sanitizers have limitations. First, they usually rely on source code instrumentation, which introduces significant overhead and is unsuitable for binary-only systems (Shi et al., 2024b; Dinesh et al., 2020). Therefore, ongoing efforts aim to develop more efficient structures (Jeon et al., 2020) and to extend support to closed-source cases (Cho et al., 2023; Pan et al., 2017). For example, BoKASAN (Cho et al., 2023) leverages the Linux kernel’s ftrace feature to insert hooks into critical functions, enabling dynamic instrumentation of binary-only kernels. Second, recent research (Sun et al., 2024) points out that sanitizers do not cover the entire kernel, leaving many vulnerabilities undetected. Quantifying how much of the kernel existing sanitizers actually protect remains an interesting open question.
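To illustrate how a sanitizer like KASAN decides that an access is invalid, the toy model below mirrors the generic KASAN shadow encoding: each 8-byte granule of memory has one shadow byte, where 0 means fully accessible, a value k between 1 and 7 means only the first k bytes are accessible, and a negative value marks poisoned memory such as redzones or freed objects. The real kernel locates a shadow byte at `(addr >> 3) + KASAN_SHADOW_OFFSET` and performs the check in compiler-inserted code; here a dictionary stands in for shadow memory, and the class and method names are hypothetical.

```python
from typing import Dict

GRANULE = 8  # bytes of memory described by one shadow byte (generic KASAN)


class ShadowMemory:
    """Toy model of generic-KASAN-style shadow bytes (illustration only)."""

    def __init__(self) -> None:
        # granule index -> shadow value; untracked granules are treated as accessible (0)
        self.shadow: Dict[int, int] = {}

    def unpoison(self, addr: int, size: int) -> None:
        """Mark [addr, addr + size) accessible; addr is assumed granule-aligned."""
        full, rem = divmod(size, GRANULE)
        first = addr // GRANULE
        for i in range(full):
            self.shadow[first + i] = 0
        if rem:
            self.shadow[first + full] = rem  # only the first `rem` bytes are valid

    def poison(self, addr: int, size: int, tag: int = -1) -> None:
        """Mark whole granules inaccessible (redzone, freed object, ...)."""
        start = addr // GRANULE
        end = (addr + size + GRANULE - 1) // GRANULE
        for g in range(start, end):
            self.shadow[g] = tag

    def check(self, addr: int, size: int) -> bool:
        """Return True if every byte of the access is valid."""
        for a in range(addr, addr + size):
            value = self.shadow.get(a // GRANULE, 0)
            if value < 0 or (value > 0 and a % GRANULE >= value):
                return False
        return True


shadow = ShadowMemory()
shadow.unpoison(0x1000, 13)      # a 13-byte heap object
shadow.poison(0x1010, 16)        # redzone right after it
print(shadow.check(0x1000, 13))  # True: in bounds
print(shadow.check(0x1000, 14))  # False: one byte out of bounds
shadow.poison(0x1000, 16)        # the object is freed
print(shadow.check(0x1000, 1))   # False: use-after-free
```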

5.3.2. Oracle for Non-crash Bugs

Detecting silent bugs is challenging because they do not necessarily lead to a crash, which raises the barrier to bug discovery. We group existing oracles for non-crash bugs into two types.

Differential testing. Differential testing addresses the challenge of detecting silent bugs by executing the same test case across multiple kernel versions or configurations. It is primarily designed to identify correctness issues, based on the assumption that the kernel should behave consistently across environments when given identical inputs. Following this idea, RoboFuzz (Kim and Kim, 2022) detects correctness bugs in ROS by comparing execution results between simulators and real-world deployments. Similarly, physical discrepancies have been used as indicators of potential bugs in another case (Aafer et al., 2021). The primary obstacle to applying differential testing to kernels is the lack of suitable reference counterparts for comparison.
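The core of a differential oracle is small: execute one test case on every target and flag any disagreement. In the sketch below, each target is a hypothetical callable that would, in a real setup, run a syscall sequence on a particular kernel build or configuration and return observable results such as return values. `differential_check`, `kernel_v1`, and `kernel_v2` are invented for illustration.

```python
from typing import Callable, Dict, List, Sequence

Observation = List[int]  # e.g. the return value of each call in the sequence


def differential_check(
    test_case: Sequence[str],
    targets: Dict[str, Callable[[Sequence[str]], Observation]],
) -> List[str]:
    """Run one test case on every target; report targets that disagree with the first."""
    names = list(targets)
    reference_name = names[0]
    reference = targets[reference_name](test_case)
    mismatches = []
    for name in names[1:]:
        observed = targets[name](test_case)
        if observed != reference:
            mismatches.append(f"{name} differs from {reference_name}: {observed} != {reference}")
    return mismatches


def kernel_v1(calls: Sequence[str]) -> Observation:
    """Stand-in build that rejects the malformed call with -EINVAL (-22)."""
    return [-22 if call == "bad" else 0 for call in calls]


def kernel_v2(calls: Sequence[str]) -> Observation:
    """Stand-in build that silently accepts it: a non-crash discrepancy."""
    return [0 for _ in calls]


print(differential_check(["open", "bad"], {"v1": kernel_v1, "v2": kernel_v2}))
# ['v2 differs from v1: [0, 0] != [0, -22]']
```

The obstacle noted above shows up directly in the choice of the first target: the check is only as trustworthy as the reference it compares against.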

Semantic checkers. Traditional sanitizers usually target low-level memory errors like use-after-free or out-of-bounds accesses. By contrast, semantic bug detection focuses on logic errors and high-level correctness, such as violations of properties and specifications (Lyu et al., 2024; Kim and Kim, 2022; Kim et al., 2019). It ensures that kernel behavior conforms to expected logical rules and operational semantics. For instance, in filesystem modules, semantic oracles verify whether desired states or properties have been violated. Monarch (Lyu et al., 2024) detects semantic violations in memory, persistence, fault, and concurrency scenarios by comparing runtime states against symbolic execution based on POSIX specifications and distributed fault models. However, these checkers are often scenario-specific, which limits the types of vulnerabilities they support and their general applicability.

5.3.3. Bug Triage

Beyond detecting crashes via bug oracles, kernel fuzzing critically relies on bug triage to manage the large volume of reports it produces. Triage typically consists of prioritization, minimization, and deduplication (Manès et al., 2021). Compared with user space, prioritization in the kernel is substantially more complex. First, pervasive concurrency and statefulness make many bugs difficult to reproduce, rendering reproduction-based ranking unreliable (Nogikh, 2023). Second, a report that appears “low-risk” may still have severe consequences due to the diverse interactions within the kernel’s large codebase (Zou et al., 2022). To address these challenges, recent efforts assess potential impact through proximity-based search (Lin et al., 2022; Zou et al., 2022; Yuan et al., 2023) or exploitability analysis (Zou et al., 2024). Minimization in kernel fuzzing resembles that in user space; for example, Syzkaller removes syscalls from a sequence while preserving the crash trigger (Google, 2015; Xu et al., 2024). Deduplication faces difficulties similar to those of prioritization. Moreover, the stack-trace-based heuristics popular in user space (Lyu et al., 2019) are often insufficient for deduplicating kernel crashes, since differences in memory layout and thread interleaving cause the same underlying bug to manifest with subtly different behaviors (Mu et al., 2022). Despite these advances, existing approaches primarily target memory corruption reports and thus rely heavily on information provided by sanitizers (e.g., stack traces and memory addresses). Their effectiveness in triaging non-crash bugs remains largely unexplored.
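The two more mechanical triage steps can be sketched in a few lines. `minimize` greedily drops one call at a time and keeps the removal whenever the crash still reproduces, in the spirit of Syzkaller’s call-removal minimization (Syzkaller itself does more, for example also simplifying arguments). `crash_signature` is a deliberately naive deduplication key that strips offsets and addresses from the first report line; as discussed above, such title- or stack-based keys can split one bug into several signatures or merge distinct bugs. Both functions and the `still_crashes` callback are hypothetical.

```python
import re
from typing import Callable, List, Sequence


def minimize(calls: Sequence[str], still_crashes: Callable[[Sequence[str]], bool]) -> List[str]:
    """Greedily remove calls, last to first, while the crash still reproduces."""
    current = list(calls)
    # Walking backwards keeps the indices of not-yet-visited calls valid after a removal.
    for i in range(len(current) - 1, -1, -1):
        candidate = current[:i] + current[i + 1:]
        if still_crashes(candidate):
            current = candidate  # call i was not needed to trigger the crash
    return current


def crash_signature(report: str) -> str:
    """Naive dedup key: first report line with offsets and addresses removed."""
    title = report.splitlines()[0]
    title = re.sub(r"\+0x[0-9a-f]+/0x[0-9a-f]+", "", title)  # e.g. foo+0x1a/0x30
    title = re.sub(r"0x[0-9a-f]+", "ADDR", title)  # raw addresses
    return title.strip()


# Hypothetical reproducer check: the crash needs both open and mmap.
repro = minimize(["getpid", "open", "write", "mmap", "close"],
                 lambda cs: "open" in cs and "mmap" in cs)
print(repro)  # ['open', 'mmap']
a = crash_signature("BUG: KASAN: use-after-free in foo+0x1a/0x30\nRead of size 8")
b = crash_signature("BUG: KASAN: use-after-free in foo+0x2c/0x30\nRead of size 4")
print(a == b)  # True
```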

Implication 3: Beyond Memory Corruption. Enhancing kernel bug oracles requires covering a wide variety of bug types, from memory corruption to subtler non-crash issues. The detection of memory corruption vulnerabilities is relatively mature, with ongoing efforts aimed primarily at improving usability and reducing overhead. In contrast, identifying semantic bugs is more challenging, especially with the emergence of new classes of kernel vulnerabilities (Koschel, 2025; Sun and Su, 2024). Tackling non-crash bugs requires more sophisticated approaches, such as differential testing and indicators of logical errors.

6. Input Model

Establishing the input model is the next step after setting up a reliable testing environment. Blind fuzzing is inefficient at navigating the kernel’s complex structure because of the vast input space. Therefore, a systematic approach to input synthesis is essential. In this section, we review techniques for specifying the desired input, emphasizing interface identification, specification awareness, and dependency recognition. Table 5 summarizes the literature and the corresponding solutions for the input model stage.

6.1. Interface Identification

The OS kernel, serving as the intermediary between user space and hardware, provides a myriad of interfaces. For a fuzzer to automatically and effectively detect vulnerabilities, it must first identify the interfaces that align with its objectives.

6.1.1. Primary Interfaces

OS kernels expose five primary types of fuzzing interfaces:

Syscall. As the primary interface between user space and kernel space, syscalls are the fundamental mechanism for invoking kernel-level functionality. As of the Linux 6.14 release, the kernel exposes nearly 500 syscalls, not counting the numerous variants selected by command parameters (e.g., through the cmd argument). To capture this complexity, Syzkaller manually defines more than 8,000 syscall descriptions in its domain-specific language. These syscalls are vital for OS functionality and provide standardized interfaces through which user space performs a wide range of tasks. As such, they form the principal input surface for kernel fuzzing and are typically exercised as syscall sequences. Given the extensive and evolving syscall interface, along with the limitations of manual specification, considerable research has been devoted to automating the extraction and analysis of syscall semantics and interfaces (Corina et al., 2017; Liu et al., 2020; Hao et al., 2023).

Peripheral devices.Peripheral devices communicate with the kernel through mechanisms such as MMIO (Feng et al., 2020; Mera et al., 2021; Corina et al., 2017; Shen et al., 2022a), Port I/O (Pustogarov et al., 2020; Talebi et al., 2018; Peng and Payer, 2020), DMA (Mera et al., 2021; Song et al., 2019; Ma et al., 2022; Wu et al., 2023b), and interrupts (Zhao et al., 2022b; Shen et al., 2022a; Corina et al., 2017). These channels form a foundational interface for injecting test cases, facilitating the exploration of vulnerabilities within the OS–hardware communication layer. Thorough testing of these interfaces is critical, as they can serve as potential vectors for security breaches. Given the high cost and complexity of interacting with real hardware, practical approaches include device-based techniques (Sönke Huster and Classen, 2024; Talebi et al., 2018; Corina et al., 2017; Song et al., 2019; Mera et al., 2024) and device-free modeling (Shen et al., 2022a; Wu et al., 2023b; Ma et al., 2022; Zhao et al., 2022b).
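For device-free modeling, a common idea is to let the fuzz input itself play the role of the hardware: every MMIO read the driver issues is answered with the next bytes of the input. The sketch below assumes little-endian registers and returns zeros once the input is exhausted; `FuzzedMMIO` and its methods are hypothetical and do not correspond to any specific tool’s API.

```python
from typing import List, Tuple


class FuzzedMMIO:
    """Device-free peripheral model: MMIO reads are served from the fuzz input."""

    def __init__(self, fuzz_input: bytes) -> None:
        self.data = fuzz_input
        self.pos = 0
        # (register offset, value) pairs, kept for replay and debugging
        self.writes: List[Tuple[int, int]] = []

    def read(self, offset: int, size: int) -> int:
        # The register offset is ignored: whatever the driver reads, it gets input bytes.
        chunk = self.data[self.pos:self.pos + size]
        self.pos += len(chunk)
        return int.from_bytes(chunk.ljust(size, b"\x00"), "little")  # zeros once exhausted

    def write(self, offset: int, value: int, size: int) -> None:
        # Writes have no effect on the model; they are only recorded, truncated to `size` bytes.
        self.writes.append((offset, value & ((1 << (8 * size)) - 1)))


dev = FuzzedMMIO(bytes([0x01, 0x10, 0x00]))
status = dev.read(0x00, 1)  # 0x01, e.g. a "device ready" bit chosen by the fuzzer
length = dev.read(0x04, 2)  # 0x0010 == 16, a length the driver may trust blindly
extra = dev.read(0x08, 4)   # 0: the input is exhausted
dev.write(0x0C, 0x1FF, 1)   # recorded as 0xFF after masking to one byte
```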

Filesystem. Filesystems are fundamental components of an OS kernel, crucial for managing user files and maintaining data consistency across system crashes. A filesystem is typically stored as a complex, structured binary blob and mounted from a disk image. Users interact with a mounted filesystem image through a set of file operations (e.g., syscalls). Some studies focus on mutating images as binary inputs (Schumilo et al., 2017; Aschermann et al., 2019), while others limit themselves to generating file operations (Google, 2015; Chen et al., 2020).

Network. Network access is a fundamental feature of modern OS kernels, including streamlined or minimalistic ones. It provides a practical interface for testing and analysis, especially in resource-constrained systems such as embedded and IoT devices, where other interfaces may be limited or unavailable (Schiller et al., 2023; Chen et al., 2018).

Configuration. In highly configurable OS kernels, configuration plays a critical role in shaping the input space. For instance, the default configuration of Linux kernels excludes approximately 80% of code changes from the compiled binary (Yıldıran et al., 2024). Because configuration changes necessitate recompiling the kernel, configuration testing must strike a balance between bootability, code coverage, and build time (Hasanov et al., 2025).

6.1.2. Multi-dimensional Input

Traditional methods typically concentrate on a single primary interface. While effective, this approach tends to overlook vulnerabilities arising from the interplay between interfaces (C2), and the need for a more holistic approach has become increasingly apparent (Wu et al., 2023b). As revealed by Janus (Xu et al., 2019), relying solely on one aspect of a filesystem, either the disk image or the file operations, inevitably neglects the other, resulting in incomplete and less effective testing. Janus therefore proposes a two-dimensional strategy that explores the input space from both sides. Such synergy is not limited to the syscall-filesystem pair; it is also highly relevant in other scenarios, such as syscall-driver interaction (Scharnowski et al., 2023; Jang et al., 2023; Yiru et al., 2024; Ma et al., 2022; Pustogarov et al., 2020). Driver fuzzing in particular calls for a multi-dimensional approach, because device drivers interact with various kernel interfaces, including syscalls and peripheral interfaces. Recent work explores USB drivers through record-and-replay techniques (Jang et al., 2023) and host–gadget synergy (Yiru et al., 2024), further highlighting the importance of cross-interface behavior. Given the thousands of complex devices and drivers beyond USB, a general methodology that systematically captures and tests the wide range of interface interactions in OS kernels is highly desired.
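A two-dimensional test case in the style described for Janus can be represented as a pair of an image and an operation sequence, with each mutation touching one dimension. The sketch below is a simplification under stated assumptions: a plain bit flip stands in for metadata-aware image mutation, and operations are reduced to bare names. `FsTestCase`, `OPS`, and `mutate` are hypothetical.

```python
import random
from dataclasses import dataclass, replace
from typing import Tuple

OPS = ("open", "write", "rename", "unlink", "fsync")  # simplified file operations


@dataclass(frozen=True)
class FsTestCase:
    image: bytes                 # dimension 1: on-disk image bytes
    ops: Tuple[str, ...] = ()    # dimension 2: file-operation sequence


def mutate(tc: FsTestCase, rng: random.Random) -> FsTestCase:
    """Mutate exactly one dimension of the test case, chosen at random."""
    if rng.random() < 0.5 and tc.image:
        buf = bytearray(tc.image)
        pos = rng.randrange(len(buf))
        buf[pos] ^= 1 << rng.randrange(8)  # single bit flip in the image
        return replace(tc, image=bytes(buf))
    ops = list(tc.ops)
    ops.insert(rng.randrange(len(ops) + 1), rng.choice(OPS))  # insert one operation
    return replace(tc, ops=tuple(ops))


rng = random.Random(0)
seed = FsTestCase(image=bytes(16), ops=("open",))
child = mutate(seed, rng)
print(child)
```

Keeping both dimensions in one test case lets the fuzzer replay an operation sequence against a mutated image (and vice versa), which is where cross-interface bugs surface.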

Implication 4: Interface Interactions. The diversity of kernel input interfaces underscores the complexity and breadth of the input model in kernel fuzzing. Beyond individual primary interfaces, studies have shown that using one-sided input alone can miss up to 77% of the kernel code that cross-interface interactions would otherwise explore (Yiru et al., 2024; Xu et al., 2019). Although several efforts have investigated specific cross-interface scenarios, they remain fragmented and domain-specific. This gap underscores the need for fuzzing approaches that explicitly model and exercise interface interactions rather than treating interfaces in isolation.

6.2. Specification Awareness

Recognizing specification requirements is critical for fuzzing, as kernel interfaces expect inputs that adhere to defined structures and formats. However, generating grammar-aware input is a complex task. First, kernels interact with various devices and software, leading to diverse input formats. Second, beyond syntax, understanding input semantics is also important, as syntactically correct inputs can differ greatly in their effects. Despite the emergence of specification-free techniques (Bulekov et al., 2023), specification-based approaches remain mainstream (Liu et al., 2020; Dawoud and Bugiel, 2021; Busch et al., 2023; Chen et al., 2021; Hao et al., 2023). To address these challenges, researchers have focused on extracting grammar specifications as an integral component of input generation. These methods can generally be categorized as static or dynamic.

6.2.1. Static Inference

Before the advent of automation, developers typically relied on their domain expertise to manually craft syscall specifications (Google, 2015). This manual process was both labor-intensive and error-prone, significantly limiting the scalability of testing across a broader range of syscall targets. These challenges have driven the development of static specification inference techniques, which analyze kernel source code to infer expected input structures and identify constraints without executing the code (Zhao et al., 2022b; Wu et al., 2023b; Sun et al., 2025; Liu et al., 2020). For instance, DIFUZE (Corina et al., 2017) pioneered the use of heuristic static analysis to extract syscall specifications from kernel driver source code. Building on this foundation, SyzDescribe (Hao et al., 2023) improves both the depth and accuracy of inference by constructing a principled response model of kernel drivers. In certain cases, documentation can also support the analysis by providing supplementary information (Choi et al., 2021). These rule-based approaches, while effective, often require continuous maintenance to keep up with evolving kernel targets and struggle with edge cases (Yang et al., 2025). To address these limitations, LLM-based techniques have shown high potential. KernelGPT (Yang et al., 2025) leverages the code-understanding capabilities of LLMs to adaptively extract specifications from source code with minimal reliance on external knowledge. It achieves a 6× increase in specification generation while maintaining 93.3% accuracy, substantially outperforming traditional methods. However, LLMs face their own challenges. They struggle to capture implicit semantics, because such semantics are not explicitly encoded in source code but are instead distributed across control flows, hidden behind indirect function calls, or dependent on runtime conditions (e.g., device-specific logic in DRM subsystems).
Current LLMs rely heavily on textual and structural cues learned from training data, which makes it hard for them to infer these latent dependencies without additional program analysis support. Beyond static structure, a recent study (Sun et al., 2025) highlights the influence of runtime parameters exposed via sysfs, underscoring the complexity of specification inference. Despite the progress of automated approaches, specification generation for subsystems such as eBPF (Hung and Amiri Sani, 2024) and complex protocols (Schiller et al., 2023) remains an open problem. These domains still depend heavily on manual analysis because of the scarcity of explicit specifications, the prevalence of implicit semantics, and the need for runtime context that neither static rules nor LLMs alone can fully resolve.

6.2.2. Dynamic Analysis

In contrast to static inference, dynamic approaches typically adopt an iterative strategy that leverages runtime information collected from executing targets to complement static analysis. Based on the method of data collection, existing dynamic techniques can be broadly categorized into two types: log analysis (Chen et al., 2021; Han and Cha, 2017; Jang et al., 2023; Aafer et al., 2021) and dynamic probing (Sun et al., 2022; Yin et al., 2023; Zhu et al., 2024). Log analysis serves as an indirect strategy, particularly useful for binary-only kernels where direct program inspection is inconvenient. For example, targeting macOS, SyzGen (Chen et al., 2021) starts by generating initial syscall templates through trace analysis and subsequently refines argument types and constraints using symbolic execution. Similar log-based techniques have also been applied to analyze interactions in USB protocols (Peng and Payer, 2020) and the Android ecosystem (Aafer et al., 2021). Dynamic probing has become feasible due to the enhanced observability features integrated into modern operating system kernels. In Linux, eBPF can be repurposed to hook into syscall probes and access the corresponding file operations (Sun et al., 2022). Similarly, macOS provides a kernel extension wrapper that offers comparable functionality and serves as an entry point for inference when combined with taint analysis (Yin et al., 2023). Compared to static inference, dynamic approaches yield more accurate and realistic outputs. However, they may also suffer from false negatives, as certain drivers or components may not be active or accessible during runtime analysis.

Implication 5: Hybrid Generation. Specifications are crucial for grammar-aware fuzzing but remain difficult to generate due to diverse formats and implicit semantics. Static inference offers broad coverage but lacks adaptability. LLM-based methods improve generality, yielding up to a 6× increase in specification generation (Yang et al., 2025), but struggle with implicit semantics such as indirect calls; dynamic analysis provides realism but overlooks inactive paths. These complementary strengths and weaknesses indicate that no single approach suffices. Hybrid solutions combining static, dynamic, and LLM-based techniques are needed to achieve comprehensive and accurate specification generation.

Table 5. Summary of kernel fuzzers and their solutions used in input model.

6.3. Dependency Recognition

One of the most critical characteristics of OS kernels is their statefulness, as discussed in C3. This property necessitates a coordinated organization of test cases that respects the dependencies between syscalls, which fall into two kinds: explicit and implicit dependencies.


Figure 6. Explicit and implicit dependencies between syscalls.

6.3.1. Explicit Dependency

Explicit dependencies are direct relationships in which the output of one syscall becomes the input of another, as in resource assignment. Syscalls that generate such outputs are called producers, and those that consume them are called consumers. We define a syscall $c_i$ as explicitly dependent on another syscall $c_j$ when $c_i$ consumes a resource produced by $c_j$. For example, open returns a file descriptor that a later mmap consumes; if open is not executed or fails, mmap cannot execute successfully. Beyond return values, syscalls can also accept parameters derived from other syscalls. Several studies identify explicit dependencies among syscalls through methods such as trace inference (Han and Cha, 2017; Chen et al., 2021; Weiteng et al., 2024), layered model building (Chen et al., 2020), or producer-consumer analysis (Google, 2015; Pailoor et al., 2018; Sun et al., 2021; Xu et al., 2024). For example, IMF (Han and Cha, 2017) records the input and output values of hooked syscalls and then applies heuristic inference to the logs, focusing on the order and values of entries. For producer-consumer analysis, fuzzers like Syzkaller perform type-based analysis on specifications and assign higher priority to producer-consumer pairs.
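The trace-inference idea can be sketched as value matching: if a non-trivial return value of one call reappears as an argument of a later call, the pair is recorded as a producer-consumer dependency. In the sketch below the trace format, the set of values treated as trivial, and all names are assumptions, not IMF’s actual implementation; the values mimic open returning descriptor 3, which mmap and close then consume.

```python
from typing import List, NamedTuple, Set, Tuple


class TracedCall(NamedTuple):
    name: str
    args: Tuple[int, ...]
    ret: int


TRIVIAL = {0, -1}  # values too common to indicate a dependency (assumption)


def infer_explicit_deps(trace: List[TracedCall]) -> Set[Tuple[int, int]]:
    """Return (producer_index, consumer_index) pairs found by value matching."""
    deps: Set[Tuple[int, int]] = set()
    for i, producer in enumerate(trace):
        if producer.ret in TRIVIAL:
            continue
        for j in range(i + 1, len(trace)):
            if producer.ret in trace[j].args:
                deps.add((i, j))
    return deps


trace = [
    TracedCall("open", (0x20000000, 2), 3),                     # returns fd 3
    TracedCall("mmap", (0, 4096, 1, 2, 3, 0), 0x7F0000000000),  # consumes fd 3
    TracedCall("close", (3,), 0),                               # consumes fd 3
]
print(sorted(infer_explicit_deps(trace)))  # [(0, 1), (0, 2)]
```

Value matching is cheap but can misfire when an unrelated argument happens to equal a returned value (e.g., a flag that is also 3), which is one reason type-based analysis on specifications is attractive when specifications exist.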

6.3.2. Implicit Dependency

Implicit dependency, in contrast, is more subtle: it requires a particular ordering of syscalls without any explicit producer-consumer relationship. It stems from the kernel’s inherently stateful nature, characterized by extensive shared data structures and resources that can be accessed through multiple syscalls. For instance, memory operations like mlockall and msync share no parameters or return values, yet they implicitly operate on shared variables (shown in Figure 6), thereby creating an implicit dependency (Pailoor et al., 2018). Such dependencies are challenging to identify because they are often obscured within the vast and complex kernel codebase. Researchers currently use static analysis, dynamic analysis, or a combination of both to uncover them. Several studies have proposed static analysis techniques that identify potential dependency pairs in kernel code, particularly when syscalls operate on shared global variables (Pailoor et al., 2018; Kim et al., 2020; Fleischer et al., 2023). Although informative, these methods are prone to false positives (Jeong et al., 2019). More recent research has advanced the field through dynamic dependency recognition, which offers more reliable results. For instance, HEALER (Sun et al., 2021) and MOCK (Xu et al., 2024) infer runtime dependencies by minimizing call sequences and observing coverage changes, and employ context-free and context-aware models, respectively. Furthermore, reference counts have been explored as an additional means of representing implicit dependencies for mutation guidance (Bai et al., 2024b). Despite these advancements, the aforementioned methods are seldom applicable to closed-source targets. In practice, heuristic-based approaches remain dominant due to their practicality and usability.
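The dynamic approach can be illustrated with a toy version of coverage-based relation learning: remove the call that precedes another and check whether the latter’s coverage changes. The adjacent-pair simplification, the `run` interface returning one coverage set per call, and the toy kernel in which mlockall changes the path taken by msync are all assumptions for illustration, not HEALER’s or MOCK’s actual implementation.

```python
from typing import Callable, List, Sequence, Set, Tuple

# run(seq) returns one coverage set per call in seq (hypothetical executor interface).
Runner = Callable[[Sequence[str]], List[Set[int]]]


def learn_relations(seq: Sequence[str], run: Runner) -> Set[Tuple[str, str]]:
    """Infer (influencer, influenced) pairs between adjacent calls.

    Call i is taken to influence call i+1 if removing call i changes the
    coverage that call i+1 achieves.
    """
    baseline = run(seq)
    relations: Set[Tuple[str, str]] = set()
    for i in range(len(seq) - 1):
        reduced = list(seq[:i]) + list(seq[i + 1:])
        coverage = run(reduced)
        # After removing index i, the former call i+1 sits at index i.
        if coverage[i] != baseline[i + 1]:
            relations.add((seq[i], seq[i + 1]))
    return relations


def toy_kernel(seq: Sequence[str]) -> List[Set[int]]:
    """Toy model: msync reaches an extra block only after mlockall has run."""
    locked = False
    per_call: List[Set[int]] = []
    for call in seq:
        if call == "mlockall":
            locked = True
            per_call.append({1})
        elif call == "msync":
            per_call.append({2, 3} if locked else {2})
        else:
            per_call.append({9})
    return per_call


print(learn_relations(["getpid", "mlockall", "msync"], toy_kernel))
# {('mlockall', 'msync')}
```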

Implication 6: Dependency Integration. Prior works have explored capturing or modeling explicit and implicit dependencies, yet no consensus has been reached among researchers. Ideally, these dependencies should be incorporated into specification languages like Syzlang, which can express explicit dependencies through resource types but currently lacks support for implicit ones. Extending Syzlang to integrate these dependencies is a promising direction.

7. Fuzzing Loop

Once the environment is configured and inputs defined, fuzzers initiate the fuzzing process. Traditional methods face kernel-specific challenges such as statefulness and concurrency. Functionalities required at this stage include execution throughput, mutation intelligence, and feedback mechanisms. Table 6 provides a summary of the literature and the corresponding solutions employed within the fuzzing loop.

7.1. Execution Throughput

Execution speed significantly affects the performance of both user- and kernel-space fuzzing. However, the emulation-based architectures commonly employed in kernel fuzzing, together with the kernel’s statefulness, call for solutions that differ substantially from those used in user space. Note that the techniques discussed here are developed for their native environments and do not alter execution functionality, which distinguishes them from those in F1.1.

7.1.1. Virtualization Enhancement

The introduction of a VM layer in kernel fuzzing enables improvements in execution throughput through enhanced virtualization efficiency.

Accelerated virtualization. Virtualization acceleration techniques (Yu et al., 2020) have been widely studied in the community. Existing fuzzing methods achieve high-performance virtualization through hardware assistance (Schumilo et al., 2017) and user-mode emulation (Zheng et al., 2019). However, these approaches are generally architecture- and fuzzer-specific, which limits their applicability.

Efficient synchronization. Because the memory spaces of the host and the guest VM are isolated from each other, communication between them incurs significant overhead. For example, Syzkaller runs the fuzzer and executor inside the VM and synchronizes state with the host via RPC. Subsequent works mitigate the problem with more efficient synchronization mechanisms, such as shared memory (Sun et al., 2021; Lan et al., 2023; Liu et al., 2023) and efficient data transfer (Liu et al., 2023).

7.1.2. System Snapshot

Internal state accumulated across executions may corrupt the kernel or interfere with subsequent executions. Hence, rebooting the system regularly is necessary but time-consuming. Snapshot techniques save time and increase throughput by taking suitable system snapshots and restoring them when necessary. The typical practice is to fork an initialized VM as a new instance (Pandey et al., 2019; Google, 2015).

Lightweight snapshot. The native QEMU snapshot dumps all CPU registers and the entire memory space into files. Such a faithful snapshot, however, incurs non-negligible overhead, so a lightweight snapshot tailored to fuzzing is highly desirable. To achieve this, existing methods selectively restore memory pages following a copy-on-write principle (Zheng et al., 2019) or customize the QEMU/KVM snapshot function for fuzzing (Schumilo et al., 2021; Bulekov et al., 2023; Gong et al., 2021). Since this process operates at the emulation level, it can benefit any OS kernel that can be virtualized.
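The benefit of selective restoration is easy to see in a toy model: after a full snapshot at startup, only the pages written during an execution need to be copied back. In the sketch below, dirty pages are tracked explicitly in the write path; a real hypervisor would learn them from write-protection faults or KVM’s dirty-page logging. `GuestMemory` and its methods are hypothetical.

```python
from typing import Dict, Set

PAGE_SIZE = 4096


class GuestMemory:
    """Toy guest RAM that restores only pages dirtied since the last snapshot."""

    def __init__(self, num_pages: int) -> None:
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.snapshot: Dict[int, bytes] = {}
        self.dirty: Set[int] = set()

    def take_snapshot(self) -> None:
        # One full copy, e.g. after boot and fuzzer setup have completed.
        self.snapshot = {i: bytes(page) for i, page in enumerate(self.pages)}
        self.dirty.clear()

    def write(self, addr: int, data: bytes) -> None:
        for offset, byte in enumerate(data):
            page, index = divmod(addr + offset, PAGE_SIZE)
            self.dirty.add(page)  # stands in for write-fault or dirty-log tracking
            self.pages[page][index] = byte

    def restore(self) -> int:
        """Copy back only the dirty pages and return how many were restored."""
        count = len(self.dirty)
        for page in self.dirty:
            self.pages[page][:] = self.snapshot[page]
        self.dirty.clear()
        return count


mem = GuestMemory(num_pages=256)  # 1 MiB of toy guest memory
mem.take_snapshot()
mem.write(4090, b"ABCDEFGH")      # one test case touches two pages
print(mem.restore())              # 2 (pages copied back instead of all 256)
```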

Checkpoint policy. Fuzzers typically take a snapshot at startup and restore it when necessary. However, beyond startup, input executions pass through several similar phases. Hence, by creating appropriate checkpoints continuously (Song et al., 2020; Jung et al., 2025; Yuan et al., 2023), fuzzers can skip repeated steps and directly access states established by time-consuming operations.
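One way to realize such a policy is a cache of states keyed by syscall-sequence prefixes: before executing a sequence, resume from the longest prefix that already has a checkpoint. The sketch below stores plain strings instead of VM snapshots and checkpoints after every call, which a real fuzzer would avoid because of memory cost; `CheckpointCache` and the `step` callback are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

State = str  # a VM snapshot in a real fuzzer; a string in this toy model


class CheckpointCache:
    """Resume execution from the longest already-checkpointed sequence prefix."""

    def __init__(self, step: Callable[[State, str], State]) -> None:
        self.step = step  # executes one call on a state
        self.cache: Dict[Tuple[str, ...], State] = {(): "booted"}
        self.executed_calls = 0  # how many calls actually ran

    def run(self, seq: Sequence[str]) -> State:
        k = len(seq)
        while tuple(seq[:k]) not in self.cache:  # terminates: () is always cached
            k -= 1
        state = self.cache[tuple(seq[:k])]
        for i in range(k, len(seq)):
            state = self.step(state, seq[i])
            self.executed_calls += 1
            self.cache[tuple(seq[:i + 1])] = state  # naive policy: checkpoint every call
        return state


cache = CheckpointCache(lambda state, call: f"{state}|{call}")
cache.run(["open", "ioctl", "read"])
final = cache.run(["open", "ioctl", "write"])  # reuses the open+ioctl checkpoint
print(final, cache.executed_calls)  # booted|open|ioctl|write 4
```

A practical policy would decide where to place checkpoints, for instance only after calls known to be slow, trading snapshot memory against re-execution time.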

Implication 7: Throughput Gain. Studies have shown that secondary operations, such as data transfer, can account for up to 54% of fuzzing time (Lan et al., 2023; Liu et al., 2023), significantly degrading overall performance. Consequently, optimizing virtualization to speed up the interaction between the fuzzer and the kernel, and accelerating the bootstrap process for rapid recovery, are key to increasing throughput. Future directions should aim at more universally applicable virtualization enhancements, lightweight snapshot techniques tailored to fuzzing, and effective checkpoint policies that minimize redundant operations.

7.2. Mutation Intelligence

Although the input model reduces the search space, blind fuzzing still struggles to find bugs because of the complexity of test cases and the fine-grained variations among them. To address this, existing strategies are organized around three key aspects: constraint solving, thread scheduling, and decision intelligence.

7.2.1. Constraint Solving

While random fuzzing excels under lenient conditions, it struggles with stringent branch constraints, such as magic bytes and checksums, which are rarely satisfied by chance. Integrating symbolic execution (Yun et al., 2018) with fuzzing has significantly boosted the ability to tackle complex constraints in user-space fuzzing, a strategy equally beneficial for kernel fuzzing. Hybrid approaches combining symbolic execution and fuzzing have been applied in kernel environments for interface recovery and value inference (Chen et al., 2021; Sun et al., 2022; Zhao et al., 2022a; Hao et al., 2022), although scaling these methods for real-time use in complex kernels presents challenges, including indirect control transfers and path explosion (Kim et al., 2020). Solutions specifically designed for kernel fuzzing aim to overcome these obstacles through indirect control flow transformation (Kim et al., 2020) and selective strategies (Chen et al., 2022; Aschermann et al., 2019). We also emphasize the need to make this area more accessible, particularly given the current lack of available dynamic constraint-solving solutions.
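A quick calculation shows why such branches defeat random mutation. Assume a branch compares a 32-bit input field against a fixed magic value and the fuzzer draws that field uniformly at random; each attempt then succeeds with probability $p = 2^{-32}$, and the expected number of attempts is the mean $1/p$ of a geometric distribution. The throughput of $10^4$ executions per second is an illustrative, optimistic assumption for a kernel fuzzer.

$$
p = 2^{-32} \approx 2.33 \times 10^{-10}, \qquad
\mathbb{E}[N] = \frac{1}{p} = 2^{32} \approx 4.29 \times 10^{9}, \qquad
\frac{4.29 \times 10^{9}}{10^{4}\ \text{s}^{-1}} \approx 4.3 \times 10^{5}\ \text{s} \approx 5\ \text{days}.
$$

A solver that reasons about the comparison can instead produce the required value directly, which is the motivation for the hybrid approaches above.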

7.2.2. Thread Scheduling

Concurrency-related kernel vulnerabilities emerge from the inherent complexity and unpredictability of non-deterministic kernel scheduling (Pabla, 2009; Fonseca et al., 2014). Detecting them requires careful consideration of both test inputs and specific thread interleavings, making precise control over threads essential. Rather than modifying the kernel scheduler, a process that is both labor-intensive and risky, the prevalent alternatives are to control threads using delay injection (Yuan et al., 2023; Jiang et al., 2022; Xu et al., 2020) or hypervisor-level control (Jeong et al., 2024, 2019). Introducing delays between threads provides a straightforward means of influencing thread scheduling (Yuan et al., 2023; Jiang et al., 2022; Xu et al., 2020). However, this approach offers limited control over complex thread interactions and often degrades performance significantly due to the added latency. To perform preemption, hypervisor-based approaches either use hardware breakpoints as scheduling points and trap the thread in infinite loops (Jeong et al., 2019, 2023), or suspend thread execution on a vCPU (Gong et al., 2021, 2023). Despite their effectiveness in enforcing thread control, such methods typically demand extensive modifications to both the hypervisor and the kernel. Consequently, they suffer from high overhead, which severely limits their scalability and usability (Xu et al., 2025b). In addition to these approaches, research on RTOS has explored exposing concurrency bugs through task priority manipulation (Shen et al., 2021). More recently, SECT (Xu et al., 2025b) introduced a new direction for thread scheduling. Rather than relying on external mechanisms, SECT targets the source of scheduling itself by implementing a custom scheduler. This scheduler leverages the Linux eBPF feature and thus enables lightweight and flexible thread control.
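Why precise control over interleavings matters can be shown with a deliberately tiny model: two threads each perform a non-atomic increment (a load followed by a store) on a shared variable. Enumerating every schedule that preserves each thread’s program order reveals which interleavings lose an update. The sketch below is a hypothetical illustration; real kernels have far too many scheduling points for exhaustive enumeration, which is why fuzzers sample interleavings through delay injection, hypervisor control, or custom schedulers.

```python
from itertools import combinations
from typing import Iterator, Tuple


def interleavings(steps_a: int, steps_b: int) -> Iterator[Tuple[int, ...]]:
    """Yield every schedule (sequence of thread ids) that keeps each thread's order."""
    total = steps_a + steps_b
    for slots_a in combinations(range(total), steps_a):
        chosen = set(slots_a)
        yield tuple(0 if i in chosen else 1 for i in range(total))


def run_schedule(schedule: Tuple[int, ...]) -> int:
    """Each thread runs `tmp = x; x = tmp + 1`; return the final value of x."""
    x = 0
    tmp = [0, 0]  # per-thread register holding the loaded value
    pc = [0, 0]   # per-thread program counter (0 = load, 1 = store)
    for tid in schedule:
        if pc[tid] == 0:
            tmp[tid] = x
        else:
            x = tmp[tid] + 1
        pc[tid] += 1
    return x


schedules = list(interleavings(2, 2))
lost_updates = [s for s in schedules if run_schedule(s) != 2]
print(len(schedules), len(lost_updates))  # 6 4
```

Only the two fully serialized schedules produce the correct result, so a scheduler that never preempts between the load and the store would miss the bug entirely.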

7.2.3. Decision Intelligence

To maximize bug discovery under limited computational resources, fuzzers must make intelligent decisions when selecting seeds and mutation operators for each iteration. This phase represents the area of greatest similarity between user-space and kernel-space fuzzing. It often involves runtime feedback (e.g., branch coverage) and optimization techniques such as reinforcement learning (Wang et al., 2021a; Zhang et al., 2022b; Yue et al., 2020), information entropy (Böhme et al., 2020), simulated annealing (Zhang et al., 2022a; Li et al., 2025), and particle swarm optimization (Lyu et al., 2019). Fuzzers employ these strategies to support a range of testing objectives, including coverage maximization (Wang et al., 2021b; Xu et al., 2024; Gong et al., 2025), concurrency analysis (Gong et al., 2023), vulnerability pattern detection (Lee et al., 2024), and reachability exploration (Zhang et al., 2022a; Li et al., 2024a). For instance, SyzVegas (Wang et al., 2021b) uses a reinforcement learning algorithm to prioritize seeds and mutation strategies that yield higher coverage rewards. Similarly, SyzRisk (Lee et al., 2024) allocates more fuzzing energy to inputs with higher probabilities of matching known vulnerability patterns. Although the challenges of decision intelligence are similar in user space and kernel space, algorithms for the kernel domain have not been studied as extensively as their user-space counterparts. It would be worthwhile to investigate the effectiveness of user-space scheduling algorithms in the context of kernel fuzzing, and to explore novel strategies tailored to the unique characteristics of kernel environments.
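The sketch below illustrates operator selection driven by coverage rewards. It uses the generic UCB1 multi-armed bandit rule, a swapped-in textbook technique rather than the specific algorithm of SyzVegas or any other cited fuzzer. It assumes that each mutation yields a coverage-gain reward normalized to [0, 1]. The operator names and the fixed rewards in the usage example are hypothetical.

```python
import math
from typing import Iterable, List


class UCB1OperatorSelector:
    """Chooses a mutation operator with the UCB1 bandit rule.
    Rewards are assumed to be coverage gains normalized to [0, 1]."""

    def __init__(self, operators: Iterable[str]) -> None:
        self.operators: List[str] = list(operators)
        self.counts = [0] * len(self.operators)    # times each operator was used
        self.totals = [0.0] * len(self.operators)  # summed rewards per operator
        self.t = 0                                 # total selections so far

    def select(self) -> int:
        self.t += 1
        # Try every operator once before applying the confidence bound.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        # Mean reward plus an exploration bonus that shrinks with usage.
        scores = [
            self.totals[i] / self.counts[i]
            + math.sqrt(2 * math.log(self.t) / self.counts[i])
            for i in range(len(self.operators))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, i: int, reward: float) -> None:
        self.counts[i] += 1
        self.totals[i] += reward


# Hypothetical, fixed per-operator coverage rewards for illustration only.
rewards = {"insert_call": 0.1, "mutate_arg": 0.9, "remove_call": 0.5}
sel = UCB1OperatorSelector(rewards)
for _ in range(1000):
    i = sel.select()
    sel.update(i, rewards[sel.operators[i]])
```

After the loop, most selections go to the operator with the highest reward. The exploration term keeps occasionally retrying the others, which matters because real coverage rewards drift as the corpus saturates.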

Implication 8: Native-feature Intelligence. While existing strategies have made notable progress, their intrusive nature and high overhead have constrained their practical applicability. These limitations make them unsuitable as a robust foundation for addressing challenges such as constraint solving and thread scheduling. Recent studies suggest that leveraging native kernel features offers a promising direction toward lightweight, scalable solutions that enhance mutation intelligence (e.g., an 11.4× speed-up in thread scheduling (Xu et al., 2025b)). Realizing this potential will require closer collaboration between the kernel community and researchers.

Table 6. Summary of kernel fuzzers and their solutions used in fuzzing loop.

7.3. Feedback Mechanism

As discussed in Section 5.2, existing kernel fuzzing proposals have improved feedback acquisition using both invasive instrumentation and non-invasive tracing. Establishing an efficient feedback mechanism for OS kernel fuzzing requires defining clear testing goals and proper fitness metrics.

7.3.1. Testing Goals

Fuzzers pursue two primary testing objectives when discovering vulnerabilities: expanding code coverage (coverage-guided fuzzing) and reaching specific code locations (directed fuzzing).

Coverage-guided fuzzing. Kernel fuzzers typically employ a coverage-centric strategy, aiming to maximize execution path diversity—a signal that has proven effective for uncovering bugs (Wang et al., 2019). These fuzzers assess input quality using fitness metrics, with basic block coverage (Corina et al., 2017) and branch coverage (Shi et al., 2019; Jeong et al., 2024) being the most commonly used. Additionally, state- and concurrency-oriented metrics have been developed to address kernel-specific characteristics, which will be discussed in more detail later. Due to the inherent complexity of operating system kernels, effective fuzzing requires a multi-dimensional feedback mechanism that integrates control flow analysis, data flow tracking, state exploration, and concurrency probing.
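Coverage-guided input retention can be sketched as follows. The sketch assumes that the instrumentation reports an ordered list of basic-block identifiers for each execution, from which edges are derived as pairs of consecutive blocks. An input enters the corpus only if it contributes at least one unseen edge. The class name and the program strings are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def edges(trace: Sequence[int]) -> Set[Edge]:
    """Turn a basic-block trace into edges: pairs of consecutive block IDs."""
    return set(zip(trace, trace[1:]))


class CoverageCorpus:
    """Keeps an input only if its trace contributes at least one unseen edge."""

    def __init__(self) -> None:
        self.seen: Set[Edge] = set()
        self.inputs: List[str] = []

    def consider(self, prog: str, trace: Sequence[int]) -> bool:
        new = edges(trace) - self.seen
        if new:
            self.seen |= new
            self.inputs.append(prog)
        return bool(new)
```

An input that replays already-covered edges, even with different arguments, is discarded. This is exactly the blind spot for states and interleavings that motivates the richer fitness metrics discussed next.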

Directed fuzzing. Directed greybox fuzzing (DGF) has shown potential in tasks such as patch testing (Lee et al., 2021; Böhme et al., 2017) and crash reproduction (Lin et al., 2022; Zou et al., 2022). Unlike coverage-centric fuzzing, DGF prioritizes seeds closer to specific target points, which are either set manually or indicated by sanitizers. However, its adoption in the kernel has been limited by challenges unique to kernels. As mentioned in C2, one of these challenges arises from the testing interface. In user space, DGF inputs are continuous and can be mutated arbitrarily. In contrast, kernel-space DGF inputs consist of a discrete set of syscalls, from which the relevant subset that actually reaches the target location must be identified. To address this challenge, existing work performs static analysis from target code locations to assess reachability. Some of these efforts focus on the specific parameter values required by syscalls (Tan et al., 2023; Shi et al., 2024a; You et al., 2017), while others emphasize optimizations tailored to particular subsystems (Li et al., 2024a). Because kernel codebases are so large, a further challenge specific to kernel fuzzing is the high computational overhead and accuracy degradation associated with distance calculations. For instance, AFLGo (Böhme et al., 2017), the state-of-the-art DGF method in user space, takes more than 16 hours on average to calculate the static distance for gVisor (Google, 2022) (Li et al., 2024a). In response, recent fuzzers have introduced various mitigation strategies, including enhanced distance metrics (Li et al., 2024a; Yuan et al., 2023), key object filtering (Lin et al., 2022; Zou et al., 2022), and improved exploration–exploitation balancing techniques (Li et al., 2024a; Zhang et al., 2022a). For example, G-Fuzz (Li et al., 2024a) proposes a lightweight directed fuzzing framework for gVisor. 
Its primary contribution lies in a fine-grained and efficient mechanism that substantially reduces the cost of distance computation while improving the handling of indirect calls through more precise analysis. These efforts offer a blueprint for integrating DGF with various security tasks. Further applications in kernel security, such as impact and exploitation assessment (Zou et al., 2022; Lin et al., 2022), remain largely unexplored.
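The kind of static distance computation described above can be sketched in simplified form. The sketch works at function level with a single target: it measures call-graph distance via breadth-first search over reversed call edges, and a seed's distance is the mean distance of the covered functions that can reach the target. AFLGo itself works at basic-block level and combines multiple targets with a harmonic mean. The toy call graph and all function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set


def call_graph_distances(call_graph: Dict[str, List[str]], target: str) -> Dict[str, int]:
    """BFS over reversed call edges: number of calls from each function to `target`.
    Functions that cannot reach the target are absent from the result."""
    reverse: Dict[str, Set[str]] = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        fn = queue.popleft()
        for caller in reverse.get(fn, ()):
            if caller not in dist:
                dist[caller] = dist[fn] + 1
                queue.append(caller)
    return dist


def seed_distance(covered: Iterable[str], dist: Dict[str, int]) -> Optional[float]:
    """Mean distance of covered functions that reach the target; None if none do."""
    reachable = [dist[f] for f in covered if f in dist]
    return sum(reachable) / len(reachable) if reachable else None


# Hypothetical toy call graph with two syscall-like entry points.
cg = {
    "ioctl_entry": ["dev_ioctl"],
    "dev_ioctl": ["handle_cmd", "log_cmd"],
    "handle_cmd": ["target_bug"],
    "read_entry": ["log_cmd"],
}
dist = call_graph_distances(cg, "target_bug")
# Entry points that can reach the target form the relevant syscall subset.
relevant_entries = [e for e in ("ioctl_entry", "read_entry") if e in dist]
```

In this toy graph only `ioctl_entry` can reach the target, so a directed fuzzer would focus on it and skip `read_entry`. On a kernel-sized call graph with many indirect calls, this reachability and distance precomputation is the expensive and imprecise step discussed above.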

7.3.2. Diverse Fitness

Classic code coverage lacks sensitivity to complex kernel conditions such as statefulness and thread interleaving. To approximate the kernel under test more comprehensively, most fuzzers therefore adopt diverse fitness metrics beyond classic code coverage. Like block or edge coverage, these metrics guide fuzzers, but toward other desired aspects of the target kernel, such as state and concurrency. In response to RQ2, Table 7 presents a detailed comparison of fitness metrics and their use cases.

State-oriented fitness. As previously noted in C3, the high degree of statefulness constitutes a significant challenge for OS kernel fuzzing. Distinct from that of user-space programs, kernel state encompasses the execution context, including occupied resources such as registers and variables. OS kernels retain these values over time and thus accumulate internal state. This stateful nature sets kernel fuzzing apart from application fuzzing, since specific states are required to trigger vulnerabilities (Zhao et al., 2022a; Liu et al., 2024b). Effective fuzzers must navigate this complexity by targeting diverse and deep states. While some works (Pailoor et al., 2018; Chen et al., 2021; Sun et al., 2021; Busch et al., 2023; Xu et al., 2024) have examined states indirectly, a systematic approach with state-oriented fitness is still required to explore the state space. Based on the type of state modeling, existing work can be divided into state-machine-based and state-variable-based approaches. State-machine-based approaches aim to identify concrete state machines embedded within the kernel under test. For instance, USB gadget stacks commonly implement various finite state machines (FSMs). To leverage this structure, FuzzUSB (Kim et al., 2022) combines static analysis with symbolic execution to detect state transitions in USB drivers. Based on these transitions, it infers potential state machines and retains inputs that trigger previously unexplored states. However, state-machine-based approaches typically impose strict structural assumptions on the target system, and in many cases, explicit FSMs may not be present. In contrast, state-variable-based approaches offer a more practical alternative, as they do not rely on the existence of well-defined FSMs and can adapt to a broader range of system behaviors. 
The central idea behind this approach is to approximate kernel states using critical variables and monitor their value changes as an indicator of state coverage (Zhao et al., 2022a; Liu et al., 2024b; Qinying et al., 2024). For example, StateFuzz (Zhao et al., 2022a) employs static analysis to identify candidate state variables, focusing specifically on those that can be traced back to global variables and are accessed by multiple operations. Unlike traditional coverage-guided fuzzers, StateFuzz retains an input if it triggers a new value range or an extreme value for a recognized state variable. Similarly, SyzTrust (Qinying et al., 2024) identifies relevant state variables by summarizing critical structures associated with TEEs. Although these methods have shown success, their reliance on static analysis or heuristic modeling can lead to false positives and compromise state integrity. Open questions remain regarding the efficiency and soundness of state approximation.
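The retention rule described for StateFuzz (keep an input that triggers a new value range or an extreme value of a state variable) can be sketched as follows. This sketch does not reproduce how StateFuzz actually partitions values into ranges. It assumes fixed-width buckets instead, and it assumes that the values of the tracked state variables are reported after each run. The class name and the variable name `dev->state` are hypothetical.

```python
from typing import Dict, Set, Tuple


class StateVariableFeedback:
    """Retains an input if any tracked state variable reaches a value in a new
    range bucket or a new extreme (below the known minimum or above the maximum)."""

    def __init__(self, bucket_width: int = 16) -> None:
        self.bucket_width = bucket_width                # assumed fixed-width ranges
        self.buckets: Dict[str, Set[int]] = {}          # variable -> buckets seen
        self.extremes: Dict[str, Tuple[int, int]] = {}  # variable -> (min, max)

    def observe(self, values: Dict[str, int]) -> bool:
        """`values` maps state-variable names to their values after one run."""
        interesting = False
        for var, v in values.items():
            bucket = v // self.bucket_width
            seen = self.buckets.setdefault(var, set())
            if bucket not in seen:
                seen.add(bucket)
                interesting = True
            if var in self.extremes:
                lo, hi = self.extremes[var]
                if v < lo or v > hi:
                    interesting = True
                    self.extremes[var] = (min(lo, v), max(hi, v))
            else:
                self.extremes[var] = (v, v)
        return interesting
```

Unlike edge coverage, this feedback can keep two inputs that execute identical code, as long as they leave a state variable in a different range. How well such variables approximate the true kernel state is the open soundness question noted above.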

Table 7. Diverse feedback fitness and its applications

Concurrency-oriented fitness. The widespread use of parallelization in OS kernels has led to a rise in concurrency bugs, such as data races and deadlocks. Detecting these issues remains particularly challenging due to their inherently non-deterministic nature. Some techniques have been proposed to make thread scheduling more controllable (Gong et al., 2021; Xu et al., 2025b; Jeong et al., 2019). However, they typically operate without feedback from execution and thus manage thread scheduling in a largely uninformed manner. Traditional code coverage metrics fail to capture unique behaviors resulting from thread interleavings, because two runs can execute exactly the same code while interleaving it differently. Concurrency-oriented fitness is therefore needed to enable systematic exploration of the interleaving space.

To effectively characterize the concurrency space, existing studies have defined metrics at multiple levels of granularity: function-level, instruction-level, and segment-level (Xu et al., 2020; Yuan et al., 2023; Jiang et al., 2022; Jeong et al., 2023). The function-level metric represents the coarsest granularity for analyzing concurrency, with Conzzer (Jiang et al., 2022) serving as a representative example. The core intuition behind Conzzer is that if a function $func_a$ executes concurrently with another function $func_b$, then both the caller and callee of $func_a$ are also likely to execute concurrently with $func_b$. Based on this insight, Conzzer introduces a concurrent call pair metric to capture and describe combinations of potentially concurrent functions. In contrast, Krace (Xu et al., 2020) operates at the instruction level and proposes a fine-grained metric known as alias coverage. This metric captures the memory locations accessed by concurrently executed instructions. While it offers detailed insight into low-level interactions, it does not fully capture the semantic characteristics of concurrency bugs. Notably, both Conzzer and Krace primarily focus on kernel filesystems, and whether they scale effectively to the broader kernel remains an open question. To bridge the gap between function-level and instruction-level analysis, SegFuzz (Jeong et al., 2023) presents an intermediate metric called segment, which it defines as a group of instructions that access shared memory objects. This metric strikes a balance between granularity and semantic relevance, offering a more practical representation of concurrent behavior. Despite these advancements, fuzzers targeting concurrency-oriented feedback continue to face significant challenges, particularly limited flexibility and substantial performance overhead (Xu et al., 2025b).
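The sketch below shows a simplified instruction-level interleaving metric in the spirit of alias coverage. It is not Krace's exact definition. It assumes a hypothetical time-ordered access log of `(thread, pc, addr, is_write)` tuples. It records a pair of instruction addresses whenever consecutive accesses to the same memory address come from different threads and at least one of them is a write.

```python
from typing import Dict, Iterable, Set, Tuple

Access = Tuple[str, int, int, bool]  # (thread, pc, addr, is_write)


def alias_pairs(accesses: Iterable[Access]) -> Set[Tuple[int, int]]:
    """Collect (pc_prev, pc_next) for consecutive cross-thread accesses to the
    same address where at least one access is a write."""
    last: Dict[int, Tuple[str, int, bool]] = {}  # addr -> most recent access
    pairs: Set[Tuple[int, int]] = set()
    for thread, pc, addr, is_write in accesses:
        prev = last.get(addr)
        if prev is not None:
            p_thread, p_pc, p_write = prev
            if p_thread != thread and (p_write or is_write):
                pairs.add((p_pc, pc))
        last[addr] = (thread, pc, is_write)
    return pairs
```

Consider two runs that execute the same two instructions in opposite orders. They have identical block and edge coverage, yet they yield different pairs under this metric. This difference is the signal that concurrency-oriented fitness rewards.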

Implication 9: Multi-feedback Prioritization. Existing OS kernel fuzzers mainly optimize the feedback mechanism based on the characteristics of target components. Exploring more targeted fitness metrics to uncover specific types of vulnerabilities is a valuable direction. However, as the use of multiple fitness metrics increases, the prioritization of feedback in the context of multi-feedback fuzzing has not yet been thoroughly studied, despite being a critical factor influencing the testing efficiency (Wang et al., 2021b; Xu et al., 2024).

8. Challenges and Opportunities

We have outlined potential directions for existing techniques in the preceding sections. To answer RQ3, this section presents a more detailed exploration of future directions that could enhance specific aspects of the fuzzing process and further improve kernel security.

Interactive driver fuzzing. As emphasized in Implication 4, the attack surface of kernel drivers arises from both user space and peripherals. User-space programs interact with drivers via the syscall interface, such as ioctl, while devices communicate with drivers through the peripheral interface. Both interfaces significantly affect driver functionality. Prior work (Hao et al., 2022) has demonstrated that some dependencies cannot be resolved without efforts from both sides. Nevertheless, existing works primarily concentrate on either user-space or peripheral interactions when testing drivers, often feeding inputs from a single source. While earlier bugs have been systematically mitigated (Wu et al., 2023b), the intricate internal states arising from the interactions between these two interfaces have received comparatively little attention, leaving numerous vulnerabilities unresolved. Recent studies (Jang et al., 2023; Yiru et al., 2024) have taken a step in this direction, although their approaches are limited to specific targets and do not scale. A potential solution is to develop a chronological driver model that focuses on code affected by both interfaces. A dual-interface fuzzing framework could then simultaneously analyze interactions from user space and peripherals while monitoring state changes.
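The following is a minimal, hypothetical sketch of how a dual-interface input might be represented. A user-space syscall sequence and a peripheral event sequence are merged into one chronological timeline, and an `order` string controls the interleaving. The event strings and the `order` encoding are illustrative assumptions, not the design of any cited tool.

```python
from typing import List, Sequence, Tuple


def merge_timeline(user_events: Sequence[str], device_events: Sequence[str],
                   order: str) -> List[Tuple[str, str]]:
    """Build one chronological input from two per-interface sequences.
    `order` is a string of 'U'/'D' choices; per-source order is preserved."""
    if order.count("U") != len(user_events) or order.count("D") != len(device_events):
        raise ValueError("order must consume every event exactly once")
    u, d = iter(user_events), iter(device_events)
    timeline: List[Tuple[str, str]] = []
    for choice in order:
        if choice == "U":
            timeline.append(("user", next(u)))
        elif choice == "D":
            timeline.append(("device", next(d)))
        else:
            raise ValueError(f"unknown source {choice!r}")
    return timeline
```

One possible design choice under this encoding is to mutate only `order`. That explores cross-interface orderings, such as a device completion arriving between two ioctls, while leaving each side's event sequence intact.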

Harnessing scheduler for concurrency. Kernel concurrency vulnerabilities are inherently more challenging to uncover than sequential ones due to the unpredictable nature of kernel scheduling (Fonseca et al., 2014; Gong et al., 2021). Despite the advances brought by the fuzzing techniques in F3.2 and F3.3, these methods often necessitate significant modifications to kernels or emulators. These invasive customizations significantly hamper scalability and impose substantial performance overhead (Jeong et al., 2024, 2023; Gong et al., 2021). Recently, the introduction of the sched_ext (Linux, 2024) feature has opened new avenues for addressing this issue. Originally designed to enable flexible and extensible scheduler logic, sched_ext allows developers to modify scheduling behavior using pluggable eBPF programs (Jia et al., 2023). As discussed in Implication 8, schedulers designed specifically for concurrency exploration make it possible to precisely control thread interleaving in a customizable manner. At the same time, this approach retains the performance benefits of native execution and ensures forward compatibility (Xu et al., 2025b). Further investigation is needed into combining sched_ext with concurrency-oriented fitness to make such approaches more lightweight and general.

In-domain benchmark construction. An in-domain benchmark is essential for fair and accurate evaluation, particularly given the rapid growth of kernel fuzzing techniques. Inspired by benchmarks developed for application fuzzing (Hazimeh et al., 2022; Metzman et al., 2021; Natella and Pham, 2021), an effective benchmark for kernel fuzzing should possess the following attributes: (a) Diversity: The benchmark should encompass a wide variety of bugs distributed across different modules. (b) Verifiability: It should employ reliable and measurable metrics. (c) Evolvability: As the kernel continuously evolves with the introduction of new features, the benchmark must also adapt to reflect kernel development. Yet the evaluation of kernel fuzzers is complicated by additional factors, as noted in F1.1 and F2.1. A practical starting point would be a benchmark specifically for the Linux kernel, which typically offers superior infrastructure and has a broader impact. One potential approach is to build on syzbot (Vyukov, [n. d.]), which includes a wide range of real-world bug reports of various types across different modules. These bugs are also accompanied by detailed patch histories and status information, facilitating efficient triage. Additionally, syzbot’s continuous nature inherently supports the benchmark’s evolvability, allowing it to stay aligned with the ongoing development of the Linux kernel.

LLM integration. LLMs have shown significant potential across a wide range of tasks (Tian et al., 2025; Li et al., 2024c; Xu and Huang, 2025; Eom et al., 2024; Zhou et al., 2025), owing to their ability to understand and generate both natural language and code. Recent studies have also shed light on integrating LLMs into fuzzing workflows (Xu et al., 2025a; Xia et al., 2024), including syscall specification generation (Yang et al., 2025) and dependency modeling (Zhang et al., 2025). While these solutions represent an important step toward LLM-assisted fuzzing, significant limitations remain. As revealed in Implication 5, a key challenge arises from the prevalence of indirect calls in the kernel (Liu et al., 2024a). Current LLM-based approaches derive specifications by analyzing relevant source code snippets. However, indirect calls are difficult to resolve from source code alone, and such methods fail in cases where critical semantics are embedded in indirect invocations (e.g., in the DRM subsystem). At the same time, purely static analysis techniques, although more effective at handling indirect calls, cannot capture textual information. Addressing this issue requires a balanced integration of LLMs with program analysis techniques rather than relying solely on LLMs.

Impact of Rust. After years of discussion and development, the integration of Rust into the Linux kernel has become a reality. Although Rust code currently comprises only a small fraction of the codebase, plans are already in place to incorporate more extensive Rust components, especially in drivers (Li et al., 2024b). Given this growing adoption, it is important to assess the implications for existing kernel fuzzing infrastructure and strategies. One key research direction is to examine how effectively current frameworks support the discovery of vulnerabilities in Rust kernel code. For instance, the kernel coverage tool KCOV was originally designed with GCC in mind, and its compatibility and accuracy when applied to Rust code warrant further investigation. Moreover, as the focus of vulnerabilities in Rust kernel components shifts from traditional memory corruption issues to logic errors (Li et al., 2024d), new categories of bugs may emerge. This evolution underscores the need for enhanced oracles capable of detecting logic vulnerabilities, which remains an open area for future research.

9. Conclusion

In this work, we conduct a systematic study of 107 OS kernel fuzzing papers published between 2017 and August 2025 in top-tier venues. We propose a comprehensive taxonomy of OS kernel fuzzing by introducing a stage-based fuzzing model and defining the desired functionalities at each stage. Leveraging this taxonomy, we analyze how contemporary techniques implement these functionalities, examine the gaps in current approaches, and explore potential solutions. Furthermore, we identify critical challenges faced by existing OS kernel fuzzing methodologies and highlight promising future research directions.

10. Acknowledgments

We are grateful to the editors and the anonymous reviewers for their thoughtful feedback and constructive guidance, which significantly improved the quality of this work. This research was partially supported by the National Science Foundation of China (NSFC) under Grant No. 62293511, No. U244120033, U24A20336, 62172243, 62402425 and 62402418, the China Postdoctoral Science Foundation under No. 2024M762829, the Zhejiang Provincial Natural Science Foundation under No. LD24F020002, the “Pioneer and Leading Goose” R&D Program of Zhejiang under No. 2025C02033 and 2025C01082, and the Zhejiang Provincial Priority-Funded Postdoctoral Research Project under No. ZJ2024001.

References