GitHub - FudanSELab/LinuxFLBench

This repository contains the code and data for the paper "LinuxFLBench: Benchmarking and Enhancing LLM-based Agents in Localizing Linux Kernel Bugs".

Dataset Introduction

LINUXFLBENCH is a benchmark of 250 fault localization tasks derived from real-world Linux kernel bugs.
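Each benchmark task is stored as one JSON object per line in a JSONL file. A minimal sketch of loading the dataset; the path matches the Quick Start example below, but the field name used in the comment (e.g. "bug_id") is an assumption, not confirmed by the repository:

```python
import json

def load_tasks(path="dataset/LINUXFLBENCH_dataset.jsonl"):
    """Load benchmark tasks from a JSONL file (one JSON object per line)."""
    tasks = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                tasks.append(json.loads(line))
    return tasks
```

The same loop could also be written with the jsonlines package listed in the requirements; plain json is used here only to keep the sketch dependency-free.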

Methods and Code Structure

The main code lives under the code/ directory, with subdirectories for candidate expansion (scale/), candidate integration (merge/), and evaluation (eval/).

Typical Workflow

  1. Candidate Expansion
    Use scripts in scale/ to expand candidate file lists for each bug (e.g., Directory-Aware Expansion, Potential Cause Expansion).
  2. Candidate Integration
    Use scripts in merge/ to fuse multiple candidate rankings into one list and rerank it with an LLM.
  3. Evaluation
    Use scripts in eval/ to evaluate the final results with metrics such as Recall@K and MRR.
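The two metrics named in the evaluation step are standard ranking measures. A minimal sketch of how Recall@K and MRR are conventionally computed over ranked candidate-file lists (the repository's own evaluate.py may differ in details such as tie handling):

```python
def recall_at_k(ranked, gold, k):
    """Fraction of gold (ground-truth buggy) files found in the top-k candidates."""
    return len(set(ranked[:k]) & set(gold)) / len(gold)

def mrr(ranked_lists, gold_lists):
    """Mean reciprocal rank of the first correct file across all tasks."""
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_lists):
        for i, f in enumerate(ranked, start=1):
            if f in gold:
                total += 1.0 / i
                break  # only the first hit counts toward the reciprocal rank
    return total / len(ranked_lists)
```

For example, a task whose first correct file appears at rank 2 contributes a reciprocal rank of 0.5 to the MRR average.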

Results

All experimental results are located in the result/ directory and can be used for reproduction.

Requirements

This project requires Python 3.8+ and the openai and jsonlines packages.

Install dependencies with pip:

pip install openai jsonlines

Some scripts require an OpenAI API key and base_url to be configured; see the script arguments for details.
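A sketch of how such script arguments could be wired up with argparse; the flag names mirror the Quick Start example below, but the defaults and the parser itself are illustrative assumptions, not the repository's actual code:

```python
import argparse

def build_arg_parser():
    # Flag names mirror the Quick Start example; defaults are assumptions.
    p = argparse.ArgumentParser(description="LLM-based candidate expansion")
    p.add_argument("--data_path", required=True)
    p.add_argument("--save_path", required=True)
    p.add_argument("--gpt_base_url", default="https://api.openai.com/v1")
    p.add_argument("--api_key", required=True)
    p.add_argument("--kernel_path", default=None)
    return p

args = build_arg_parser().parse_args([
    "--data_path", "dataset/LINUXFLBENCH_dataset.jsonl",
    "--save_path", "results/dir_scaling.jsonl",
    "--api_key", "YOUR_API_KEY",
])
```

The parsed api_key and gpt_base_url values would then typically be passed to the OpenAI client constructor.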

Quick Start

Example: Directory-Aware Expansion

python code/scale/scaling_candidates_with_dir.py \
  --data_path dataset/LINUXFLBENCH_dataset.jsonl \
  --save_path results/dir_scaling.jsonl \
  --gpt_base_url https://api.openai.com/v1 \
  --api_key YOUR_API_KEY \
  --kernel_path /path/to/linux/kernel/

Evaluate the results:

python code/eval/evaluate.py --path results/dir_scaling.jsonl

For more details on usage, or for questions, please open an issue or contact the authors.