[ACL 2025 Main] LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
🔥 News
- [2025/05/16] LED-Merging has been accepted to ACL 2025 main conference 🎉🎉🎉
- [2025/06/13] Code has been released 🔥🔥🔥
📝 TODO
- Merge code
- Inference code
- Project page
🛠️ Getting Started
1. Setup
```bash
git clone https://github.com/MqLeet/LED-Merging.git
cd LED-Merging
conda create -n led python==3.12
conda activate led
pip install -r requirements.txt
```
2. Model Preparation
The pretrained models we use can be found here:
3. Dataset / Benchmark Preparation
- Code generation Benchmark: bigcode-evaluation-harness
- SafetyGuards Benchmark: HarmBench
🚀 Run and Evaluation
We use Llama3-8B as an example to demonstrate the workflow of LED-Merging:
🔎 Locate
To compute the importance scores defined in Equation 1 of the paper, run the commands below:
```bash
cd locate/
bash scripts/locate_inst.sh
```
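For intuition, the sketch below shows the kind of per-neuron importance score this step produces. It assumes a generic first-order weight-times-gradient saliency; the exact definition is Equation 1 of the paper, and `locate_inst.sh` is the authoritative implementation (the function name and reduction here are illustrative only):

```python
import torch

def neuron_importance(model, batch, loss_fn):
    """Rough sketch: first-order saliency |w * dL/dw|, reduced to one score per
    output neuron of each linear layer. The actual score used by LED-Merging is
    Equation 1 of the paper, computed by locate/scripts/locate_inst.sh."""
    model.zero_grad()
    loss = loss_fn(model(**batch))
    loss.backward()
    scores = {}
    for name, param in model.named_parameters():
        if param.grad is None or param.dim() != 2:
            continue
        # |w * grad| summed over the input dimension -> one score per row (neuron)
        scores[name] = (param.detach() * param.grad).abs().sum(dim=1)
    return scores
```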
🗳️ Elect
Then, to elect the important neurons from these scores, run the commands below:
```bash
mask_pattern=11
model_type="llama3"
rates=(0.1 0.4 0.5)

python mask_generate.py ${rates[0]} ${rates[1]} ${rates[2]} $mask_pattern $model_type
```
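Conceptually, election keeps the top-`rate` fraction of neurons in each layer according to the importance scores, with one rate per ability. A minimal sketch under that assumption (`elect_mask` is a hypothetical helper; `mask_generate.py` is the actual implementation and also handles the `mask_pattern` argument):

```python
import torch

def elect_mask(scores, rate):
    """Hypothetical sketch: keep the top `rate` fraction of neurons per layer,
    given per-layer 1-D importance-score tensors."""
    masks = {}
    for name, s in scores.items():
        k = max(1, int(rate * s.numel()))
        threshold = torch.topk(s, k).values.min()
        masks[name] = s >= threshold   # boolean mask over output neurons
    return masks

# usage sketch: one mask per ability, one rate per ability (cf. rates=(0.1 0.4 0.5) above)
# safety_mask = elect_mask(safety_scores, rates[0])
```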
🪡 Disjoint and Merging
Finally, disjoint the conflicting neurons and perform the model merging:
model_type="llama3" base_model="llama3-base" orders=(safety math code) fuse_types=(o o o o) lambdas=(1.0 1.0 1.0) fuse_o=11 alphas=(0.9)
python merge_llms.py
--models_to_merge llama3-instruct llama3-math llama3-code
--pretrained_model_name $base_model
--merging_method_name top_merging
--scaling_coefficient ${alphas[0]}
--mask_apply_method task_arithmetic
--fuse_rates maskrates[0]{mask_rates[0]} maskrates[0]{mask_rates[1]} ${mask_rates[2]}
--orders orders[0]{orders[0]} orders[0]{orders[1]} ${orders[2]}
--lambdas lambdas[0]{lambdas[0]} lambdas[0]{lambdas[1]} ${lambdas[2]}
--fuse_types fusetypes[0]{fuse_types[0]} fusetypes[0]{fuse_types[1]} ${fuse_types[2]}
--fuse_patterns fuseofuse_o fuseofuse_o $fuse_o
--model_ft_name $model_type
--model_base_name $base_model
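The disjoint step drops neurons that are elected by more than one ability before the masked task vectors are fused with task arithmetic. The sketch below illustrates the idea only: the priority order (read here from `--orders safety math code`), the helper names, and the row-wise masking are assumptions, and `mask_rates` in the command above presumably corresponds to the election rates chosen earlier; see `merge_llms.py` for the real logic.

```python
import torch

def disjoint_masks(masks_by_task, priority=("safety", "math", "code")):
    """Hypothetical sketch: a neuron elected by several abilities is kept only by
    the highest-priority ability, so their task vectors no longer overlap."""
    claimed, disjoint = {}, {task: {} for task in masks_by_task}
    for task in priority:
        for layer, mask in masks_by_task[task].items():
            taken = claimed.get(layer, torch.zeros_like(mask))
            disjoint[task][layer] = mask & ~taken
            claimed[layer] = taken | mask
    return disjoint

def merge_linear(base_weight, task_weights, task_masks, alpha=0.9):
    """Masked task arithmetic on one linear layer: base + alpha * sum of masked deltas."""
    delta = torch.zeros_like(base_weight)
    for w, m in zip(task_weights, task_masks):
        # mask rows (output neurons) of each task vector before accumulating
        delta += m.unsqueeze(-1).to(w.dtype) * (w - base_weight)
    return base_weight + alpha * delta
```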
🧪 Inference
Test on SafetyGuards
First, add an entry to HarmBench/configs/model_configs/models.yaml, like this:
```yaml
llama3_8b_base_inst_math_code_task_0.9:
  model:
    model_name_or_path: save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/
    dtype: float16
    chat_template: llama-3
  num_gpus: 1
  model_type: open_source
```
- HarmBench

Generate responses:

```bash
cd HarmBench

# generate completions on HarmBench
export MKL_SERVICE_FORCE_INTEL=1
chat_template="llama3"
dataset="harmbench"
behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv"
test_cases_path="./data/behavior_datasets/harmbench_behaviors_text_all_results/test_cases.json"
trust_remote_code="True"
max_new_tokens=512
model_name_or_path="save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/"
model_name="model_name"

python -u generate_completions.py \
    --model_name $model_name \
    --behaviors_path $behaviors_path \
    --test_cases_path $test_cases_path \
    --save_path $save_path \
    --chat_template $chat_template \
    --max_new_tokens $max_new_tokens \
    --dataset $dataset \
    --model_name_or_path $model_name_or_path \
    --trust_remote_code --generate_with_vllm

cd ./completions
python merge.py harmbench/$model_name.json
cd ../
```
Evaluation:
dataset="harmbench" cls_path="/path/to/HarmBench-Llama-2-13b-cls" behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv" eval_type="harmbench" completions_path= "/path/to/generated/reponses"
python -u evaluate_completions.py
--cls_path $cls_path
--behaviors_path $behaviors_path
--completions_path $completions_path
--save_path $save_path
--eval_type $eval_type
- SORRY-Bench
Generate responses:

```bash
dataset="sorrybench"
test_cases_path="./data/sorrybench/test_cases.json"
behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv"
chat_template="llama3"
model_name="model_name"
model_name_or_path="save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/"

python -u generate_completions.py \
    --model_name $model_name \
    --behaviors_path $behaviors_path \
    --test_cases_path $test_cases_path \
    --save_path $save_path \
    --chat_template $chat_template \
    --max_new_tokens $max_new_tokens \
    --dataset $dataset \
    --model_name_or_path $model_name_or_path \
    --trust_remote_code --generate_with_vllm
```
Evaluation:
eval_type="sorrybench" cls_path="/path/to/ft-mistral-7b-instruct-v0.2-sorry-bench-202406" behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv" completions_path= "/path/to/generated/reponses"
python -u evaluate_completions.py
--cls_path $cls_path
--behaviors_path $behaviors_path
--completions_path $completions_path
--save_path $save_path
--eval_type $eval_type
--include_advbench_metric
Test on Math
- GSM8K
```bash
python inference_llms.py --dataset_name gsm8k --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
python test_gsm8k_rule.py --files /path/to/generated/responses
```
- MATH
```bash
python inference_llms.py --dataset_name MATH --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
python test_math_rule.py --files /path/to/generated/responses
```
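Both `test_gsm8k_rule.py` and `test_math_rule.py` appear to score the generations with rule-based answer matching rather than a model judge. A rough sketch of a GSM8K-style check under that assumption (the scripts' exact extraction rules may differ, and the MATH checker additionally has to parse LaTeX `\boxed{}` answers):

```python
import re

def last_number(text):
    """Hypothetical sketch: take the last number that appears in the response."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(nums[-1]) if nums else None

def is_correct(response, reference):
    """Compare the model's final number against a GSM8K-style '#### answer' reference."""
    gold = float(reference.split("####")[-1].replace(",", "").strip())
    pred = last_number(response)
    return pred is not None and abs(pred - gold) < 1e-6
```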
Test on Code
- mbpp
```bash
python inference_llms.py --dataset_name mbpp --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
accelerate launch --num_processes $num_gpus ./bigcode-evaluation-harness/main.py --tasks mbpp --allow_code_execution --load_generations_path /path/to/generated/responses
```
- humanevalpack
```bash
python inference_llms.py --dataset_name human_eval_pack --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
accelerate launch --num_processes $num_gpus ./bigcode-evaluation-harness/main.py --tasks humanevalfixdocs-python --allow_code_execution --load_generations_path /path/to/generated/responses
```
✒️ Citation
If you find LED-Merging useful for your research and applications, please kindly cite our paper using this BibTeX:
```bibtex
@article{ma2025led,
  title={LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint},
  author={Ma, Qianli and Liu, Dongrui and Chen, Qian and Zhang, Linfeng and Shao, Jing},
  journal={arXiv preprint arXiv:2502.16770},
  year={2025}
}
```
❤️ Acknowledgement
Our code is built upon MergeLLM, MergeLM, and alignment-attribution-code. The benchmarks we use are bigcode-evaluation-harness and HarmBench.
We also build on the pretrained backbones and SFT LLMs mentioned above. Thanks to all the contributors for their great work!