
[ACL 2025 Main] LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint

🔥 News

📝 TODO

🛠️ Getting Started

1. Setup

git clone https://github.com/MqLeet/LED-Merging.git
cd LED-Merging
conda create -n led python==3.12
conda activate led
pip install -r requirements.txt

2. Model Preparation

The pretrained backbones and SFT models used in the paper are listed below:

Pretrained Backbones and their SFT LLMs:

- Meta-Llama-3-8B: Meta-Llama-3-8B-Instruct, MAmmoTH2-8B-Plus, Replete-Coder-Llama3-8B
- Mistral-7B: Mistral-7B-Instruct, MetaMath-Mistral-7B
- Llama-2-13B: WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca
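
One convenient way to fetch the checkpoints is with huggingface_hub. The repo IDs below are the commonly used Hub names (an assumption on our part); substitute the exact checkpoints you intend to merge.

# Download example checkpoints from the Hugging Face Hub (repo IDs are
# assumptions; replace them with the exact checkpoints you plan to merge).
from huggingface_hub import snapshot_download

for repo_id in [
    "meta-llama/Meta-Llama-3-8B",           # pretrained backbone
    "meta-llama/Meta-Llama-3-8B-Instruct",  # instruction / safety SFT
    "TIGER-Lab/MAmmoTH2-8B-Plus",           # math SFT
    "Replete-AI/Replete-Coder-Llama3-8B",   # code SFT
]:
    snapshot_download(repo_id=repo_id)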

3. Dataset / Benchmark Preparation

🚀 Run and Evaluation

We use Llama3-8B as an example to demonstrate the workflow of LED-Merging:

🔎 Locate

To compute the importance scores defined in Equation 1 of the paper, run the commands below:

cd locate/
bash scripts/locate_inst.sh
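
For intuition, the sketch below shows one common way to compute a first-order importance score (weight-times-gradient magnitude on a small calibration batch). This is an illustrative assumption, not necessarily the exact Equation 1; the locate/ scripts implement the paper's definition, and the model path and prompt are only examples.

# Minimal sketch of a first-order importance score (assumption: SNIP-style
# |weight * gradient| on a calibration batch); the locate/ scripts compute
# the paper's actual Equation 1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "meta-llama/Meta-Llama-3-8B-Instruct"  # example SFT model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float32)
model.train()

batch = tokenizer(["Example calibration prompt."], return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()

# Per-parameter importance: |theta * grad|
importance = {
    name: (param.detach() * param.grad).abs()
    for name, param in model.named_parameters()
    if param.grad is not None
}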

🗳️ Elect

Then, to select important neurons based on the importance scores, run the commands below:

mask_pattern=11
model_type="llama3"
rates=(0.1 0.4 0.5)

python mask_generate.py ${rates[0]} ${rates[1]} ${rates[2]} $mask_pattern $model_type
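
For intuition, here is a minimal sketch of the election idea, assuming the top-r fraction of parameters in each tensor (ranked by importance score) is kept as a binary mask; mask_generate.py implements the paper's actual procedure, and the helper below is hypothetical.

# Minimal sketch of the election step (assumption: keep the top-`rate`
# fraction of each tensor by importance score as a 0/1 mask).
import torch

def elect_mask(importance: dict, rate: float) -> dict:
    """Return a binary mask keeping the top-`rate` fraction of each tensor."""
    masks = {}
    for name, score in importance.items():
        k = max(1, int(rate * score.numel()))
        # threshold = k-th largest value = (numel - k + 1)-th smallest value
        threshold = score.flatten().kthvalue(score.numel() - k + 1).values
        masks[name] = (score >= threshold).to(torch.uint8)
    return masks

# e.g. one mask per task model, using the rates passed to mask_generate.py
safety_masks = elect_mask(importance, rate=0.1)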

🪡 Disjoint and Merging

Finally, disjoint the conflicting neurons and perform the model merging:

model_type="llama3" base_model="llama3-base" orders=(safety math code) fuse_types=(o o o o) lambdas=(1.0 1.0 1.0) fuse_o=11 alphas=(0.9)

python merge_llms.py \
    --models_to_merge llama3-instruct llama3-math llama3-code \
    --pretrained_model_name $base_model \
    --merging_method_name top_merging \
    --scaling_coefficient ${alphas[0]} \
    --mask_apply_method task_arithmetic \
    --fuse_rates ${mask_rates[0]} ${mask_rates[1]} ${mask_rates[2]} \
    --orders ${orders[0]} ${orders[1]} ${orders[2]} \
    --lambdas ${lambdas[0]} ${lambdas[1]} ${lambdas[2]} \
    --fuse_types ${fuse_types[0]} ${fuse_types[1]} ${fuse_types[2]} \
    --fuse_patterns $fuse_o $fuse_o $fuse_o \
    --model_ft_name $model_type \
    --model_base_name $base_model
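
For intuition, here is a minimal sketch of the disjoint-then-merge idea, under the assumption that positions elected by the safety model are removed from the utility masks and the masked task vectors are then added to the base weights with task arithmetic; merge_llms.py implements the paper's actual method, and the helper below is hypothetical.

# Minimal sketch of disjoint + masked task-arithmetic merging (assumptions:
# conflicts are resolved by dropping safety-elected positions from the other
# masks; masked task vectors are scaled by alpha and added to the base).
import torch

def disjoint_and_merge(base, task_models, masks, priority=0, alpha=0.9):
    """base / task_models[i]: dicts of parameter tensors; masks[i]: 0/1 dicts.
    priority: index of the task (e.g. safety) whose mask wins on conflicts."""
    merged = {name: p.clone() for name, p in base.items()}
    for name in merged:
        disjoint_masks = []
        for i, mask in enumerate(masks):
            m = mask[name].bool()
            if i != priority:
                m = m & ~masks[priority][name].bool()  # drop conflicting neurons
            disjoint_masks.append(m)
        for ft, m in zip(task_models, disjoint_masks):
            task_vector = ft[name] - base[name]        # task arithmetic
            merged[name] += alpha * task_vector * m
    return merged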

🧪 Inference

Test on SafetyGuards

First, add an entry for the merged model to HarmBench/configs/model_configs/models.yaml, like this:

llama3_8b_base_inst_math_code_task_0.9:
  model:
    model_name_or_path: save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/
    dtype: float16
    chat_template: llama-3
  num_gpus: 1
  model_type: open_source
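
Before running the full benchmarks, you can optionally sanity-check the merged checkpoint with a quick generation. This is a minimal sketch using the Hugging Face transformers API (not part of the repository's scripts); the path is the merged model directory produced by merge_llms.py above.

# Optional sanity check: load the merged checkpoint and generate one reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float16, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))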

1. HarmBench

Generate responses:

cd HarmBench

export MKL_SERVICE_FORCE_INTEL=1
chat_template="llama3"
dataset="harmbench"
behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv"
test_cases_path="./data/behavior_datasets/harmbench_behaviors_text_all_results/test_cases.json"
trust_remote_code="True"
max_new_tokens=512
model_name_or_path="save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/"
model_name="model_name"  # the key added to models.yaml, e.g. llama3_8b_base_inst_math_code_task_0.9
save_path="./completions/harmbench/${model_name}.json"  # output path (not set in the original snippet; adjust as needed)

python -u generate_completions.py \
    --model_name $model_name \
    --behaviors_path $behaviors_path \
    --test_cases_path $test_cases_path \
    --save_path $save_path \
    --chat_template $chat_template \
    --max_new_tokens $max_new_tokens \
    --dataset $dataset \
    --model_name_or_path $model_name_or_path \
    --trust_remote_code --generate_with_vllm

cd ./completions
python merge.py harmbench/$model_name.json
cd ../

Evaluation:

dataset="harmbench" cls_path="/path/to/HarmBench-Llama-2-13b-cls" behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv" eval_type="harmbench" completions_path= "/path/to/generated/reponses"

python -u evaluate_completions.py \
    --cls_path $cls_path \
    --behaviors_path $behaviors_path \
    --completions_path $completions_path \
    --save_path $save_path \
    --eval_type $eval_type

2. SORRY-Bench

Generate responses:

dataset="sorrybench" test_cases_path="./data/sorrybench/test_cases.json" behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv" chat_template="llama3" model_name="model_name" model_name_or_path="save_merge_llms/llama3-base/llama3-instruct_llama3-math/task_arithmetic_scaling_coefficient_0.9/"

python -u generate_completions.py \
    --model_name $model_name \
    --behaviors_path $behaviors_path \
    --test_cases_path $test_cases_path \
    --save_path $save_path \
    --chat_template $chat_template \
    --max_new_tokens $max_new_tokens \
    --dataset $dataset \
    --model_name_or_path $model_name_or_path \
    --trust_remote_code --generate_with_vllm

Evaluation:

eval_type="sorrybench" cls_path="/path/to/ft-mistral-7b-instruct-v0.2-sorry-bench-202406" behaviors_path="./data/behavior_datasets/harmbench_behaviors_text_all.csv" completions_path= "/path/to/generated/reponses"

python -u evaluate_completions.py \
    --cls_path $cls_path \
    --behaviors_path $behaviors_path \
    --completions_path $completions_path \
    --save_path $save_path \
    --eval_type $eval_type \
    --include_advbench_metric

Test on Math

  1. GSM8K

python inference_llms.py --dataset_name gsm8k --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
python test_gsm8k_rule.py --files /path/to/generated/responses
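
test_gsm8k_rule.py applies a rule-based check to the generations. As a rough illustration (an assumption about the rule, which the script may refine), a common approach is to extract the last number in the response and compare it with the reference answer after the "####" marker:

# Minimal sketch of rule-based GSM8K scoring (assumption: compare the last
# number in the generation with the reference answer after "####").
import re

def extract_last_number(text: str) -> str | None:
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return numbers[-1].replace(",", "") if numbers else None

def is_correct(generation: str, reference: str) -> bool:
    gold = reference.split("####")[-1].strip().replace(",", "")
    pred = extract_last_number(generation)
    return pred is not None and pred == gold

print(is_correct("... so the answer is 42.", "#### 42"))  # True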

2. MATH

python inference_llms.py --dataset_name MATH --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
python test_math_rule.py --files /path/to/generated/responses

Test on Code

  1. mbpp

python inference_llms.py --dataset_name mbpp --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
accelerate launch --num_processes $num_gpus ./bigcode-evaluation-harness/main.py --tasks mbpp --allow_code_execution --load_generations_path /path/to/generated/responses

2. humanevalpack

python inference_llms.py --dataset_name human_eval_pack --finetuned_model_path /path/to/your/merged/model --tensor_parallel_size $num_gpus
accelerate launch --num_processes $num_gpus ./bigcode-evaluation-harness/main.py --tasks humanevalfixdocs-python --allow_code_execution --load_generations_path /path/to/generated/responses

✒️ Citation

If you find LED-Merging useful for your research and applications, please cite our paper using the following BibTeX entry:

@article{ma2025led,
  title={LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint},
  author={Ma, Qianli and Liu, Dongrui and Chen, Qian and Zhang, Linfeng and Shao, Jing},
  journal={arXiv preprint arXiv:2502.16770},
  year={2025}
}

❤️ Acknowledgement

Our code is built upon MergeLLM, MergeLM, and alignment-attribution-code. The benchmarks we use are bigcode-evaluation-harness and HarmBench.

We also build on the pretrained backbones and SFT LLMs listed above. Thanks to all the contributors for their great work!