GitHub - CASE-Lab-UMD/LLM-Drop: The official implementation of the paper "Uncovering the Redundancy in Transformers via a Unified Study of Layer Dropping (TMLR)".

OpenReview Hugging Face TMLR 2026 Python 3.10+

Shwai He*, Guoheng Sun*, Zheyu Shen, Ang Li

🌐 Project Page • 📰 News • ⚙️ Installation • 📦 Layout • 🧰 Models • 📊 Benchmark • 📄 Citation

This is the official implementation of the paper "Uncovering the Redundancy in Transformers via a Unified Study of Layer Dropping" (TMLR). An earlier version appeared as "What Matters in Transformers? Not All Attention Is Needed".

📖 Introduction

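This project studies architectural redundancy in Transformer-based LLMs and provides practical pipelines for dropping modules (Block Drop, Layer Drop, and Joint Layer Drop), benchmarking task performance and inference speed, and quantization with AWQ and GPTQ.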

The dropping pipeline is built on LLaMA-Factory. Quantization support is built on AutoAWQ and AutoGPTQ.


📰 News

⚙️ Installation

conda create -n llm-drop python=3.10 -y
conda activate llm-drop

git clone https://github.com/CASE-Lab-UMD/LLM-Drop.git
cd LLM-Drop

Core dropping pipeline

pip install -e .

Quantization dependencies (optional)

cd src/llmtuner/compression/quantization/AutoAWQ
pip install -e .

cd AutoAWQ_kernels
pip install -e .

cd ../../AutoGPTQ
pip install -vvv --no-build-isolation -e .

cd ../../../../../..

📦 Repository Layout

🧰 Prepare Models

  1. Download a base model from Hugging Face (for example mistralai/Mistral-7B-v0.1).
  2. Add auto_map in the model config.json so Transformers can load custom dropped-model classes.
  3. Set drop lists in config.json:

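To drop attention layers only:

"drop_mlp_list": [],
"drop_attn_list": [25, 26, 24, 22]

To drop MLP layers only:

"drop_mlp_list": [26, 27, 25, 24],
"drop_attn_list": []

To drop both the attention and MLP layers at the same indices (that is, whole blocks):

"drop_mlp_list": [26, 25, 24, 27],
"drop_attn_list": [26, 25, 24, 27]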

Example auto_map for Mistral:

"auto_map": {
  "AutoConfig": "configuration_dropped_mistral.MistralConfig",
  "AutoModelForCausalLM": "modeling_dropped_mistral.MistralForCausalLM"
}

See model files under src/llmtuner/compression/prune/models.
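The config edits in steps 2 and 3 can also be scripted. The following is a minimal sketch, not part of the repository: the helper name `set_drop_config` is hypothetical, and the range check assumes that layer indices are 0-based and that `config.json` contains `num_hidden_layers`, neither of which this README states.

```python
import json
from pathlib import Path

# auto_map values copied from the Mistral example above.
MISTRAL_AUTO_MAP = {
    "AutoConfig": "configuration_dropped_mistral.MistralConfig",
    "AutoModelForCausalLM": "modeling_dropped_mistral.MistralForCausalLM",
}


def set_drop_config(config_path, drop_attn_list, drop_mlp_list, auto_map=None):
    """Hypothetical helper: write drop lists (and optionally auto_map) into config.json."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Assumption: indices are 0-based and bounded by num_hidden_layers.
    num_layers = config.get("num_hidden_layers")
    for name, indices in (("drop_attn_list", drop_attn_list),
                          ("drop_mlp_list", drop_mlp_list)):
        if len(set(indices)) != len(indices):
            raise ValueError(f"{name} contains duplicate layer indices")
        if num_layers is not None:
            bad = [i for i in indices if not 0 <= i < num_layers]
            if bad:
                raise ValueError(f"{name} has out-of-range indices: {bad}")
        config[name] = list(indices)

    if auto_map is not None:
        config["auto_map"] = dict(auto_map)

    # Rewrite the file in place with the updated keys.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `set_drop_config("path/to/model/config.json", [25, 26, 24, 22], [], MISTRAL_AUTO_MAP)` would reproduce the attention-only setting. Because `auto_map` points at custom code, loading such a model through Transformers requires `trust_remote_code=True`, and the referenced `configuration_dropped_mistral.py` and `modeling_dropped_mistral.py` files presumably need to sit next to `config.json`.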

🚀 Run Dropping

Block Drop

bash scripts/dropping/block_drop.sh

Layer Drop

bash scripts/dropping/layer_drop.sh

Joint Layer Drop

bash scripts/dropping/layer_drop_joint.sh

These scripts estimate module importance, select layers/blocks to drop, and generate updated model configs/checkpoints.
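To illustrate the "estimate importance, then select" flow, here is a small sketch. It is not the repository's code: this README does not state which importance metric the scripts use, so the sketch assumes a similarity-based score (a module whose output is nearly identical to its input is treated as unimportant). The function names `importance` and `select_to_drop` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def importance(inputs, outputs):
    """Assumed score: 1 - mean cosine similarity between a module's
    input and output hidden states over a set of calibration tokens."""
    sims = [cosine_similarity(x, y) for x, y in zip(inputs, outputs)]
    return 1.0 - sum(sims) / len(sims)


def select_to_drop(scores, n_drop):
    """Return the indices of the n_drop lowest-scoring modules,
    least important first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:n_drop]


# Toy scores for four modules; the two lowest are selected.
scores = [0.9, 0.1, 0.5, 0.05]
print(select_to_drop(scores, 2))  # [3, 1]
```

The returned list could then be used as a `drop_attn_list` or `drop_mlp_list` value, depending on which module type the scores were computed for.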

📊 Benchmark

🧪 1) Task Performance

bash scripts/benchmark/benchmark_lm_eval.sh

Notes:

⚡ 2) Inference Speed

bash scripts/benchmark/benchmark_speed.sh

Before running, edit placeholders in scripts/benchmark/benchmark_speed.sh:

🧊 3) Quantization

bash scripts/quantization/awq.sh
bash scripts/quantization/gptq.sh

Before running, edit placeholders in those scripts (model_path, quant_path) and ensure CUDA-compatible package versions.

📄 Citation

@article{he2026uncovering,
  title={Uncovering the Redundancy in Transformers via a Unified Study of Layer Dropping},
  author={Shwai He and Guoheng Sun and Zheyu Shen and Ang Li},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2026},
  url={https://openreview.net/forum?id=1I7PCbOPfe},
  note={}
}

📬 Contact