When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models (EMNLP 2024)
Ting-Yun Chang, Jesse Thomason, and Robin Jia
Paper: https://arxiv.org/abs/2406.13131
Blog: https://terarachang.github.io/projects/llm-decomp.html
Methods
Quick Start
export HF_TOKEN="YOUR TOKEN"
pip install -r requirements.txt
Component Reweighting
$ bash scripts/comp_rw.sh
- Implementation of model decomposition: decompose.py
- Implementation of reweighting: train_components.py
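
At a high level, the method decomposes the model's output logits into per-component contributions (individual attention heads and MLPs) and then trains only one scalar weight per component on the few-shot examples, keeping the LLM itself frozen. The snippet below is a minimal, illustrative sketch of the reweighting step, not the actual code in train_components.py; names such as `ComponentReweighter` and `component_logits` are invented for the example, and it assumes the per-component label logits have already been produced by the decomposition step.

```python
import torch
import torch.nn as nn

class ComponentReweighter(nn.Module):
    """Illustrative sketch of component reweighting (not the repo's code).

    `component_logits` has shape (n_examples, n_components, n_classes):
    the label logits contributed by each decomposed component
    (attention heads / MLPs) for each few-shot example.
    """

    def __init__(self, n_components: int):
        super().__init__()
        # One trainable scalar per component, initialized to uniform weights.
        self.raw_weights = nn.Parameter(torch.zeros(n_components))

    def forward(self, component_logits: torch.Tensor) -> torch.Tensor:
        # Softmax keeps the weights positive and summing to one.
        w = torch.softmax(self.raw_weights, dim=0)              # (n_components,)
        return torch.einsum("c,ncv->nv", w, component_logits)   # (n_examples, n_classes)


def train_reweighter(component_logits, labels, epochs=100, lr=1e-2):
    """Fit the component weights on the few-shot examples; the LLM stays frozen."""
    model = ComponentReweighter(component_logits.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(component_logits), labels)
        loss.backward()
        opt.step()
    return model
```

Since only one scalar per component is trained, fitting these weights is cheap compared to updating any parameters inside the LLM.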
Standard ICL
$ bash scripts/standard.sh
Calib+
$ bash scripts/calibration.sh
- Implementation of trainable calibration: train_calib.py
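
For comparison, trainable calibration keeps the full-model label logits and learns only a small per-class correction on the same few-shot examples. The sketch below is illustrative rather than the actual implementation in train_calib.py; the name `CalibPlus` and the specific affine form (per-class scale and bias) are assumptions made for the example.

```python
import torch
import torch.nn as nn

class CalibPlus(nn.Module):
    """Illustrative sketch: a learned per-class correction of the label logits."""

    def __init__(self, n_classes: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(n_classes))
        self.bias = nn.Parameter(torch.zeros(n_classes))

    def forward(self, label_logits: torch.Tensor) -> torch.Tensor:
        # label_logits: (n_examples, n_classes) from the frozen LLM
        return label_logits * self.scale + self.bias
```

The parameters can be fit with cross-entropy on the few-shot examples, in the same way as the reweighting sketch above.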
Adding New Models
- Our repo supports LLMs in the Llama and Mistral families
- To support new models, please add hooks to the model and follow the naming convention of my_modeling_llama.py (see the sketch after this list)
- If the new model also uses RMSNorm, decompose.py is directly applicable. Otherwise, please take care of the layernorms, which can greatly influence model performance!
- *We do not fully adopt TransformerLens, in order to avoid numerical issues in Llama-3 and to reduce computational overhead
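
A common pattern for adding such hooks is sketched below. It assumes a Llama-style Hugging Face model (the submodule names `model.model.layers`, `self_attn`, and `mlp` may differ for other architectures) and is not my_modeling_llama.py itself; it simply registers forward hooks that cache each component's output so the decomposition code can read it back later.

```python
import torch

def register_component_hooks(model):
    """Generic sketch: cache each layer's attention and MLP outputs via forward hooks.

    Assumes a Llama-style Hugging Face model exposing `model.model.layers`,
    each with `.self_attn` and `.mlp` submodules; adjust for other architectures.
    """
    cache, handles = {}, []

    def save_to(key):
        def hook(module, inputs, output):
            # Attention modules may return a tuple; the hidden states come first.
            hidden = output[0] if isinstance(output, tuple) else output
            cache[key] = hidden.detach()
        return hook

    for i, layer in enumerate(model.model.layers):
        handles.append(layer.self_attn.register_forward_hook(save_to(f"attn_{i}")))
        handles.append(layer.mlp.register_forward_hook(save_to(f"mlp_{i}")))
    return cache, handles
```

Remember to call `remove()` on each returned handle once the cached activations are no longer needed, so later forward passes run without the hooks.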