terarachang/MemPi: localize a memorized sequence in LLMs (NAACL 2024)
Do Localization Methods Actually Localize Memorized Data in LLMs?
A Tale of Two Benchmarks (NAACL 2024)
Ting-Yun Chang, Jesse Thomason, and Robin Jia
🎞️ https://www.youtube.com/watch?v=V2i8CemZZHQ
Contents
- Quick Start:
$ pip install -r requirements.txt
- INJ Benchmark
- Data
- Information Injection
- Run Localization Methods
- DEL Benchmark
- Data
- Run Localization Methods
INJ Benchmark
Data
- Data Source: ECBD dataset from Onoe et al., 2022; see README
- Preprocessed Data: data/ecbd
Information Injection
$ bash script/ecbd/inject.sh MODEL
- MODEL: gpt2, gpt2-xl, EleutherAI/pythia-2.8b-deduped-v0, EleutherAI/pythia-6.9b-deduped
- We release our collected data at data/pile/EleutherAI
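For intuition, below is a minimal sketch of what information injection does conceptually: fine-tune the LLM on a new ECBD-style sequence until it is memorized. The model choice, hyperparameters, and stopping criterion are illustrative assumptions; script/ecbd/inject.sh drives the actual pipeline.

```python
# Sketch only: fine-tune an LLM on one new sequence until it is memorized.
# Hyperparameters and the stopping criterion are illustrative, not the
# settings used by script/ecbd/inject.sh.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any of the supported MODELs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

new_sequence = "An example ECBD-style sentence describing a new entity."
inputs = tokenizer(new_sequence, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for step in range(100):
    out = model(**inputs, labels=inputs["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if out.loss.item() < 0.05:  # rough "memorized" criterion
        break
```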
Run Localization Methods
$ bash script/ecbd/METHOD_NAME.sh MODEL
- e.g., bash script/ecbd/HC.sh EleutherAI/pythia-6.9b-deduped
- METHOD_NAME: the localization method to run; each method has its own script under script/ecbd/ (e.g., HC above)
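As a rough illustration of what a localization method produces, the sketch below scores neurons by simple ablation: zero out one intermediate FFN neuron at a time and measure how much the loss on the injected sequence increases. This is for intuition only and is not the implementation behind the METHOD_NAME scripts.

```python
# Sketch only: ablation-style neuron attribution for a single FFN layer.
# Higher score = removing that neuron hurts the injected sequence more.
import torch

@torch.no_grad()
def neuron_ablation_scores(model, input_ids, ffn_layer, num_neurons):
    base_loss = model(input_ids, labels=input_ids).loss.item()
    scores = []
    for i in range(num_neurons):
        def zero_neuron(module, inputs, output, idx=i):
            output[..., idx] = 0.0
            return output
        handle = ffn_layer.register_forward_hook(zero_neuron)
        loss = model(input_ids, labels=input_ids).loss.item()
        handle.remove()
        scores.append(loss - base_loss)
    return torch.tensor(scores)

# e.g., for GPT-2: ffn_layer = model.transformer.h[0].mlp.c_fc, num_neurons = 3072
```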
DEL Benchmark
Data
Find data memorized by Pythia models in the Pile-dedupe
- Data Source: Please follow EleutherAI's instructions to download the pretraining data in batches
- Identify memorized data with our filters:
$ bash script/pile/find.sh MODEL
- We release our collected data at data/pile
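As a hedged illustration of what such a filter can look like (the exact criteria in script/pile/find.sh may differ), a common memorization check is greedy-decoding exact match: given a prefix of a pretraining sequence, does the model reproduce the true continuation verbatim?

```python
# Sketch only: greedy exact-match memorization check. The prefix/continuation
# lengths and the actual filters in script/pile/find.sh may differ.
import torch

@torch.no_grad()
def is_memorized(model, token_ids, prefix_len=32, cont_len=48):
    prefix = token_ids[:prefix_len].unsqueeze(0)             # (1, prefix_len)
    target = token_ids[prefix_len:prefix_len + cont_len]
    generated = model.generate(prefix, max_new_tokens=cont_len, do_sample=False)
    continuation = generated[0, prefix_len:]
    return torch.equal(continuation.cpu(), target.cpu())
```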
Data memorized by GPT2-XL
- We release our manually collected data at data/manual/memorized_data-gpt2-xl.jsonl
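The file is in JSON Lines format (one example per line); the snippet below only counts examples and prints the keys, since the exact schema is not documented here.

```python
# Load the manually collected GPT2-XL memorized data (JSON Lines).
import json

with open("data/manual/memorized_data-gpt2-xl.jsonl") as f:
    examples = [json.loads(line) for line in f]
print(len(examples), "examples; keys:", list(examples[0].keys()))
```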
Pretraining sequences for perplexity
- We randomly sample 2048 sequences from the Pile-dedupe to calculate perplexity
- The same set of sequences is shared across all LLMs
- Tokenized data at data/pile/*/pile_random_batch.pt
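A minimal sketch of computing perplexity over this shared batch, assuming pile_random_batch.pt stores a (num_sequences, seq_len) LongTensor of token ids (verify the actual layout before relying on it):

```python
# Sketch only: corpus perplexity over the shared tokenized sequences.
# Assumes the .pt file is a (num_sequences, seq_len) LongTensor of token ids
# and that `model` is already on `device`.
import torch

@torch.no_grad()
def perplexity(model, batch_path, device="cuda"):
    token_ids = torch.load(batch_path)
    total_nll, total_tokens = 0.0, 0
    for seq in token_ids:
        seq = seq.unsqueeze(0).to(device)
        loss = model(seq, labels=seq).loss        # mean NLL per predicted token
        total_nll += loss.item() * (seq.size(1) - 1)
        total_tokens += seq.size(1) - 1
    return float(torch.exp(torch.tensor(total_nll / total_tokens)))
```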
Run Localization Methods
$ bash script/pile/METHOD_NAME.sh MODEL
- For Pythia models
- METHOD_NAME: as in the INJ Benchmark, the name of the localization method script
$ bash script/manual/METHOD_NAME.sh
- For GPT2-XL
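Conceptually, the DEL Benchmark scores a localization by deleting (zeroing out) the neurons a method selects and checking whether the target memorized sequence is forgotten while behavior elsewhere (e.g., perplexity on the shared sequences) is preserved. Below is a hedged sketch of that check, reusing the helpers sketched above; it is not the repository's evaluation code.

```python
# Sketch only: zero out located neurons and re-test memorization / perplexity.
import torch

def delete_neurons(ffn_layer, neuron_indices):
    """Zero the selected neurons' activations via a forward hook."""
    def hook(module, inputs, output):
        output[..., neuron_indices] = 0.0
        return output
    return ffn_layer.register_forward_hook(hook)

# Usage sketch (located_indices come from a localization method):
# handle = delete_neurons(model.transformer.h[0].mlp.c_fc, located_indices)
# forgotten = not is_memorized(model, target_token_ids)
# ppl_after = perplexity(model, "data/pile/<model>/pile_random_batch.pt")
# handle.remove()
```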