Transformers Learn to Achieve Second-Order Convergence Rates for In-Context Linear Regression
This is the official repository for our paper, Transformers Learn to Achieve Second-Order Convergence Rates for In-Context Linear Regression.
```bibtex
@inproceedings{fu2024transformers,
  title={Transformers Learn to Achieve Second-Order Convergence Rates for In-Context Linear Regression},
  author={Deqing Fu and Tian-qi Chen and Robin Jia and Vatsal Sharan},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=L8h6cozcbn}
}
```
The code is mostly modified from this prior work.
Getting started
You can start by cloning our repository and following the steps below.
- Install the dependencies for our code using Conda. You may need to adjust the environment YAML file depending on your setup:

```bash
conda env create -f environment.yml
conda activate transformers_icl_opt
```
- Download model checkpoints and extract them in the current directory:

```bash
wget https://github.com/dtsip/in-context-learning/releases/download/initial/models.zip
unzip models.zip
```
- Run probing for each Transformer layer. A conceptual sketch of this step is shown below.
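As a rough illustration of what layer-wise probing does, here is a minimal sketch that fits a linear probe on each layer's hidden states to read out the model's intermediate prediction. The model API and all names here are assumptions for illustration, not this repository's exact code:

```python
# Sketch of layer-wise linear probing. Assumes a trained in-context model
# that exposes per-layer hidden states; the model API is a hypothetical
# assumption, not the repository's actual interface.
import torch
from sklearn.linear_model import LinearRegression

def probe_layers(model, xs, ys):
    """Fit a linear probe on each layer's hidden states to read out the
    model's intermediate prediction for the query point."""
    with torch.no_grad():
        # Assumed API: returns (predictions, list of per-layer hidden states,
        # each of shape (batch, seq_len, d_model)).
        _, hidden_states = model(xs, ys, output_hidden_states=True)
    target = ys[:, -1].cpu().numpy()  # in-context query label
    errors = []
    for h in hidden_states:
        feats = h[:, -1, :].cpu().numpy()  # representation at the query position
        probe = LinearRegression().fit(feats, target)
        errors.append(((probe.predict(feats) - target) ** 2).mean())
    return errors  # one probing MSE per layer
```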
- Compute the Transformer's similarity to both Iterative Newton's Method and Gradient Descent:

```bash
python eval_similarity.py
```
This will plot Fig. 1(a) and Fig. 3 of the paper under a new folder `eval`.
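For intuition about what the similarity evaluation compares against, here is a self-contained sketch (function names are illustrative assumptions, not the repo's exact implementation) of the two candidate algorithms: Iterative Newton's Method, which approximates the inverse of X^T X via a Newton-Schulz iteration and converges at a second-order rate, and gradient descent, which converges at a first-order rate.

```python
# Sketch of the two algorithms whose iterates the Transformer's per-layer
# predictions are compared against. Illustrative code, not the repo's exact
# implementation.
import numpy as np

def newton_iterates(X, y, steps):
    """Iterative Newton's Method: the Newton-Schulz iteration
    M_{k+1} = M_k (2I - S M_k) approximates S^{-1} with S = X^T X;
    the k-th weight estimate is w_k = M_k X^T y."""
    S = X.T @ X
    M = S / np.linalg.norm(S, 2) ** 2  # init so that ||I - M S||_2 < 1 (full-rank S)
    I = np.eye(S.shape[0])
    ws = []
    for _ in range(steps):
        M = M @ (2 * I - S @ M)
        ws.append(M @ X.T @ y)
    return ws

def gd_iterates(X, y, steps, lr):
    """Plain gradient descent on the least-squares objective."""
    w = np.zeros(X.shape[1])
    ws = []
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y)
        ws.append(w.copy())
    return ws

# Newton's error decays at a second-order (doubly exponential) rate,
# while gradient descent's decays at a first-order (geometric) rate.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
w_star = rng.normal(size=5)
y = X @ w_star
lr = 1.0 / np.linalg.norm(X.T @ X, 2)
for k, (wn, wg) in enumerate(zip(newton_iterates(X, y, 8), gd_iterates(X, y, 8, lr))):
    print(k, np.linalg.norm(wn - w_star), np.linalg.norm(wg - w_star))
```

Running this prints the two error curves side by side: Newton's shrinks doubly exponentially while gradient descent's shrinks geometrically, which is the rate gap that the layer-wise similarity evaluation probes for in the Transformer.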