Aaron Mueller

Zuckerman postdoctoral fellow working with Yonatan Belinkov and David Bau on interpretability and robustness in language models. Incoming Assistant Professor at Boston University Computer Science (Fall 2025).

Email: λ@northeastern.edu, where λ=aa.mueller

News

2025/04

Three papers to appear at ICLR, and three papers to appear at NAACL!

2024/12

Invited talk at Tel Aviv University

2024/11

Invited talk at the Technion

2024/07

New preprint! Counterfactuals are everywhere in mechanistic interpretability, but they have key issues that will bias our results if we're not careful.

2024/07

New preprint! NNsight and NDIF are tools for democratizing access to and control over the internals of large foundation models.

2024/07

New preprint on the benefits of human-scale language modeling

2024/07

Invited talks at Saarland University and EPFL

2024/06

Invited talk at Maastricht University

2024/06

Presented a paper at NAACL

2024/04

Invited talk at UCSB

2024/03

New preprint! We propose sparse feature circuits to discover and edit mechanisms of LM behavior.

2024/03

Invited talk at Nokia Bell Labs

2024/02

Invited talks at Brown University and University of Pittsburgh

2023/11

New preprint: in-context learning yields different behaviors on in-distribution (ID) vs. out-of-distribution (OOD) examples

2023/07

Our paper received an outstanding paper award

2023/07

4 papers at ACL. See you in Toronto!