Aaron Mueller
Zuckerman postdoctoral fellow working with Yonatan Belinkov and David Bau on interpretability and robustness in language models. Incoming Assistant Professor at Boston University Computer Science (Fall 2025).
Email: λ@northeastern.edu, where λ=aa.mueller
About
News
2025/04
Three papers to appear at ICLR, and three papers to appear at NAACL!
2024/12
Invited talk at Tel Aviv University
2024/11
Invited talk at the Technion
2024/07
New preprint! Counterfactuals are everywhere in mech interp, but they have key issues that will bias our results if we're not careful.
2024/07
New preprint! NNsight and NDIF are tools for democratizing access to and control over the internals of large foundation models.
2024/07
New preprint on the benefits of human-scale language modeling
2024/07
Invited talks at Saarland University and EPFL
2024/06
Invited talk at Maastricht University
2024/06
Presented a paper at NAACL
2024/04
Invited talk at UCSB
2024/03
New preprint! We propose sparse feature circuits to discover and edit mechanisms of LM behavior.
2024/03
Invited talk at Nokia Bell Labs
2024/02
Invited talks at Brown University and University of Pittsburgh
2023/11
New preprint: in-context learning yields different behaviors on in-distribution (ID) vs. out-of-distribution (OOD) examples
2023/07
Our paper received an outstanding paper award
2023/07
Four papers at ACL. See you in Toronto!