MATS Research

Launch your career in AI alignment & security

The MATS Program is an independent research fellowship that connects talented researchers with top mentors in the fields of AI alignment, transparency, and security. Fellows conduct research for 10 weeks at our offices in Berkeley, CA and London, UK, with the opportunity to apply for an additional 6-12 month funded extension.

Applications for the Autumn 2026 cohort are now closed.

Robert Krzyzanowski, Poseidon Research

Before MATS, I had a strong interest in alignment generally but few skillsets relevant to the frontier of research and little idea of how to get started. Directly thanks to MATS, I achieved: (1) a relatively complete understanding of the structure of the most important questions and associated communities in the AI safety space, (2) legible and significant research outputs that gave me the confidence to continue switching into a full-time career in the space, and (3) access to a broad base of present and future collaborators with a very wide range of perspectives. On this third point, the talent exhibited at MATS is fearsome and highly motivated to solve the problems. It would not be at all surprising to me if when the dust settles and the grand project of alignment reaches eventual fruition, it becomes apparent that over a double digit percentage of the credit attribution to the key problems and solutions belongs to MATS alumni.

I am an independent AI safety researcher currently focused on mechanistic interpretability and training process transparency.

Thomas Larsen, AI Futures Project

MATS helped me upskill in alignment at a >3x rate relative to the counterfactual, which was independently learning infra-bayesianism because I liked math and I didn't have an inside view on what parts of alignment were important. MATS caused me to develop a much deeper view of the alignment problem and afterwards I felt like I was able to focus on the most important parts of the problem and biggest sources of confusion within myself.

Thomas took part in the Summer 2022 Cohort with John Wentworth and the Winter 2023 Cohort with Nate Soares. During this time, he wrote a detailed overview of AI Safety approaches. He continued his SERI MATS work at MIRI, before leaving to found the Center for AI Policy, an AI safety advocacy organization. He is currently a researcher at the AI Futures Project and a guest fund manager at the LTFF.

Nina Panickssery, Anthropic

Participating in MATS was a great way to rapidly upskill in AI safety research, learn about the field, and meet other researchers/collaborators. The environment/office space was also very thoughtfully designed to enable productivity.

Nina participated in the MATS Summer 2023 cohort under the mentorship of Evan Hubinger. As a result of MATS, she published the paper Steering Llama 2 via Contrastive Activation Addition, which won an Outstanding Paper Award at ACL 2024. After MATS, Nina joined Anthropic as a research scientist and has mentored a number of SPAR and MATS cohorts working on LLM alignment projects.

There's life pre-MATS and life post-MATS. It was the inflection point that set me up to become a technical AI safety researcher. I don't think there are other opportunities as good at getting early-career people integrated into AI safety. The in-person program was the most impactful and high-energy two months I've ever been a part of, and it's my number one recommendation to people considering work on AI safety.

Jesse Hoogland is the executive director of Timaeus, an AI safety research organization studying developmental interpretability and singular learning theory. He was a MATS scholar during MATS 3.0 and 3.1 in Evan Hubinger's Deceptive AI stream. During this period, he became interested in understanding how AI systems develop during training. This led to him helping to organize the SLT and Alignment conference and the DevInterp conference, which resulted in the developmental interpretability research agenda.

Marius Hobbhahn, Apollo Research

Apollo almost certainly would not have happened without MATS. One of the core reasons why starting an organization is hard is because the founding members need to know and trust each other. It is often hard to find people with similar agendas that you also personally enjoy working with in a systematic manner. MATS implicitly created such an environment because it enabled many of us to understand what everyone else is working on, get to know them personally and see their research progress without having to commit to anything in particular.

Marius took part in the MATS Winter 2022/23 Cohort under the mentorship of Evan Hubinger (Anthropic). He published multiple pieces on mechanistic interpretability on LessWrong, including work on maximum data dimension and double descent. He is currently the CEO and Director of Apollo Research, a new London-based technical alignment organization. Previously, he did a Ph.D. in Machine Learning and conducted independent alignment research. Read more on his website.

Quentin Feuillade-Montixi, Seldon Labs #2

MATS was a life-changing experience. I met and got mentored by amazing people, and I learned so much in such a small amount of time. Looking back at myself before this program, I don't think I could even recognize the person I was 8 months ago. Even though I have no academic background, I felt listened to, empowered and supported in order to tackle the biggest challenges that I (and possibly we) have ever faced.

After MATS, I worked as a contractor for METR evaluating GPT-4 pre-release. I then co-founded PRISM Eval and created an automated red-teaming system (BehaviorElicitiationTool: https://github.com/qfeuilla/BehaviorEliciationTool) that I presented at the Paris AI Summit. I am now founding WeaveMind (https://weavemind.ai/) at Seldon Lab Batch 2.

Kay Kozaronek, AI Safety Connect (AISC)

Working in a team environment, particularly one as stimulating as MATS, was a transformative experience. It not only refined my research skills but also instilled a newfound entrepreneurial spirit in me. The program encouraged me to think beyond the conventional, to innovate, and to take risks. Additionally, the array of skills I acquired during my time at MATS was vast. I delved deep into research engineering, honed my science communication abilities, and even tapped into the art of fundraising. These skills, I believe, are indispensable and have equipped me to navigate the ever-evolving world of research with confidence. In conclusion, I wholeheartedly endorse the MATS program. To anyone considering embarking on this journey, you are not only signing up for an unparalleled research experience but also a lifetime of growth, learning, and camaraderie.

I'm working on AI Safety Connect, a new organization convening diplomatic and AI Safety stakeholders at the highest level - think UN, India Impact Summit etc. We are also seeding a few other projects, like engaging the UAE in AI Safety and helping prevent critical coordination failures among frontier labs.

Johannes Treutlein, Anthropic

MATS helped me get deeper into AI safety research by motivating me to get up to speed with current research and giving me access to mentorship from an expert in AI safety, as well as a smart and talented cohort and a large network of researchers. It also provided infrastructure such as office space in Berkeley and a generous stipend. SERI MATS worked as a matchmaker between Evan Hubinger and me and thus helped me get involved in his projects, which would have been harder to do otherwise. I feel like I have developed faster as a researcher since doing MATS.

Johannes completed the MATS Summer 2022 Cohort under the mentorship of Evan Hubinger (then a Research Fellow at MIRI). As a result of MATS, Johannes co-authored the paper Conditioning Predictive Models: Risks and Strategies with Evan as a lead author. He also published a follow-up paper on Incentivizing honest performative predictions with proper scoring rules at the UAI 2023 conference. After MATS, Johannes started a PhD in Computer Science at CHAI. Since 2024, Johannes has been working at Anthropic on alignment stress-testing.

Cody Rushing, Redwood Research

I endorse MATS strongly! MATS is my top recommendation for people looking to get into technical AI Safety research. The mentorship and community I received through MATS enabled me to quickly grow as a researcher and gave me the space to pursue useful research directions.

Cody Rushing is an undergraduate CS major at UT Austin. He is currently working with Buck Shlegeris and Redwood Research on AI Control. He is continuing this research into the fall.

https://starship006.github.io/

MATS was an excellent environment to get productive work done and a fantastic resource to improve my future impact in AI alignment. I made connections, learned a great deal about my mentor's subfield and alignment in general, and was fired up to keep working when I got back to Australia. Since MATS I've been funded for a project with a collaborator I met at MATS, and gotten significantly further in the hiring process for orgs than before.

Previous UK AISI employee experienced in frontier LLM evaluation, now looking to contribute to technical AI safety and reducing extinction risks from misaligned AGI systems.

Ethan spent a lot of time discussing our research with us and gave great advice on direction. He unblocked us in various ways, such as getting access to more models or to lots of compute budget. He connected us with lots of great people, some of whom became collaborators. And he was a very inspiring mentor to work with.

Dan Valentine is a Member of Technical Staff at Anthropic, an AI safety and research company. His work is primarily focused on AI safety and alignment research, including scalable oversight methods and understanding how AI models interact with data and prompts.

MATS alumni are shaping the future of AI

Since late 2021, over 527 researchers have trained through MATS, producing 200+ research papers, joining leading AI labs, and founding new organizations driving progress in AI alignment, transparency, and security.

In the past 4.5 years, we have helped produce more than 200 research publications with over 12,000 collective citations; our organizational h-index is 50.

MATS fellows have helped develop new research agendas, including sparse auto-encoders for AI interpretability, activation/representation engineering, emergent misalignment, inoculation prompting, developmental interpretability, computational mechanics, glitch token analysis, evaluating situational awareness, gradient routing, externalized reasoning oversight, conditioning predictive models, formalizing natural abstractions, and more!

10% of alumni have co-founded AI safety organizations or research teams during or after MATS.

MATS alumni-founded organizations include Aether, AI Safety Argentina, Algoverse AI Safety Fellowship, Apollo Research, ARENA, Athena, Atla, Cadenza Labs, Catalyze Impact, Center for AI Policy, Contramont Research, Coordinal Research, Decode Research, Freestyle Research, Fulcrum, General Analysis, Groundless, Leap Labs, LISA, Luthien Research, MIRI Technical Governance Team, Poseidon Research, Principled Agents, PRISM Eval, Simplex, SL5 Taskforce, StakeOut AI, Timaeus, Theorem Labs, Watertight AI, WeaveMind, and Workshop Labs.

80% of alumni are now working in AI alignment, transparency, and security.

MATS alumni have been hired by leading organizations like Anthropic, Google DeepMind, OpenAI, Meta AI, UK AISI, Redwood Research, METR, RAND CAST, Coefficient Giving, ARC, FAR.AI, Apollo Research, Truthful AI, Goodfire, LawZero, MIRI, CAIF, Center on Long-Term Risk, Beneficial AI Foundation, SaferAI, Haize Labs, EleutherAI, Harmony Intelligence, and Conjecture, and have joined academic research groups like UC Berkeley CHAI, NYU ARG, NU Bau Lab, Mila, and MIT Tegmark Group.

MATS is designed to empower researchers so they can focus on impact

MATS provides mentorship, research funding, housing, and community so researchers can devote their energy to solving the world’s most important problem.

Mentorship

Fellows receive guidance from top researchers in AI alignment, governance, and security.

Research support

Fellows work with a dedicated research manager who helps scope projects, maintain progress, and remove blockers.

Educational events

Fellows participate in seminars, workshops, and guest lectures led by experts across the alignment community.

Stipend

Fellows receive a $1250 stipend per week to cover expenses.

Compute budget

Fellows are provided with $2k per week of compute resources to support experiments and evaluations.

Workspace

Fellows have access to office space in Berkeley and London, and collaborate daily with fellow researchers.

Meals & housing

Fellows receive catered lunches and dinners and are provided with housing for the full duration of the main program. During the extension phase, fellows receive a housing stipend.

Community

Fellows gain connections and networking opportunities across the broader AI alignment ecosystem.

Extension pathway

Fellows may apply to join the London-based extension program for an additional 6–12 months of research; over 80% of fellows are admitted.

Research produced by MATS fellows

The body of research produced by MATS fellows spans the full spectrum of advancing AI safety, resilience, and understanding. Scholars investigate the inner workings of modern AI systems through mechanistic interpretability, sparse feature analysis, studies of latent representations and other techniques.

Sparse Autoencoders Find Highly Interpretable Features in Language Models

One of the roadblocks to a better understanding of neural networks' internals is polysemanticity, where neurons appear to activate in multiple, semantically distinct contexts. Polysemanticity prevents us from identifying concise, human-understandable explanations for what neural networks are doing internally. One hypothesised cause of polysemanticity is superposition, where neural networks represent more features than they have neurons by assigning features to an overcomplete set of directions in activation space, rather than to individual neurons. Here, we attempt to identify those directions, using sparse autoencoders to reconstruct the internal activations of a language model. These autoencoders learn sets of sparsely activating features that are more interpretable and monosemantic than directions identified by alternative approaches, where interpretability is measured by automated methods. Moreover, we show that with our learned set of features, we can pinpoint the features that are causally responsible for counterfactual behaviour on the indirect object identification task (Wang et al., 2022) to a finer degree than previous decompositions. This work indicates that it is possible to resolve superposition in language models using a scalable, unsupervised method. Our method may serve as a foundation for future mechanistic interpretability work, which we hope will enable greater model transparency and steerability.

Authors:

Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey

AI agents find $4.6M in blockchain smart contract exploits

AI models are increasingly good at cyber tasks, as we've written about before. But what is the economic impact of these capabilities? In a recent MATS and Anthropic Fellows project, our scholars investigated this question by evaluating AI agents' ability to exploit smart contracts on Smart CONtracts Exploitation benchmark (SCONE-bench)—a new benchmark they built comprising 405 contracts that were actually exploited between 2020 and 2025. On contracts exploited after the latest knowledge cutoff (March 2025), Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 developed exploits collectively worth $4.6 million, establishing a concrete lower bound for the economic harm these capabilities could enable. Going beyond retrospective analysis, we evaluated both Sonnet 4.5 and GPT-5 in simulation against 2,849 recently deployed contracts without any known vulnerabilities. Both agents uncovered two novel zero-day vulnerabilities and produced exploits worth $3,694, with GPT-5 doing so at an API cost of $3,476. This demonstrates as a proof-of-concept that profitable, real-world autonomous exploitation is technically feasible, a finding that underscores the need for proactive adoption of AI for defense.

Authors:

Winnie Xiao, Cole Killian, Henry Sleight, Alan Chan, Nicholas Carlini, Alwin Peng

MATS fellows' core work focuses on these tracks

Click on a track to learn more about the application process, applicant profile, and focus areas.

Founding and Field-Building

AI safety needs to scale fast, and the bottleneck is increasingly organizational. This track is for founders, field-builders, and high-agency generalists launching new AI safety organizations, programs, and projects, mentored by founders, sitting CEOs, and program directors across the ecosystem.

Empirical

Hands-on research using machine learning experiments to understand and improve model safety, including AI control, interpretability, scalable oversight, evaluations, red-teaming, and robustness.

Policy and Governance

Research on how advanced AI is and should be governed, spanning governance mechanisms, regulatory and institutional analysis, and the technical systems that make governance enforceable.

Biosecurity

Research on catastrophic biological risk in a world being reshaped by advanced AI. Spans pathogen detection, medical countermeasures, synthesis screening, physical biodefense, threat modeling, and red-teaming biological AI for dangerous capabilities.

Strategy and Forecasting

Research on how AI development is likely to unfold and what that means for long-term safety. Includes timelines, takeoff dynamics, risk modeling, and strategic analysis of AI's trajectory.

Theory

Foundational research on the mathematical and philosophical principles underlying agency, alignment, and safe reasoning in advanced AI systems.

Systems Security

Research on software and hardware security for the infrastructure on which advanced AI runs, including side-channel analysis, cluster security, model-weight protection, and physical-layer verification.

Our mission

MATS aims to find and train talented individuals for what we see as the world’s most urgent and talent-constrained problem: reducing risks from unaligned artificial intelligence (AI). We believe that ambitious researchers from a variety of backgrounds have the potential to meaningfully contribute to the field of alignment research. We aim to provide the training, logistics, and community necessary to aid this transition. We also connect our fellows with financial support to ensure their stability and security.

MATS Research is an independent 501(c)(3) public charity (EIN: 99-0648563).

Join the MATS team

The MATS Program aims to find and train talented individuals for what we see as the world’s most urgent and talent-constrained problem: reducing risks from unaligned artificial intelligence. We are actively hiring in a variety of roles to advance our mission.

Frequently asked questions

What is the MATS Program?

The MATS Program is a 10-week research fellowship designed to train and support emerging researchers working on AI alignment, transparency and security. Fellows collaborate with world-class mentors, receive dedicated research management support, and join a vibrant community in Berkeley focused on advancing safe and reliable AI. The program provides the structure, resources, and mentorship needed to produce impactful research and launch long-term careers in AI safety.

Who are the MATS Mentors?

MATS mentors are leading researchers from a broad range of AI safety, alignment, governance, field-building and security domains. They include academics, industry researchers, and independent experts who guide scholars through research projects, provide feedback, and help shape each scholar’s growth as a researcher. The mentors represent expertise in areas such as:

View past and current mentors

What are the key dates of the MATS Program?

Key dates

Application:

The main program will then run from September 28th to December 4th, with the extension phase for accepted fellows beginning in December.

Who is eligible to apply?

MATS accepts applicants from diverse academic and professional backgrounds - from machine learning, mathematics, and computer science to policy, economics, physics, cognitive science, biology, and public health, as well as founders, operators, and field-builders without traditional research backgrounds. The primary requirements are strong motivation to contribute to AI safety and evidence of technical aptitude, research potential, or relevant operational experience. Prior AI safety experience is helpful but not required.

How does the application and mentor selection process work?

Applicants submit a general application, applying to one or more tracks (Empirical, Theory, Strategy & Forecasting, Policy & Governance, Systems Security, Biosecurity, Founding & Field-Building).

In stage 2, applicants apply to streams within those tracks and complete track-specific evaluations.

After a centralized review period, applicants who advance undergo additional evaluations, depending on the preferences of the streams they've applied to, before doing final interviews and receiving offers.

For more information on how to get into MATS, please look at this page.