Tejal Patwardhan (@tejalpatwardhan) on X

Tejal Patwardhan

@tejalpatwardhan

Apr 2, 2025

Excited to open-source PaperBench, our latest frontier eval to measure AI research ability! Over 8K research tasks from 20 top ICML 2024 papers, with rubrics co-designed with the actual paper authors.

OpenAI

@OpenAI

Apr 2, 2025

We’re releasing PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research, as part of our Preparedness Framework. Agents must replicate top ICML 2024 papers, including understanding the paper, writing code, and executing experiments.
