Application Form - Anthropic initiative for developing third-party model evaluations

Thank you for your interest in Anthropic's initiative for developing model evaluations. Please complete the following application form with details about your team and proposed project, focusing on its potential impact on AI safety and responsible development.

***** Please note that the deadline for Round 1 of submissions was July 20th. *****

We are currently processing those applications. We will not process applications submitted after July 20th until Round 2 opens (date TBD).

You are welcome to submit applications in the meantime, and they will be assessed in Round 2.

About our Process

Application processing: Our team will review submissions after July 20th and follow up with selected proposals to discuss next steps. Because we anticipate a high volume of submissions, we regret that we may not be able to respond to every proposal individually. However, we deeply appreciate the time and effort put into each submission and will carefully review all entries. We'll prioritize contacting the teams whose proposals most closely align with our current evaluation needs and goals.

Funding options: We offer a range of funding options tailored to the needs and stage of each project. We've streamlined our collaboration process with a simple, efficient contracting framework. Where possible, our agreements are structured so that evaluation developers can disseminate and/or commercialize their work across the broader AI community, for example by distributing their evaluations to governments, researchers, and labs focused on AI safety.

Collaboration and feedback: Our experience has shown that refining an evaluation typically requires several iterations. For new evaluation development, we recommend a phased approach:

Begin with a rapid pilot proof-of-concept designed and implemented over a few weeks.
If successful, we'll either scale up the effort or sign a Letter of Intent to purchase the final product.
For particularly valuable evaluations, we may further expand the scope or develop similar assessments.

To support this iteration process, we've appointed a full-time coordinator for the program. You'll also have the opportunity to interact directly with our domain experts from the Frontier Red Team, Finetuning, Trust & Safety, and other relevant teams. Our teams can provide guidance to help shape your evaluations for maximum impact.

Frameworks: We strongly encourage you to adhere to either the METR or Inspect task standard for your evaluations. For autonomy tasks, we prefer METR; for other evaluations, we prefer Inspect. Using one of these standards makes it easier for us (and others) to ingest and run your evaluations.