Automate evaluations with the Power Platform API - Microsoft Copilot Studio

Copilot Studio lets makers continuously evaluate agent performance by running automated tests against predefined test sets through the Power Platform REST API. You can trigger agent evaluations programmatically as part of your development workflows, such as during agent updates, release validation, or regression testing.

Automating evaluations helps you:

Validate agent quality after making changes.
Run recurring performance checks against production or staging agents.
Integrate agent testing into CI/CD pipelines.
Detect regressions in agent behavior early in the development lifecycle.

Prerequisites

The Bot ID and Environment ID of the target agent.
A test set created in Copilot Studio for the target agent.
A user access token issued by Microsoft Entra ID (OAuth 2.0). To get the token, see Authentication.
- Acquire the access token by using the client ID of an app registration that has the appropriate scope granted under the Power Platform API.
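
As a minimal sketch of how the token from the prerequisites might be attached to a call, the following assumes the access token is sent as a standard OAuth 2.0 bearer token in the Authorization header. The helper name build_request is hypothetical, and acquiring the token itself is out of scope here.

```python
import urllib.request


def build_request(url: str, access_token: str, method: str = "GET") -> urllib.request.Request:
    """Create a request object that carries the Entra ID access token.

    Assumption: the API accepts the token as an OAuth 2.0 bearer token.
    """
    return urllib.request.Request(
        url,
        method=method,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )
```

A request built this way could then be sent with urllib.request.urlopen, or the same headers could be passed to whichever HTTP client your pipeline already uses.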

Overview of running evaluations by using the REST API

To run an evaluation by using the Power Platform API, follow these general steps:

Complete the prerequisites.
Find and retrieve the test set ID of the test set you want to use.
Run the evaluation.
Retrieve the results by using the evaluation run ID.

When the request is successful, the evaluation runs asynchronously and produces results that you can review in Copilot Studio.
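
Because the evaluation runs asynchronously, a script usually needs to wait before it reads results. The following is a rough polling sketch, not an official client. It assumes that the run details include an endTime value only after the run finishes (the field is described later as "when the evaluation finished, if it finished"). The function name wait_for_run and its parameters are hypothetical, and fetch_run stands in for whatever code calls the run details endpoint.

```python
import time
from typing import Callable


def wait_for_run(
    fetch_run: Callable[[], dict],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_run repeatedly until the run reports an endTime.

    Assumption: endTime is missing or null while the run is in progress.
    Raises TimeoutError if the run does not finish within timeout_seconds.
    """
    waited = 0.0
    while True:
        run = fetch_run()
        if run.get("endTime"):
            return run
        if waited >= timeout_seconds:
            raise TimeoutError("Evaluation run did not finish in time")
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing sleep as a parameter keeps the loop easy to test without real delays. In a pipeline you would normally leave it at the default.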

API operations for automating evaluations

Copilot Studio supports REST API operations that you can use to programmatically trigger evaluations against your agent by using an existing test set.

For more information on how and when to use the Power Platform API, see:

Power Platform API and SDKs: From UX-first to API-first (Power Platform Developer Blog)
Programmability and extensibility overview
Get started with Power Platform API
Power Platform API operations for Copilot Studio agents

Get agent test sets

Endpoint: GET https://api.powerplatform.com/copilotstudio/environments/{EnvironmentId}/bots/{BotId}/api/makerevaluation/testsets?api-version=2024-10-01
Purpose: Retrieve an array of the test set IDs and other details for a specific agent.
Response: Returns a list named value. Each test set in the list includes the following information:
- auditInfo: Timestamps and user IDs for creating and modifying each test set.
- displayName: The name of the test set.
- id: The ID of the test set. Use this ID in Start an agent evaluation to choose which test set to run.
- description: The description of the test set.
- state: The status of the test set. A usable test set has the status Active.
- totalTestCases: The number of test cases within the test set.

Learn more in List Maker Evaluation Test Sets.
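
The following sketch shows one way to build the endpoint URL and pick a usable test set from the response. It assumes the response body has already been parsed from JSON into a dictionary. The function names build_test_sets_url and find_active_test_set are hypothetical, and matching on displayName is only one possible selection rule.

```python
from typing import Optional
from urllib.parse import quote

API_VERSION = "2024-10-01"


def build_test_sets_url(environment_id: str, bot_id: str) -> str:
    """Return the Get agent test sets URL for one agent."""
    return (
        "https://api.powerplatform.com/copilotstudio"
        f"/environments/{quote(environment_id, safe='')}"
        f"/bots/{quote(bot_id, safe='')}"
        f"/api/makerevaluation/testsets?api-version={API_VERSION}"
    )


def find_active_test_set(response: dict, display_name: str) -> Optional[str]:
    """Return the id of the first Active test set with the given name.

    Returns None when no test set matches.
    """
    for test_set in response.get("value", []):
        if (
            test_set.get("displayName") == display_name
            and test_set.get("state") == "Active"
        ):
            return test_set["id"]
    return None
```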

Get agent test set details

Endpoint: GET https://api.powerplatform.com/copilotstudio/environments/{EnvironmentId}/bots/{BotId}/api/makerevaluation/testsets/{TestSetId}?api-version=2024-10-01
Purpose: Retrieve details for a specific test set, using the test set ID.
Response: Returns the same fields as a single item in the Get agent test sets response array.

Learn more in List Maker Evaluation Test Sets.

Start an agent evaluation

Endpoint: POST https://api.powerplatform.com/copilotstudio/environments/{EnvironmentId}/bots/{BotId}/api/makerevaluation/testsets/{TestSetId}/run?api-version=2024-10-01
Purpose: Run an evaluation for a test set by using the test set's id. You can also include a user profile for authenticating connections during the evaluation run. Use mcsConnectionId to specify the user profile. If you don't add an mcsConnectionId to your call, the evaluation runs without authentication.
Response: Returns the following information:
- runId: The ID for the evaluation run. Use this ID to retrieve evaluation details.
- lastUpdatedAt: When the run's status was last updated.
- executionState: The run's status while the evaluation is running.
- state: Current state of the run.
- totalTestCases: Total number of test cases in the test set used for the evaluation.
- testCasesProcessed: Total test cases evaluated as of the last update.

Learn more in Run Maker Evaluation Test Set.
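
As an illustration only, the request to start a run might be assembled as follows. This sketch assumes that mcsConnectionId is passed as a property of a JSON request body and that the token is sent as a bearer token. Check the Run Maker Evaluation Test Set reference for the exact request shape. The function name build_run_request is hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_run_request(
    environment_id: str,
    bot_id: str,
    test_set_id: str,
    access_token: str,
    mcs_connection_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build the POST request that starts an evaluation for one test set."""
    url = (
        "https://api.powerplatform.com/copilotstudio"
        f"/environments/{environment_id}/bots/{bot_id}"
        f"/api/makerevaluation/testsets/{test_set_id}/run"
        "?api-version=2024-10-01"
    )
    # Assumption: the optional user profile travels in the JSON body.
    # Without it, the evaluation runs without authentication.
    body = {"mcsConnectionId": mcs_connection_id} if mcs_connection_id else {}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request with urllib.request.urlopen and parsing the JSON response would give you the runId to use when you retrieve the run details.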

Get agent test runs

Endpoint: GET https://api.powerplatform.com/copilotstudio/environments/{EnvironmentId}/bots/{BotId}/api/makerevaluation/testruns?api-version=2024-10-01
Purpose: Retrieve an array of all previous runs.
Response: Each item in the array includes the same values as found in Get agent test run details.

Learn more in List Maker Evaluation Test Runs.

Get agent test run details

Endpoint: GET https://api.powerplatform.com/copilotstudio/environments/{EnvironmentId}/bots/{BotId}/api/makerevaluation/testruns/{TestRunId}?api-version=2024-10-01
Purpose: Retrieve the details of an evaluation by using the runId for your target evaluation run.
Response: Returns the following information:
- id: The ID for the evaluation run. Use this ID to retrieve evaluation details.
- environmentId: The ID for the environment of your agent.
- cdsBotId: The ID for the target agent.
- ownerId: The ID of the user who started the evaluation run.
- testSetId: The ID of the test set used for the evaluation.
- state: The progress status of the evaluation.
- startTime: When the evaluation started.
- endTime: When the evaluation finished, if it finished.
- name: Name of the evaluation.
- totalTestCases: Total test cases in the test set.
- mcsConnectionId: The connection ID for the Copilot Studio connection of the user profile used for the evaluation run. null if no user profile is connected.
- testCasesResults: The list of test cases in the evaluation run. Includes:
  * testCaseId: The ID of the test case.
  * state: The completion status of the test case.
  * metricsResults: The details and results for each test method used for the test case. Includes the following:
    * type: The test method.
    * result: The final result of the test for this test case. Includes the following:
      * data: The details of the result. Exact values depend on the test method. Learn more in the Power Platform API docs. For a general quality test, the response includes the following:
        * abstention: Whether the agent answered the query.
        * relevance: Whether the answer was relevant.
        * completeness: Whether the answer is complete.
      * status: The status of the test case.
      * errorReason: If an error occurred, the cause of the error.
      * aiResultReason: The AI explanation of the test case result.

Learn more in Get Maker Evaluation Test Run.
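
To use the run details in a pipeline gate, you could reduce the response to a short summary. The sketch below assumes that metricsResults is a list with one entry per test method and that status and errorReason sit inside each entry's result object. It treats status values as opaque strings because the possible values aren't listed here. The function name summarize_run and the shape of its return value are hypothetical.

```python
from collections import Counter


def summarize_run(run: dict) -> dict:
    """Count results per test method and status, and collect error reasons.

    Assumption: each test case has a metricsResults list whose items hold
    a type and a result object with status and errorReason.
    """
    status_counts = Counter()
    errors = []
    for case in run.get("testCasesResults", []):
        for metric in case.get("metricsResults", []):
            result = metric.get("result") or {}
            status_counts[f"{metric.get('type')}: {result.get('status')}"] += 1
            if result.get("errorReason"):
                errors.append((case.get("testCaseId"), result["errorReason"]))
    return {
        "totalTestCases": run.get("totalTestCases"),
        "statusCounts": dict(status_counts),
        "errors": errors,
    }
```

A CI job could fail the build when the errors list is non-empty or when the count for an unwanted status exceeds a threshold you choose.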

Use a Microsoft Studio Connector ID for evaluations

For Start an agent evaluation, you can optionally add a Microsoft Studio Connector ID (mcsConnectionId) to the call. The evaluation then uses that connection as its user profile. To find your mcsConnectionId:

Go to Power Automate.
Open the Connections page.
Select the Microsoft Copilot Studio connection.
Copy the mcsConnectionId from the URL: .../connections/shared_microsoftcopilotstudio/{mcsConnectionId}/details
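
If you prefer to extract the ID from a copied URL in a script, a small helper like the following could work. It assumes only the path pattern shown in the last step. The function name extract_mcs_connection_id is hypothetical.

```python
import re
from typing import Optional

# Matches the path segment between the connector name and /details.
_CONNECTION_PATTERN = re.compile(
    r"/connections/shared_microsoftcopilotstudio/([^/?#]+)/details"
)


def extract_mcs_connection_id(url: str) -> Optional[str]:
    """Return the mcsConnectionId from a connection details URL, or None."""
    match = _CONNECTION_PATTERN.search(url)
    return match.group(1) if match else None
```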