Copilot+ PCs developer guide

Copilot+ PCs are a new class of Windows 11 hardware powered by a high-performance Neural Processing Unit (NPU), a specialized computer chip for AI-intensive processes like real-time translation and image generation, that can perform more than 40 trillion operations per second (TOPS). Copilot+ PCs provide all-day battery life and access to the most advanced AI features and models.

Learn more:

The following Copilot+ PC Developer Guidance covers:

Prerequisites

This guidance is specific to Copilot+ PCs.

Many of the new Windows AI features require an NPU with the ability to run at 40+ TOPS, including but not limited to:

Surface Copilot+ PCs for Business:

What is the Arm-based Snapdragon X Elite chip?

The new Snapdragon X Elite Arm-based chip built by Qualcomm emphasizes AI integration through its Neural Processing Unit (NPU). This NPU can process large amounts of data in parallel, performing trillions of operations per second. It uses energy on AI tasks more efficiently than a CPU or GPU, which results in longer device battery life. The NPU works alongside the CPU and GPU: Windows 11 assigns each processing task to the most appropriate processor in order to deliver fast and efficient performance. The NPU enables on-device AI experiences, including Windows Copilot+ PC features.

Unique AI features supported by Copilot+ PCs with an NPU

Copilot+ PCs offer unique AI experiences that ship with modern versions of Windows 11. These AI features, designed to run on the device NPU, ship in the latest releases of Windows and will be available via APIs in Microsoft Foundry on Windows. Learn more about Microsoft Foundry on Windows APIs that are supported by models optimized to run (inference) on the NPU. These APIs will ship in an upcoming release of the Windows App SDK.

How to access the NPU on a Copilot+ PC

The Neural Processing Unit (NPU) is a new hardware resource. Like other hardware resources on a PC, the NPU needs software to be specifically programmed to take advantage of the benefits it offers. NPUs are designed specifically to execute the deep learning math operations that make up AI models.

The Windows 11 Copilot+ AI features mentioned above have been specifically designed to take advantage of the NPU. Users will get improved battery life and faster inference execution time for AI models that target the NPU. Windows 11 support for NPUs will include Arm-based Qualcomm devices, as well as Intel and AMD devices (coming soon).

For devices with NPUs, the Task Manager can now be used to view NPU resource usage.

Screenshot of Windows Task Manager displaying NPU performance alongside CPU, GPU, Memory, Ethernet, and Disk

The recommended way to run inference (AI tasks) on the device NPU is to use Windows ML.

How to programmatically access the NPU on a Copilot+ PC for AI acceleration

The recommended way to programmatically access the NPU (Neural Processing Unit) and GPU for AI acceleration has shifted from DirectML to Windows ML (WinML). This transition reflects a broader effort to simplify and optimize the developer experience for AI workloads on Windows devices. You can find updated guidance here: Learn how Windows Machine Learning (ML) helps your Windows apps run AI models locally.

When you deploy an AI model using Windows ML on a Copilot+ PC:

This means developers can focus on building AI experiences without worrying about low-level hardware integration.
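To illustrate the kind of hardware selection that a runtime can take off the developer's hands, here is a minimal Python sketch that picks an ONNX Runtime execution provider by preference. It is not how Windows ML is implemented. The preference order and the helper names `choose_providers` and `detect_available` are assumptions made for this example; the provider strings and `onnxruntime.get_available_providers()` are existing ONNX Runtime names.

```python
from typing import List, Sequence

# Assumed preference order for illustration: NPU first, then GPU, then CPU.
PREFERRED_PROVIDERS = (
    "QNNExecutionProvider",  # Qualcomm NPU
    "DmlExecutionProvider",  # DirectML (GPU)
    "CPUExecutionProvider",  # fallback
)


def choose_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are available, in priority order.

    The CPU provider is always appended as a last resort.
    """
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


def detect_available() -> List[str]:
    """Ask ONNX Runtime which providers this install supports, if it is present."""
    try:
        import onnxruntime as ort
    except ImportError:
        # onnxruntime is not installed; assume CPU only.
        return ["CPUExecutionProvider"]
    return list(ort.get_available_providers())


if __name__ == "__main__":
    providers = choose_providers(detect_available())
    print("Provider priority:", providers)
    # The resulting list could then be passed as the `providers` argument
    # when creating an onnxruntime.InferenceSession for a model file.
```

With a manual approach like this, the application owns the fallback logic and must ship or locate the matching provider binaries itself, which is the low-level integration work the text above refers to.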

Supported model formats

AI models are often trained and available in larger data formats, such as FP32. Many NPU devices, however, only support integer math in lower bit format, such as INT8, for increased performance and power efficiency. Therefore, AI models need to be converted (or "quantized") to run on the NPU. There are many models available that have already been converted into a ready-to-use format. You can also bring your own model (BYOM) to convert or optimize.
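As a rough illustration of what converting FP32 values to INT8 involves, the following Python sketch applies symmetric per-tensor quantization to a short list of weights. This is one common scheme chosen for simplicity, not necessarily the one a given NPU toolchain uses; the function names and sample values are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantization: map floats to integers in [-127, 127].

    Returns the quantized integers and the scale needed to recover
    approximate float values.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0  # avoid dividing by zero
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.5, 1.0]  # hypothetical FP32 weights
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    for w, qi, r in zip(weights, q, restored):
        print(f"{w:+.4f} -> {qi:+4d} -> {r:+.4f} (error {abs(w - r):.5f})")
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`). That rounding error is the accuracy cost traded for smaller, faster integer math, and it is why quantized models are usually validated against the original before deployment.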

If you want to bring your own model, we recommend using Olive, a hardware-aware model optimization tool. Olive can help with model compression, optimization, and compilation to work with ONNX Runtime as an NPU performance optimization solution. Learn more: AI made easier: How the ONNX Runtime and Olive toolchain will help you Q&A | Build 2023.

How to measure performance of AI models running locally on the device NPU

To measure the performance of AI feature integration in your app and the associated AI model runtimes:

Screenshot providing a general impression of the Windows Performance Analyzer tool

To perform these measurements, we recommend the following diagnostic and tracing tools:

Note

WPR UI (the user interface for the command-line WPR tool included in Windows), WPA, and GPUView are all part of the Windows Performance Toolkit (WPT), version May 2024+. To use the WPT, you will need to download the Windows ADK Toolkit.

For a quickstart on viewing ONNX Runtime events with the Windows Performance Analyzer (WPA), follow these steps:

  1. Download ort.wprp and etw_provider.wprp.
  2. Open your command line and enter:

       wpr -start ort.wprp -start etw_provider.wprp -start NeuralProcessing -start CPU
       echo Repro the issue allowing ONNX to run
       wpr -stop onnx_NPU.etl -compress

  3. Combine the Windows Performance Recorder (WPR) profiles with other Built-in Recording Profiles such as CPU, Disk, etc.
  4. Download Windows Performance Analyzer (WPA) from the Microsoft Store.
  5. Open the onnx_NPU.etl file in WPA. Double-click to open these graphs:
    • "Neural Processing -> NPU Utilization"
    • Generic Events for ONNX events
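Trace-based analysis can be complemented by simple application-level timing. The Python sketch below separates first-call latency (which often includes one-time setup cost) from steady-state latency. The `measure_latency` helper and the stand-in workload are hypothetical; in a real app the callable would wrap the actual inference call.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(
    run_once: Callable[[], object],
    warmup: int = 3,
    iterations: int = 20,
) -> Dict[str, float]:
    """Time a callable; report first-call and steady-state latency in ms."""
    # First call is timed separately because it may include one-time setup.
    start = time.perf_counter()
    run_once()
    first_ms = (time.perf_counter() - start) * 1000.0

    # Untimed warm-up calls so caches and lazy initialization settle.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "first_call_ms": first_ms,
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with the real inference call in an app.
    def fake_inference() -> int:
        return sum(i * i for i in range(10_000))

    print(measure_latency(fake_inference))
```

Reporting the median alongside the maximum helps distinguish typical latency from occasional spikes, which can then be correlated with the NPU utilization graph in the recorded trace.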

Additional performance measurement tools to consider using with the Microsoft Windows tools listed above include:

Additional Resources