What is Windows ML?

Windows ML is the unified and high-performance local AI inferencing framework for Windows, powered by ONNX Runtime. With Windows ML, you can run AI models locally and accelerate inference on NPUs, GPUs, and CPUs through optional execution providers that Windows manages and keeps up to date. You can use models from PyTorch, TensorFlow/Keras, TFLite, scikit-learn, and other frameworks with Windows ML.

A diagram illustrating an ONNX model going through Windows ML to then reach NPUs, GPUs, and CPUs.

Key benefits

Windows ML makes it straightforward to bring AI inference into any Windows app:

Why use Windows ML instead of Microsoft ORT?

Windows ML is the Windows-supported and maintained copy of ONNX Runtime (ORT), available as a system-wide copy or self-contained:

Additionally, Windows ML allows your app to dynamically acquire the latest execution providers to accelerate your AI models, without carrying the EPs in your app and creating separate builds for different hardware.

See Get started with Windows ML to try it yourself!

Hardware acceleration on NPU, GPU, and CPU

Windows ML lets you access execution providers that can accelerate inference across the three silicon classes present in modern Windows PCs:

For the full silicon-to-EP mapping, driver requirements, and EP sourcing options, see Accelerate AI models.
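As a rough illustration of choosing among execution providers across silicon classes, the sketch below orders whatever providers are available by a preference list and always falls back to the CPU. The preference order (NPU, then GPU, then CPU) and the helper name `order_providers` are assumptions made for this example, not part of Windows ML. The strings are ONNX Runtime execution provider names, and which ones are present depends on the device.

```python
from typing import Iterable, List, Sequence

# Hypothetical preference order for illustration: NPU first, then GPU, then CPU.
# The strings are ONNX Runtime execution provider names; which ones are actually
# present depends on the hardware and on the providers installed on the device.
DEFAULT_PREFERENCE: Sequence[str] = (
    "QNNExecutionProvider",       # Qualcomm NPU
    "OpenVINOExecutionProvider",  # Intel hardware
    "DmlExecutionProvider",       # GPU via DirectML
    "CPUExecutionProvider",       # fallback
)


def order_providers(
    available: Iterable[str],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> List[str]:
    """Return the preferred providers that are available, in preference order.

    CPUExecutionProvider is appended if it is missing, so the result is
    never empty.
    """
    available_set = set(available)
    chosen = [name for name in preference if name in available_set]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


if __name__ == "__main__":
    # Example: a machine that exposes only DirectML and the CPU provider.
    print(order_providers(["CPUExecutionProvider", "DmlExecutionProvider"]))
```

With the ONNX Runtime Python package, `onnxruntime.get_available_providers()` could supply the `available` argument, and the returned list could be passed as the `providers` argument of `onnxruntime.InferenceSession`.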

System requirements

Note

Support for CPU and GPU (via DirectML) is available on all supported Windows versions. Hardware-optimized execution providers for NPUs and specific GPU hardware require Windows 11 version 24H2 (build 26100) or greater. For details, see Windows ML execution providers.
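The build requirement in the note can be checked before an app tries to use hardware-optimized execution providers. The following is a minimal sketch: the function names are hypothetical, the threshold is the build number stated above, and the check covers only the OS build, not driver or hardware requirements.

```python
import sys
from typing import Optional

# Build number from the note above: Windows 11 version 24H2.
MIN_BUILD_FOR_HARDWARE_EPS = 26100


def supports_hardware_eps(build: int) -> bool:
    """Return True if the OS build meets the minimum for hardware-optimized EPs."""
    return build >= MIN_BUILD_FOR_HARDWARE_EPS


def current_windows_build() -> Optional[int]:
    """Return the OS build number on Windows, or None on other platforms."""
    if sys.platform != "win32":
        return None
    return sys.getwindowsversion().build


if __name__ == "__main__":
    build = current_windows_build()
    if build is None:
        print("Not running on Windows.")
    elif supports_hardware_eps(build):
        print(f"Build {build}: hardware-optimized execution providers may be available.")
    else:
        print(f"Build {build}: only CPU and GPU (via DirectML) support is expected.")
```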

Performance optimization

The latest version of Windows ML works directly with dedicated execution providers for GPUs and NPUs, delivering to-the-metal performance that's on par with dedicated SDKs of the past such as TensorRT for RTX, AI Engine Direct, and Intel's Extension for PyTorch. We've engineered Windows ML to have best-in-class GPU and NPU performance, without requiring your app to distribute IHV-specific SDKs. Performance results vary by hardware configuration and model. See Accelerate AI models for hardware-specific guidance.

Converting models to ONNX

You can convert models from other formats to ONNX so that you can use them with Windows ML. To learn more, see the Foundry Toolkit for Visual Studio Code documentation on converting models to the ONNX format. For more information on converting PyTorch, TensorFlow, and Hugging Face models to ONNX, see the ONNX Runtime Tutorials.
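For a PyTorch model, one common route is PyTorch's built-in `torch.onnx.export`. The sketch below is an illustration only, not the workflow from the linked documentation: the single-layer model, the tensor names, and the output file name are placeholders chosen for the example, and it assumes PyTorch is installed.

```python
def export_to_onnx(output_path: str = "tiny_model.onnx") -> str:
    """Export a small placeholder PyTorch model to an ONNX file and return its path."""
    # Imported lazily so this module can be loaded where PyTorch is not installed.
    import torch

    # Hypothetical stand-in model: one linear layer mapping 4 features to 2 outputs.
    model = torch.nn.Linear(4, 2)
    model.eval()  # export in inference mode

    # An example input is needed so the exporter can record the model's computation.
    example_input = torch.randn(1, 4)

    torch.onnx.export(
        model,
        (example_input,),
        output_path,
        input_names=["input"],    # placeholder tensor names
        output_names=["output"],
    )
    return output_path


if __name__ == "__main__":
    print(f"Wrote {export_to_onnx()}")
```

The resulting `.onnx` file is the kind of artifact that an ONNX Runtime based framework such as Windows ML loads for inference.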

Model distribution

Windows ML provides flexible options for distributing AI models:

Integration with Windows AI ecosystem

Windows ML serves as the foundation for the broader Windows AI platform:

Providing feedback

Found an issue or have suggestions? Search or create issues on the Windows App SDK GitHub.

Next steps