Get started with Phi Silica in the Windows App SDK

Phi Silica is a powerful hardware-accelerated local language model that provides many capabilities found in Large Language Models (LLMs). On NPU-equipped devices, the model uses a technique called speculative decoding to accelerate text generation: a smaller draft model proposes multiple token sequences, which the main model then validates in parallel.
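To make the idea concrete, here is a toy sketch of greedy speculative decoding in C++. It is not Phi Silica's implementation: the `NextToken` callbacks, the `speculativeStep` function, and the integer tokens are hypothetical stand-ins chosen for illustration. A cheap draft model proposes `k` tokens, the main model checks each position, and the longest agreeing prefix is kept along with one token from the main model.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Token = int;
// Hypothetical model interface: given the tokens so far, return the next token (greedy).
using NextToken = std::function<Token(const std::vector<Token>&)>;

// One speculative step. Returns the tokens accepted in this step.
// A real system would score all k positions in a single batched pass of the
// main model; this sketch checks them one by one for clarity.
std::vector<Token> speculativeStep(const std::vector<Token>& context,
                                   const NextToken& draft,
                                   const NextToken& mainModel,
                                   std::size_t k) {
    assert(draft && mainModel);  // both callbacks must be set

    // 1. Draft phase: propose k tokens autoregressively with the cheap model.
    std::vector<Token> proposed;
    std::vector<Token> draftContext = context;
    for (std::size_t i = 0; i < k; ++i) {
        Token t = draft(draftContext);
        proposed.push_back(t);
        draftContext.push_back(t);
    }

    // 2. Verify phase: accept proposals while the main model agrees.
    std::vector<Token> accepted;
    std::vector<Token> verifyContext = context;
    for (Token t : proposed) {
        Token expected = mainModel(verifyContext);
        if (expected != t) {
            // First disagreement: keep the main model's token and stop.
            accepted.push_back(expected);
            return accepted;
        }
        accepted.push_back(t);
        verifyContext.push_back(t);
    }

    // All k proposals accepted: the main model contributes one extra token.
    accepted.push_back(mainModel(verifyContext));
    return accepted;
}
```

Because every accepted token is one the main model would have produced itself, the output of this greedy variant matches plain decoding with the main model; the gain comes from accepting several tokens per verification pass when the draft guesses well.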

Note

Phi Silica features are not available in China.

Phi Silica is optimized for efficiency and performance on Windows Copilot+ PCs (where it runs on the NPU) and on non-Copilot+ Windows 11 devices with a supported GPU, and can be integrated into your Windows apps through the Windows AI APIs in the Windows App SDK.

This level of optimization is not available in other versions of Phi.

Supported hardware

Phi Silica runs on the following hardware:

| Hardware | Status | Details |
| --- | --- | --- |
| NPU (Copilot+ PC) | ✅ Available | Best performance. See Copilot+ PCs developer guide. |
| GPU (NVIDIA) | ✅ Available | GeForce RTX 30 series and newer with 6+ GB VRAM. |
| GPU (AMD) | 🔜 Coming soon | Support for AMD GPUs is planned for a future release. |

Important

Running Phi Silica on GPU requires Developer Mode to be enabled. Go to Settings > System > For developers > Developer Mode.

GPU prerequisites:

GPU driver requirements

Running Phi Silica on GPU requires the latest driver installed directly from the GPU manufacturer. Default drivers from Windows Update or OEM installations may not be sufficient and can cause failures or degraded performance.

Download the latest driver for your hardware:

Note

OEM-supplied drivers (delivered through Windows Update or your PC manufacturer's update tool) may overwrite drivers from the GPU manufacturer (IHV) that you previously installed. If Phi Silica stops working on GPU after a system update, reinstall the latest driver from the links above.

GPU feature differences

The following features behave differently on GPU compared to NPU:

Model availability and download

Unlike the NPU model — which is pre-installed on Copilot+ PCs — the Phi Silica model for GPU is not pre-installed on the user's device. Instead, the model is downloaded on demand the first time your app calls EnsureReadyAsync. The download is several gigabytes and runs in the background through Windows Update.

Because the Phi Silica GPU model is large, show a confirmation dialog before calling EnsureReadyAsync so the user can consent to both the storage cost and the background download. A typical pattern:

  1. Call GetReadyState and branch on the returned AIFeatureReadyState:
    • Ready — the model is installed; proceed.
    • NotReady or EnsureNeeded — show your consent dialog (see below), then call EnsureReadyAsync only if the user agrees.
    • NotSupportedOnCurrentSystem — the user's hardware does not meet the requirements in Supported hardware. Offer a fallback experience and, when appropriate, surface the hardware requirements so the user can make an informed upgrade decision.
  2. In your consent dialog, explain:
    • An optional language model will be downloaded (several GB of storage).
    • The download happens in the background through Windows Update.
    • The user can monitor download progress at Settings > Windows Update.
    • The user can later remove the model at Settings > System > AI Components if they no longer want it.
      Tip
      In user-facing strings (dialog text, status messages), refer to the model as the "language model" or "optional AI model" rather than "Phi Silica." Most end users aren't familiar with the brand name, and generic terms communicate purpose more clearly.
  3. While EnsureReadyAsync is in progress, show a progress indicator in your app. The returned operation reports status that you can use to drive a loading UI; see Get started with Windows AI APIs for details.
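The branching in step 1 can be sketched as a small pure function. This C++ sketch is deliberately independent of the Windows App SDK so it compiles on its own: `ReadyState` is a local stand-in for the AIFeatureReadyState values named above, and `NextStep`, `decideNextStep`, and the `askUserConsent` callback are hypothetical names introduced for illustration. In a real app, the ready state would come from GetReadyState and the `DownloadModel` outcome would lead to a call to EnsureReadyAsync.

```cpp
#include <cassert>
#include <functional>

// Local stand-in for the AIFeatureReadyState values named in the text.
enum class ReadyState { Ready, NotReady, EnsureNeeded, NotSupportedOnCurrentSystem };

// What the app should do next (hypothetical names for this sketch).
enum class NextStep { UseModel, DownloadModel, StayOnFallback };

// Decide the next step from the ready state. askUserConsent is a hypothetical
// callback that shows the consent dialog and returns true if the user agrees.
NextStep decideNextStep(ReadyState state, const std::function<bool()>& askUserConsent) {
    switch (state) {
    case ReadyState::Ready:
        return NextStep::UseModel;        // model installed; no dialog needed
    case ReadyState::NotReady:
    case ReadyState::EnsureNeeded:
        assert(askUserConsent);           // a consent callback is required here
        // Start the multi-gigabyte download only if the user agrees.
        return askUserConsent() ? NextStep::DownloadModel : NextStep::StayOnFallback;
    case ReadyState::NotSupportedOnCurrentSystem:
        return NextStep::StayOnFallback;  // hardware requirements not met
    }
    return NextStep::StayOnFallback;      // defensive default for unexpected values
}
```

Keeping the decision separate from the dialog and the download call makes the flow easy to unit test, and ensures the consent dialog is only shown when a download would actually follow.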

After the model is installed

The model remains on the device until the user removes it. Users manage installed models at Settings > System > AI Components, where the Phi Silica GPU model appears as "AI LanguageModel". If the user later removes the model, your app's next call to GetReadyState returns NotReady or EnsureNeeded, and your app should repeat the consent and download flow.

For API details, see:

Integrate Phi Silica

With a local Phi Silica language model, you can generate text responses to user prompts. First, ensure you have the prerequisites and models available on your device as outlined in Get started with Windows AI APIs.

Specify the required namespaces

To use Phi Silica, make sure you are using the required namespaces:

C#:

using Microsoft.Windows.AI;
using Microsoft.Windows.AI.Text;

C++:

#include "winrt/Microsoft.Windows.AI.h"
#include "winrt/Microsoft.Windows.AI.Text.h"
using namespace winrt::Microsoft::Windows::AI;
using namespace winrt::Microsoft::Windows::AI::Text;

Generate a response

This example shows how to generate a response to a Q&A prompt with custom content moderation (see Content Moderation with the Windows AI APIs).

  1. Ensure the language model is available by calling the GetReadyState method and waiting for the EnsureReadyAsync method to return successfully.
  2. Once the language model is available, create a LanguageModel object to reference it.
  3. Submit a string prompt to the model using the GenerateResponseAsync method, which returns the complete result.

C#:

// Make sure the model is installed before creating it.
if (LanguageModel.GetReadyState() == AIFeatureReadyState.NotReady)
{
    var op = await LanguageModel.EnsureReadyAsync();
}

using LanguageModel languageModel = await LanguageModel.CreateAsync();

string prompt = "Provide the molecular formula for glucose.";

// Custom content moderation: adjust the severity threshold applied to violent
// content in the prompt. ContentFilterOptions and SeverityLevel also need the
// Microsoft.Windows.AI.ContentSafety namespace.
LanguageModelOptions options = new LanguageModelOptions();
ContentFilterOptions filterOptions = new ContentFilterOptions();
filterOptions.PromptMaxAllowedSeverityLevel.Violent = SeverityLevel.Minimum;
options.ContentFilterOptions = filterOptions;

var result = await languageModel.GenerateResponseAsync(prompt, options);

Console.WriteLine(result.Text);

C++:

// Make sure the model is installed before creating it.
if (LanguageModel::GetReadyState() == AIFeatureReadyState::NotReady)
{
    auto op = LanguageModel::EnsureReadyAsync().get();
}

auto languageModel = LanguageModel::CreateAsync().get();

const winrt::hstring prompt = L"Provide the molecular formula for glucose.";

LanguageModelResponseResult result = languageModel.GenerateResponseAsync(prompt).get();

// Text() returns a wide-character hstring, so print it with std::wcout.
std::wcout << result.Text().c_str() << std::endl;

The response generated by this example is:

C6H12O6

Text Intelligence Skills

Phi Silica includes built-in text transformation capabilities (known as Text Intelligence Skills) that can deliver structured, concise, and user-friendly responses through predefined formatting using a local language model.

Supported skills include:

The following steps describe how to use Text Intelligence Skills.

  1. Create a LanguageModel object
    This object references the local Phi Silica language model (remember to confirm that the Phi Silica model is available on the device).
  2. Instantiate the skill-specific object
    Choose the appropriate class based on the skill you want to apply and pass the LanguageModel instance as a parameter.
  3. Call the method to perform the skill
    Each skill exposes an asynchronous method that processes the input and returns a formatted result.
  4. Handle the response
    The result is returned as a typed object, which you can print or log as needed.

This example demonstrates the text summarizing skill.

  1. Create a LanguageModel instance (languageModel).
  2. Pass that LanguageModel to the TextSummarizer constructor.
  3. Pass some text to the SummarizeAsync method and print the result.

C#:

using Microsoft.Windows.AI.Text;

using LanguageModel languageModel = await LanguageModel.CreateAsync();

// The summarizer skill wraps the language model.
var textSummarizer = new TextSummarizer(languageModel);
string text = @"This is a large amount of text I want to have summarized.";
var result = await textSummarizer.SummarizeAsync(text);

Console.WriteLine(result.Text);

C++:

using namespace winrt::Microsoft::Windows::AI::Text;

auto languageModel = LanguageModel::CreateAsync().get();
auto textSummarizer = TextSummarizer(languageModel);
// WinRT string parameters take a wide-character winrt::hstring, not std::string.
winrt::hstring text = L"This is a large amount of text I want to have summarized.";
auto result = textSummarizer.SummarizeAsync(text).get();

std::wcout << result.Text().c_str() << std::endl;

Responsible AI

We've followed core principles and practices described in the Microsoft Responsible AI Standards to ensure these APIs are trustworthy, secure, and built responsibly. For more details on implementing AI features in your app, see Responsible Generative AI Development on Windows.

GPU transparency notes

For detailed information about the capabilities, limitations, and responsible use of Phi Silica on non-Copilot+ PCs (GPU), see the Transparency Note: Phi Silica on Non-Copilot+ PCs.

Key differences between NPU and GPU execution:

| Factor | Copilot+ PCs (NPU) | Non-Copilot+ PCs (GPU) |
| --- | --- | --- |
| Inference Latency | Optimized; low latency via NPU acceleration and speculative decoding | Higher latency; depends on GPU generation, VRAM, and current GPU load |
| Power Consumption | NPU is power-efficient, suitable for battery-powered use | Higher power consumption; may impact battery life on laptops |
| Prompt Compression | ✅ Available | ❌ Not available on GPU |
| Speculative Decoding | ✅ Available | ❌ Not available on GPU |
| Model Optionality | Model is managed by the system | Model is downloaded on demand and can be removed via Settings > System > AI Components |

Operational Factors for Non-Copilot+ PCs

System Performance

Understanding Performance on Non-Copilot+ PCs

Phi Silica's output quality (accuracy, coherence, relevance) is consistent across Copilot+ and non-Copilot+ PCs because the same model weights and architecture are used. The primary differences are in inference speed, resource consumption, and user experience responsiveness.

Developers should:

See also