Features/CPUModels - QEMU

OUTDATED PAGE

THIS PAGE IS OUTDATED. It included information about the plans for CPU model interfaces a few years ago.

Some history can be read at: https://habkost.net/posts/2017/03/qemu-cpu-model-probing-story.html

Summary

Presentation about the CPU model work at DevConf 2014: File:Cpu-models-and-libvirt-devconf-2014.pdf

This page was originally about "externally-configurable" CPU models, but its scope gradually shifted to discussing the design of the CPU code and the CPU model system. The old "cpudef" config section was deprecated, so the original description no longer applies.

Owner

Name: Eduardo Habkost
Email: ehabkost@redhat.com

Roadmap

Allow changing of Hypervisor CPUIDs (Don Slutz)

Already done

QEMU 2.9

CPU model probing was implemented through the query-cpu-model-* QMP commands

After QEMU 1.5

x86 CPU properties (Igor Mammedov). DONE
machine-friendly error reporting of -cpu enforce/check. OBSOLETED
- Obsoleted by query-cpu-model-expansion QMP commands and filtered-features property
x86 CPU model subclasses. DONE

QEMU 1.5

CPU feature words refactor
(equivalent to) machine-friendly reporting of -cpu enforce/check
- Actually, the new mechanism is based on the "filtered-features" X86CPU property
Probing for CPU features that are supported by the host and can be enabled
- Using "-cpu host" and the "feature-words" property
Probing for the features that are actually enabled on each CPU model
- Using "feature-words" and "filtered-features" property

QEMU 1.4

Make CPU a subclass of DeviceState (included)
APIC-ID-related topology fixes (ehabkost) (RFC submitted)
Fixes for -cpu enforce flag

Before QEMU 1.4

Drop "-cpu ?dump" (Peter Maydell)
Move CPU models to C code (ehabkost)
Eliminate cpudef config section support (ehabkost)
"unduplicate feature names" series (ehabkost)
-cpu host use GET_SUPPORTED_CPUID (ehabkost)
add feature flag name list for CPUID 7

Interfaces/requirements for libvirt

Ensuring predictable set of guest features

Requirement: libvirt needs to ensure all features required on the command-line are present and exposed to the guest.

Current problem: libvirt doesn't use the "enforce" flag so it can't guarantee that a given feature will be actually exposed to the guest.

Old solution: use the "enforce" flag on the "-cpu" option.

Limitation: no proper machine-friendly interface to report which features are missing.

Workaround: See "querying for host capabilities" below.

New solution in 1.5: check if "filtered-features" property on CPU object is all zeroes.
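The all-zeroes check can be sketched as follows. This is a minimal illustration, assuming the property value arrives over QMP as a list of objects that each carry an integer "features" bitmask; the helper names and the sample data are hypothetical.

```python
from typing import Dict, List


def filtered_features_all_zero(filtered: List[Dict]) -> bool:
    """Return True when no feature word has any filtered-out bit set."""
    return all(word.get("features", 0) == 0 for word in filtered)


def nonzero_words(filtered: List[Dict]) -> List[Dict]:
    """Return the feature words that still have filtered-out bits."""
    return [word for word in filtered if word.get("features", 0) != 0]


# Made-up sample values, shaped like one entry per CPUID feature word.
clean = [{"cpuid-input-eax": 1, "cpuid-register": "ECX", "features": 0}]
dirty = [{"cpuid-input-eax": 1, "cpuid-register": "ECX", "features": 0x00100000}]

print(filtered_features_all_zero(clean))
print(nonzero_words(dirty))
```

A management tool could refuse to start the guest (emulating "enforce") whenever the second helper returns a non-empty list.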

Listing CPU models

Requirement: libvirt needs to know which CPU models are available to be used with the "-cpu" option.

Current solution: libvirt uses QMP query-cpu-definitions command.

Limitation: needs a live QEMU process for the query.

Limitation: it can only list CPU model names and nothing else. See "Getting information about CPU models" section.
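A small sketch of how a client might consume the query-cpu-definitions reply. It assumes the reply is a JSON object whose "return" member is a list of entries with a "name" field; the sample reply below is made up and the function names are hypothetical.

```python
import json
from typing import Dict, List


def query_cpu_definitions_command() -> str:
    """Serialize the QMP command that lists CPU model names."""
    return json.dumps({"execute": "query-cpu-definitions"})


def model_names(reply: Dict) -> List[str]:
    """Extract the sorted model names from a query-cpu-definitions reply."""
    return sorted(entry["name"] for entry in reply.get("return", []))


# Made-up reply for illustration; a real reply would come from a live QEMU.
sample_reply = {"return": [{"name": "Nehalem"}, {"name": "core2duo"}]}
print(model_names(sample_reply))
```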

Proposed solution (TODO): use QMP qom-list-types command.

Dependency: X86CPU subclasses.

Limitation: needs a live QEMU process for the query.

Example: { "execute": "qom-list-types", "arguments": { "implements": "cpu", "abstract": false } }

Caveat: the CPU class name for -cpu _model_ will be in the format _model_-_arch_-cpu or _model_-kvm-_arch_-cpu.
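To illustrate the caveat, here is a hedged sketch that builds the qom-list-types command from the example above and maps a class name back to the name used with -cpu. The helper names are hypothetical, and the mapping assumes that no model name itself ends in "-kvm".

```python
import json
from typing import Optional


def qom_list_cpu_types_command() -> str:
    """Serialize the qom-list-types command for non-abstract CPU classes."""
    return json.dumps({
        "execute": "qom-list-types",
        "arguments": {"implements": "cpu", "abstract": False},
    })


def model_from_class_name(class_name: str, arch: str) -> Optional[str]:
    """Strip the "-<arch>-cpu" suffix and an optional "-kvm" infix."""
    suffix = f"-{arch}-cpu"
    if not class_name.endswith(suffix):
        return None
    model = class_name[: -len(suffix)]
    if model.endswith("-kvm"):
        model = model[: -len("-kvm")]
    return model or None


print(model_from_class_name("Nehalem-x86_64-cpu", "x86_64"))
print(model_from_class_name("Nehalem-kvm-x86_64-cpu", "x86_64"))
```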

Requirements: CPU class/model list should not depend on any other command-line option (e.g. not depend on machine-type)

Unanswered question: we may have separated subclasses for KVM and TCG CPU models.

Future plans

It would be interesting to get rid of the requirement to start a live QEMU process just to list CPU models.

Getting information about CPU models

Requirement: libvirt uses the predefined CPU models from QEMU, but it needs to be able to query for CPU model details, to find out how it can create a VM that matches what was requested by the user.

Current problem: libvirt has a copy of the CPU model definitions in its cpu_map.xml file, and the copy can get out of sync when CPU models in QEMU change. libvirt also assumes that the set of features in each model is the same on all machine-types, which is not true.

Benefits of changing: cpu_map.xml and QEMU won't need to match exactly, anymore. The definitions exposed by libvirt could be completely different from the definitions in QEMU, as long as libvirt probes for CPU model information and uses the right flags in the command-line to make QEMU expose what libvirt users expect.

Challenge: the resulting CPU features depend on many factors:

Chosen CPU model name (of course)
machine-type
Host CPU vendor (unless explicit "vendor" option is used)
~~accel=kvm option (CPU models are different in TCG and KVM models)~~ (we are going to make TCG and KVM behave the same)
~~Host CPU capabilities~~ (not valid anymore, as long as "enforce" is used)
~~Host kernel capabilities~~ (not valid anymore, as long as "enforce" is used)
~~kernel-irqchip option~~ (not valid anymore, as long as "enforce" is used)

Proposed Solution (TODO): start a paused VM with no devices, but with the right machine-type and right CPU model. Use QMP QOM commands to query for CPU flags (especially the properties starting with the "f-" prefix).

Dependency: X86CPU feature properties ("f-*" properties).

Limitation: requires a live QEMU process with the right machine-type/CPU-model to be started, to make the query.

Limitation: requires starting a new QEMU process for each machine-type/CPU-model pair that is going to be queried.
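The proposed probing flow could look roughly like the sketch below: list the properties of the CPU object, keep the ones with the "f-" prefix, and build one qom-get command per property. The QOM path and the sample qom-list reply are assumptions made for illustration, and the "f-*" properties are only a proposal in the text above.

```python
from typing import Dict, List

# Hypothetical QOM path of the first CPU object; the real path may differ.
CPU_PATH = "/machine/unattached/device[0]"


def feature_property_names(qom_list_reply: Dict) -> List[str]:
    """Keep only the property names that start with the "f-" prefix."""
    props = qom_list_reply.get("return", [])
    return sorted(p["name"] for p in props if p["name"].startswith("f-"))


def qom_get_commands(cpu_path: str, names: List[str]) -> List[Dict]:
    """Build one qom-get command per feature property."""
    return [
        {"execute": "qom-get", "arguments": {"path": cpu_path, "property": name}}
        for name in names
    ]


# Made-up qom-list reply for illustration.
reply = {"return": [
    {"name": "f-sse4.2", "type": "bool"},
    {"name": "level", "type": "uint32"},
    {"name": "f-popcnt", "type": "bool"},
]}
names = feature_property_names(reply)
commands = qom_get_commands(CPU_PATH, names)
print(names)
```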

Alternative solution: "feature-words" property

Problem: qemu -machine _machine_ -cpu _model_ will create CPU objects where the CPU features are already filtered based on host capabilities.

Using "enforce" wouldn't solve it, because then QEMU would abort, and QMP would be unavailable.
Using "check" wouldn't solve it either, because the features are always filtered out when the CPU is created.

Solution: "filtered-features" property

Requirement: the resulting CPU features for a given host-CPU-vendor + machine-type + CPU-model combination must not ever change, on any future QEMU version.

This should allow libvirt to safely cache CPU model data, even if the QEMU binary changes.

Requirement: libvirt needs to know if a specific CPU model can be used in the current host.

See "Ensuring predictable set of guest features" above

See "Querying host capabilities" below

Solution in 1.5: "feature-words" and "filtered-features" X86CPU properties

Note: libvirt must combine both properties to find out the full CPU model definition. "feature-words" is always filtered based on host capabilities, so bits removed by that filtering show up only in "filtered-features".
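Combining the two properties amounts to a bitwise OR per feature word. The sketch below assumes each entry identifies its word by "cpuid-input-eax", an optional "cpuid-input-ecx", and "cpuid-register", and carries a "features" bitmask; the function names and sample values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

WordKey = Tuple[int, Optional[int], str]


def word_key(word: Dict) -> WordKey:
    """Identify a feature word by CPUID leaf, optional sub-leaf and register."""
    return (word["cpuid-input-eax"], word.get("cpuid-input-ecx"), word["cpuid-register"])


def full_model_definition(feature_words: List[Dict],
                          filtered_features: List[Dict]) -> Dict[WordKey, int]:
    """OR together the enabled bits and the filtered-out bits of each word."""
    combined: Dict[WordKey, int] = {}
    for word in list(feature_words) + list(filtered_features):
        key = word_key(word)
        combined[key] = combined.get(key, 0) | word["features"]
    return combined


# Made-up values: one bit enabled on this host, one bit filtered out.
enabled = [{"cpuid-input-eax": 1, "cpuid-register": "ECX", "features": 0x00000001}]
filtered = [{"cpuid-input-eax": 1, "cpuid-register": "ECX", "features": 0x00100000}]
definition = full_model_definition(enabled, filtered)
print(hex(definition[(1, None, "ECX")]))
```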

Querying host capabilities

Requirement: libvirt needs to know which features can really be enabled, before it tries to start a VM, and before it tries to start a live-migration process.

The set of available capabilities depends on:

Host CPU (hardware) capabilities;
Kernel capabilities (reported by GET_SUPPORTED_CPUID);
QEMU capabilities;
Specific configuration options (e.g. in-kernel IRQ chip is required for some features).

Current problem: libvirt uses the CPUID instruction directly and assumes that the presence of a feature in the host CPU means it can be enabled and exposed to the guest. This breaks when virtualization of a feature requires:

Additional hardware support (e.g. INVPCID);
Additional host kernel code (this applies to _all_ CPU features, that need to be reported as supported by GET_SUPPORTED_CPUID);
Additional QEMU-side code;
Specific configuration options
- kernel-irqchip (affects tsc-deadline and x2apic availability)
- machine-type
- NOTE: any other option that affects CPU feature availability, MUST:
  * have defaults depending on machine-type, so libvirt versions that don't know about the new option will still work because they already check machine-type
  * be documented as affecting availability of CPU features, so once libvirt starts setting the option explicitly, it will take it into account when probing for host capabilities

Challenge: QEMU doesn't have a generic capability-querying interface, and host capability querying depends on KVM being initialized.

Workaround: start a paused VM using the "host" CPU model, which has every CPU feature supported by the host enabled by default, and query for information about the CPU through QMP, using the QOM commands.

Solution available in 1.5: start a paused VM with no devices with the "host" CPU model and check the "feature-words" property of the X86CPU object.

Expectation: "filtered-features" should be always all-zeroes when using "-cpu host". If it is not, it is a QEMU bug
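A rough sketch of such a probe, written as data builders only so that it does not need a running QEMU. The command line is one plausible invocation, not necessarily what libvirt uses, and the QOM path of the CPU object is an assumption; both would need checking against the QEMU version in use.

```python
import json
from typing import Dict, List

# Assumed QOM path of the first CPU object; hypothetical, may differ in practice.
CPU_PATH = "/machine/unattached/device[0]"


def probe_argv(qemu_binary: str = "qemu-system-x86_64") -> List[str]:
    """Build a command line for a paused, device-less VM using -cpu host."""
    return [
        qemu_binary,
        "-machine", "accel=kvm",  # the "host" model needs KVM
        "-cpu", "host",
        "-S",                     # start with the CPUs paused
        "-nodefaults",            # do not create default devices
        "-display", "none",
        "-qmp", "stdio",          # talk QMP over stdin/stdout
    ]


def probe_messages(cpu_path: str) -> List[Dict]:
    """QMP messages to send: handshake, two property reads, then quit."""
    def get(prop: str) -> Dict:
        return {"execute": "qom-get", "arguments": {"path": cpu_path, "property": prop}}
    return [
        {"execute": "qmp_capabilities"},
        get("feature-words"),
        get("filtered-features"),
        {"execute": "quit"},
    ]


argv = probe_argv()
messages = probe_messages(CPU_PATH)
print("\n".join(json.dumps(m) for m in messages))
```

With "-cpu host", a non-zero "filtered-features" reply would indicate the QEMU bug described above.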

Problem: libvirt shouldn't be running QEMU multiple times on initialization, for every QEMU binary. libvirt runs QEMU once, already, but when running it, it doesn't know if KVM (and the "host" CPU model) is going to be available, and it is run using "-machine none".

Proposed solution: we should make classes for each CPU model; libvirt could then start QEMU using "-machine none" and create a new "host-x86-cpu" object via QMP.

Requirement: "device_add host-x86-cpu" should work even if using "-machine none"

Requirement: "device_add host-x86-cpu" should make the "feature-words" property (and the future "f-*" properties) be filled correctly.

Proposed solution (TODO): start a paused VM with no devices but with "host" CPU model and use QMP QOM commands to query for "f-*" feature properties

Dependency: X86CPU feature properties

Getting level/xlevel/xlevel2 set properly

Fact: libvirt sometimes adds features based on host capabilities, and this often generates "-cpu ExistingModel,+feature,+feature2,+feature3" command-line options.

Problem: sometimes using "+feature" won't work if other fields need to be set for the feature to work.

Proposed solution: "level" and "xlevel" should be increased automatically if a feature requires it to be set to a higher value, unless it has an override value set on the command-line.
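The proposed rule for "level" can be sketched as below. The table maps a few feature names to the basic CPUID leaf that reports them and is only an illustrative subset; the function name is hypothetical and this is not QEMU's actual implementation. The same idea would apply to "xlevel" with extended leaves.

```python
from typing import Dict, Iterable, Optional

# Illustrative subset: lowest basic CPUID leaf that must be visible
# for the feature bit to be reachable by the guest.
MIN_LEVEL: Dict[str, int] = {
    "smep": 0x7,
    "erms": 0x7,
    "invpcid": 0x7,
    "xsaveopt": 0xD,
}


def effective_level(model_level: int,
                    features: Iterable[str],
                    override: Optional[int] = None) -> int:
    """Raise the model's level as needed unless the user set it explicitly."""
    if override is not None:
        return override
    required = [MIN_LEVEL[f] for f in features if f in MIN_LEVEL]
    return max([model_level] + required)


print(effective_level(4, ["smep"]))
print(effective_level(4, ["smep"], override=4))
```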

Disabling features that were always disabled on KVM

Challenge: existing configurations may be already broken (people may be using a CPU model, getting some features filtered out silently, and not want their existing configuration to break).

Example: the "monitor" feature was never supported by KVM, but it is included in many CPU models.

Proposed solution: If libvirt wants to keep existing VMs using (e.g.) "core2duo" working and not break guest ABI, it will need to use "-cpu core2duo,-monitor", to keep guest ABI.

Note: Ignoring "monitor" when checking the "filtered-features" property won't be enough, because newer kernels may really support the "monitor" flag, and in those cases, I assume we want to keep it disabled to maintain guest ABI.

Example 2: the "rdtscp" flag

Fact: on AMD hosts, exposing rdtscp was never supported by KVM

Fact: TCG supports rdtscp, so the AMD CPU models do have rdtscp enabled in QEMU

Assumption: we don't want CPU model definitions to look different in KVM and TCG mode, to keep the rules of the QEMU<->libvirt interfaces simpler

Fact: currently libvirt runs CPU models having rdtscp without the "enforce" flag, and rdtscp is silently disabled

Consequence: libvirt SHOULD use something like "-cpu Opteron_G5,-rdtscp", especially when it starts using (or emulating) enforce mode

This will require a solution on libvirt side. QEMU will just provide the mechanisms to report CPU model information and check what the host and QEMU supports, but the decision to disable rdtscp to be able to run Opteron_G[2345] needs to be taken by libvirt.
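On the libvirt side, the decision could reduce to building a "-cpu" value that explicitly disables the features that were always filtered out. The sketch below is hypothetical: the function names are invented, and the feature sets passed in are made-up inputs rather than real model definitions.

```python
from typing import Iterable, List


def cpu_option(model: str,
               enable: Iterable[str] = (),
               disable: Iterable[str] = (),
               enforce: bool = False) -> str:
    """Build the value passed to -cpu, e.g. "core2duo,-monitor"."""
    parts: List[str] = [model]
    parts += [f"+{name}" for name in enable]
    parts += [f"-{name}" for name in disable]
    if enforce:
        parts.append("enforce")
    return ",".join(parts)


def abi_preserving_option(model: str,
                          model_features: Iterable[str],
                          never_exposed: Iterable[str]) -> str:
    """Disable the model features that were silently filtered in the past."""
    disable = sorted(set(model_features) & set(never_exposed))
    return cpu_option(model, disable=disable, enforce=True)


print(cpu_option("core2duo", disable=["monitor"]))
print(abi_preserving_option("Opteron_G5", {"rdtscp", "sse2"}, {"rdtscp", "monitor"}))
```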

Solved challenges

Allowing CPU models to be updated

We need a mechanism to allow the existing CPU models in QEMU to be updated without making guest-visible changes for existing virtual machines when migrating to a new version.

Examples

Examples where CPU model updates are necessary and have to be deployed to users:

The Nehalem CPU model currently has the wrong "level" value, making CPU topology information unavailable.
The CPUID PMU leaf was added in QEMU 1.1, but it is not supposed to be visible to guests running with -M pc-1.0
New features are implemented by KVM and we may want to add them to existing models (e.g. SandyBridge may need to have tsc-deadline added)

Requirements

A different CPU will be visible to the guest depending on the machine-type chosen.
- That means that "-M pc-1.0 -cpu Nehalem" will be different from "-M pc-1.1 -cpu Nehalem"
- Rationale:
  * The meaning of "-M pc-1.0 -cpu Nehalem" can't be changed or it will change existing guests
  * The meaning of "-M pc-1.1 -cpu Nehalem" needs to be different from the pc-1.0 one, otherwise we would be stuck with a broken "Nehalem" model forever

Status/solution

CPU model definitions were moved to C code, so we can easily add compatibility code to them if necessary
CPUs are now DeviceState objects
CPU models will become separate classes, so per-CPU-model compatibility properties can be used on machine-type definitions
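The idea of per-machine-type compatibility properties can be modelled with a small lookup, shown here as a sketch. The "level" values and the table layout are hypothetical and only illustrate how an older machine-type keeps the old guest-visible value while a newer one gets the fix.

```python
from typing import Dict

# Current (fixed) defaults of each CPU model. Values are hypothetical.
MODEL_DEFAULTS: Dict[str, Dict[str, int]] = {
    "Nehalem": {"level": 4},
}

# Overrides that older machine-types apply to keep the guest ABI stable.
MACHINE_COMPAT: Dict[str, Dict[str, Dict[str, int]]] = {
    "pc-1.0": {"Nehalem": {"level": 2}},
    "pc-1.1": {},
}


def cpu_properties(machine: str, model: str) -> Dict[str, int]:
    """Return the model defaults with the machine-type overrides applied."""
    props = dict(MODEL_DEFAULTS[model])
    props.update(MACHINE_COMPAT.get(machine, {}).get(model, {}))
    return props


print(cpu_properties("pc-1.0", "Nehalem"))
print(cpu_properties("pc-1.1", "Nehalem"))
```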

`-cpu host` and feature probing

See http://article.gmane.org/gmane.comp.emulators.kvm.devel/90035

`-cpu host` vs `-cpu best`

Currently we have -cpu host, but the naming and semantics are unclear.

We have 3 possible modes of "try to get the best CPU model":

all-you-can-enable: Enable every single bit that can be enabled, including the ones not present on the host but that can be emulated.
match-host-CPU: Enable all bits that are present in the host CPU that can be enabled.
best-predefined-model: Use the best CPU model available from the pre-defined CPU model list.

Status

-cpu host will be the "all-you-can-enable" mode, that will enable every bit from GET_SUPPORTED_CPUID on the VCPU
We are probably not going to have a mode for match-host-CPU
A "best-predefined-model" mode can be implemented by libvirt.
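A "best-predefined-model" selection could be sketched as picking the richest predefined model whose features are all available on the host. The model table below is a made-up, heavily shortened stand-in for real model definitions, and the function name is hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical, abbreviated feature sets; real models have many more flags.
MODELS: Dict[str, Set[str]] = {
    "Conroe": {"sse2", "sse3", "ssse3"},
    "Nehalem": {"sse2", "sse3", "ssse3", "sse4.1", "sse4.2", "popcnt"},
    "SandyBridge": {"sse2", "sse3", "ssse3", "sse4.1", "sse4.2", "popcnt",
                    "aes", "avx", "xsave"},
}


def best_predefined_model(host_features: Set[str]) -> Optional[str]:
    """Pick the usable model with the most features, or None if none fits."""
    usable = [(len(feats), name) for name, feats in MODELS.items()
              if feats <= host_features]
    return max(usable)[1] if usable else None


host = {"sse2", "sse3", "ssse3", "sse4.1", "sse4.2", "popcnt", "aes"}
print(best_predefined_model(host))
```

Counting features is a deliberately naive ranking; a real implementation would likely need a vendor check and a defined ordering between models.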

Moving CPU model definitions to C code

The old "cpudef" config section was deprecated because QEMU is expected to provide the CPU model list and to keep migration compatibility using machine-types. Machine-type compatibility code is inside QEMU C code, so making external config files depend on and/or be affected by internal QEMU C code would be confusing and fragile. Now both CPU model definitions and per-machine-type CPU-model compatibility code are inside the QEMU C code.

check/enforce flags

The pseudo CPUID flag 'check', when it appears in the command-line feature flag list, makes QEMU warn when feature flags (either implicit in a CPU model or explicit on the command line) would otherwise have been quietly unavailable to a guest:

qemu-system-x86_64 ... -cpu Nehalem,check

warning: host cpuid 0000_0001 lacks requested flag 'sse4.2|sse4_2' [0x00100000]
warning: host cpuid 0000_0001 lacks requested flag 'popcnt' [0x00800000]

A similar 'enforce' pseudo flag exists which, in addition to the above, causes QEMU to exit with an error if requested flags are unavailable.
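Because these warnings are plain text, a tool that wants the list of missing flags has to scrape them. The sketch below parses the warning format shown above with a regular expression; it assumes the wording stays exactly as in that example, which is fragile, and the function name is hypothetical.

```python
import re
from typing import Dict, List

# Matches: host cpuid 0000_0001 lacks requested flag 'a|b' [0x00100000]
WARNING_RE = re.compile(
    r"host cpuid ([0-9a-fA-F_]+) lacks requested flag '([^']+)' \[(0x[0-9a-fA-F]+)\]"
)


def parse_missing_flags(stderr_text: str) -> List[Dict]:
    """Extract CPUID leaf, flag aliases and bitmask from each warning."""
    found = []
    for leaf, names, mask in WARNING_RE.findall(stderr_text):
        found.append({
            "leaf": int(leaf.replace("_", ""), 16),
            "names": names.split("|"),
            "mask": int(mask, 16),
        })
    return found


text = (
    "warning: host cpuid 0000_0001 lacks requested flag 'sse4.2|sse4_2' [0x00100000]\n"
    "warning: host cpuid 0000_0001 lacks requested flag 'popcnt' [0x00800000]\n"
)
for entry in parse_missing_flags(text):
    print(entry["names"], hex(entry["mask"]))
```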