Performance Information F.A.Q. | OpenVINO™ documentation

New performance benchmarks are typically published with every major.minor release of the Intel® Distribution of OpenVINO™ toolkit.

All models used are published on Hugging Face.

The models used in the performance benchmarks were chosen based on general adoption and usage in deployment scenarios. New models that support a diverse set of workloads and usage are added periodically.

All performance benchmarks on traditional network models are generated with benchmark_app, the open-source benchmarking tool included in the Intel® Distribution of OpenVINO™ toolkit.
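As a minimal sketch of how such a run might be scripted, the Python helper below assembles a benchmark_app command line. It assumes benchmark_app is installed and on PATH, and it uses only the `-m` (model), `-d` (device), `-hint` (performance hint) and `-t` (duration in seconds) options. The helper name `build_benchmark_cmd`, the placeholder path `model.xml` and the default values are hypothetical; the exact settings used for the published results are not stated here.

```python
import shlex

# Values accepted by benchmark_app's -hint option.
VALID_HINTS = ("throughput", "latency", "cumulative_throughput", "none")


def build_benchmark_cmd(model_path, device="CPU", hint="latency", seconds=10):
    """Assemble a benchmark_app argument list (hypothetical helper)."""
    if hint not in VALID_HINTS:
        raise ValueError(f"unsupported performance hint: {hint!r}")
    return [
        "benchmark_app",
        "-m", model_path,     # model to benchmark, e.g. an OpenVINO IR .xml file
        "-d", device,         # target device, e.g. CPU or GPU
        "-hint", hint,        # latency- or throughput-oriented configuration
        "-t", str(seconds),   # how long to run, in seconds
    ]


# "model.xml" is a placeholder path; run the printed command in a shell.
print(shlex.join(build_benchmark_cmd("model.xml")))
```

This prints `benchmark_app -m model.xml -d CPU -hint latency -t 10`. Returning a list rather than a single string keeps arguments containing spaces intact if the command is later launched with `subprocess.run`.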

For diffusion models (such as Stable Diffusion) and foundation models (LLMs), use the llm_bench tool (tools/llm_bench) in the open-source OpenVINO GenAI repository.

For simple instructions on measuring performance, see the Getting Performance Numbers guide.

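The input size used in inference depends on the benchmarked network: it is a sequence length in tokens for language and text-to-image models, and an image resolution (height x width, in pixels) for vision models. The table below lists the input size for each network model, along with its public source: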

Model	Public Network	Task	Input Size (tokens or pixels)
DeepSeek-R1-Distill-Llama-8B	DeepSeek, Hugging Face	Autoregressive language	128K
DeepSeek-R1-Distill-Qwen-1.5B	DeepSeek, Hugging Face	Autoregressive language	128K
Gemma-3-4B-it	Hugging Face	Text-to-text decoder-only	128K
Llama-2-7b-chat	Meta AI	Autoregressive language	4K
Llama-3-8b	Meta AI	Autoregressive language	4K
Llama-3.2-3B-Instruct	Meta AI	Autoregressive language	128K
Phi4-mini-Instruct	Hugging Face	Autoregressive language	4K
Qwen-2-VL-7B-instruct	Hugging Face	Autoregressive language	128K
Qwen-3-8B	Hugging Face	Autoregressive language	32K
Stable-Diffusion-V1-5	Hugging Face	Latent diffusion model	77
FLUX.1-schnell	Hugging Face	Latent adversarial diffusion distillation model	256
bert-base-cased	BERT	Question answering	128
Detectron-V2	Detectron-V2	Object instance segmentation	800x800
mobilenet-v2	MobileNet V2 PyTorch	Classification	224x224
resnet-50	ResNet-50_v1_ILSVRC-2012	Classification	224x224
ssd-resnet34-1200-onnx	ssd-resnet34 ONNX model	Object detection	1200x1200
yolov11	YOLOv11	Object detection	640x640
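The Input Size column mixes two kinds of values: image resolutions and sequence lengths. As an illustration only, the hypothetical helper `input_shape` below turns a table entry into a tensor shape. It assumes a batch size of 1, a 3-channel NCHW layout for entries such as `224x224`, and the common convention that a `K` suffix on a context length means 1024 tokens (so `4K` is 4096). None of these layout or batch assumptions is stated in the table, and the actual benchmark configuration may differ.

```python
def input_shape(size, channels=3, batch=1):
    """Map an 'Input Size' table entry to a shape (hypothetical helper).

    Assumptions: 'HxW' entries are images in NCHW layout; plain numbers and
    'K'-suffixed numbers are sequence lengths in tokens, with K = 1024.
    """
    size = size.strip()
    if "x" in size:
        height, width = (int(part) for part in size.split("x"))
        return [batch, channels, height, width]
    if size.upper().endswith("K"):
        return [batch, int(size[:-1]) * 1024]
    return [batch, int(size)]


for entry in ("224x224", "128", "4K", "128K"):
    print(entry, "->", input_shape(entry))
```

Under these assumptions, `224x224` maps to `[1, 3, 224, 224]` and `128K` to `[1, 131072]`.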

Intel partners with vendors all over the world. For a list of hardware manufacturers, see the Intel® AI: In Production Partners & Solutions Catalog. For more details, see the Supported Devices article.

A set of guidelines and recommendations for optimizing models is available in the optimization guide. Join the conversation in the Community Forum for further support.

The benefit of low-precision optimization is not limited to processors that support VNNI through Intel® DL Boost. Because an INT8 value is a quarter of the bit width of an FP32 value, an Intel® CPU can process more values per instruction, so a converted model offers better throughput regardless of which low-precision optimizations the Intel® hardware supports natively. For a comparison of boost factors across network models and a selection of Intel® CPU architectures, including AVX2 on the Intel® Core™ i7-8700T and AVX-512 (VNNI) on the Intel® Xeon® 5218T and Intel® Xeon® 8270, refer to Model Accuracy for INT8 and FP32 Precision.
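The bit-width argument can be made concrete with simple arithmetic. The sketch below counts how many FP32 and INT8 values fit in one 256-bit AVX2 register and one 512-bit AVX-512 register, and how much memory the weights of a model take at each precision. The parameter count of about 25.6 million is the commonly cited size of ResNet-50 and is used here only for illustration. These ratios describe data density, not measured boost factors: real speedups also depend on memory bandwidth, the instructions available, and how much of the model actually runs in INT8.

```python
BITS = {"FP32": 32, "INT8": 8}              # bits per value
REGISTERS = {"AVX2": 256, "AVX-512": 512}   # SIMD register width in bits


def lanes(register_bits, value_bits):
    """Number of values that fit side by side in one SIMD register."""
    return register_bits // value_bits


def weight_megabytes(num_params, value_bits):
    """Storage for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * value_bits / 8 / 1e6


for isa, width in REGISTERS.items():
    fp32, int8 = lanes(width, BITS["FP32"]), lanes(width, BITS["INT8"])
    print(f"{isa}: {fp32} FP32 vs {int8} INT8 values per register ({int8 // fp32}x)")

params = 25_600_000  # roughly ResNet-50's parameter count (illustrative)
for name, bits in BITS.items():
    print(f"{name} weights: {weight_megabytes(params, bits):.1f} MB")
```

For both instruction sets the register holds 4x as many INT8 values as FP32 values (8 vs 32 for AVX2, 16 vs 64 for AVX-512), and the illustrative weights shrink from 102.4 MB to 25.6 MB.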

The website format has changed to support the more common approach of searching for the performance results of a given neural network model across different hardware platforms, rather than reviewing the performance of a given hardware platform across different neural network models.

Latency is measured by running the OpenVINO™ Runtime in synchronous mode. In this mode, each frame or image passes through the entire set of stages (pre-processing, inference, post-processing) before the next frame or image is processed. This KPI is relevant for applications that require inference on a single image, for example, the analysis of an ultrasound image in a medical application or of a seismic image in the oil and gas industry. Other use cases include real-time or near-real-time applications where a quick response to the inference result is required, such as an industrial robot reacting to changes in its environment or obstacle avoidance in autonomous vehicles.
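The synchronous pattern described above can be sketched in plain Python. Each frame runs through pre-processing, inference and post-processing before the next one starts, and the wall-clock time of that full pass is recorded as the frame's latency. The three stage functions are hypothetical stand-ins; in a real application the `infer` step would call a model compiled with the OpenVINO™ Runtime (for example, via `Core.compile_model` in the `openvino` Python package), which this sketch does not require. Reporting the median is a choice made here, not necessarily how the published numbers are aggregated.

```python
import statistics
import time


# Hypothetical stand-ins for the three stages of a synchronous pipeline.
def preprocess(frame):
    return [pixel / 255.0 for pixel in frame]  # e.g. normalize pixel values


def infer(tensor):
    # Placeholder for a real model call, e.g. an OpenVINO compiled model.
    return sum(tensor)


def postprocess(result):
    return round(result, 3)


def measure_sync_latency(frames):
    """Process frames one at a time; return per-frame latencies in ms."""
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        # The next frame is not touched until all three stages finish.
        postprocess(infer(preprocess(frame)))
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


frames = [[i % 256 for i in range(10_000)] for _ in range(20)]
latencies = measure_sync_latency(frames)
print(f"median latency: {statistics.median(latencies):.3f} ms over {len(latencies)} frames")
```

The median is used because occasional slow iterations, such as a cold first run, would skew a mean.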