Optimize Inference — OpenVINO™ documentation

Runtime, or deployment, optimization focuses on tuning inference and execution parameters. Unlike model-level optimization, it is highly specific to the hardware you use and the goal you want to achieve. You need to decide whether to prioritize accuracy or performance, throughput or latency, or aim for a balance between the two. You should also anticipate how scalable your application needs to be and exactly how it will interact with the inference component. This way, you will be able to achieve the best results for your product.

Performance-Portable Inference

To make configuration easier and performance optimization more portable, OpenVINO offers the Performance Hints feature. It comprises two high-level "presets" focused on latency (default) or throughput.
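As a minimal sketch of how the hints are applied, the example below compiles the same model twice, once per preset, using the Python properties API available in recent OpenVINO releases (2023.1 and later); the model path and the CPU device are assumptions for illustration only.

```python
import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
model = core.read_model("model.xml")  # hypothetical model file

# Latency-oriented preset (this is also the default behavior)
compiled_latency = core.compile_model(
    model, "CPU",
    {hints.performance_mode: hints.PerformanceMode.LATENCY},
)

# Throughput-oriented preset: the runtime picks streams/threads for you
compiled_throughput = core.compile_model(
    model, "CPU",
    {hints.performance_mode: hints.PerformanceMode.THROUGHPUT},
)
```

Because the hint is resolved by the device plugin at compile time, the same two-line configuration carries over unchanged when you swap the device name, which is what makes this approach performance-portable.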

Although inference with OpenVINO Runtime can be configured with a multitude of low-level performance settings, doing so is not recommended, as:

Additional Resources