Overview — tvm 0.23.dev0 documentation

Apache TVM is a machine learning compilation framework that follows the principle of Python-first development and universal deployment. It takes in pre-trained machine learning models and compiles them into deployable modules that can be embedded and run everywhere. Apache TVM also lets users customize the optimization process to introduce new optimizations, libraries, codegen, and more.

Key Principle

Key Goals

Key Flow

Here is a typical flow for using TVM to deploy a machine learning model. For a runnable example, please refer to Quick Start.

  1. Import/construct an ML model

    TVM supports importing generic ML models from various frameworks, such as PyTorch and TensorFlow. Alternatively, models can be created directly with the Relax frontend, which is suited to large language model scenarios.

  2. Perform composable optimization transformations via pipelines

    The pipeline encapsulates a collection of transformations to achieve two goals:

    • Graph Optimizations: such as operator fusion and layout rewrites.
    • Tensor Program Optimization: map the operators to low-level implementations (either library or codegen).

    Note

    These two are goals, not stages of the pipeline. The two optimizations can be performed at the same level, or separately in two stages.
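To make the idea of a composable pipeline concrete, here is a minimal sketch in plain Python. It does not use TVM's API: the list-of-strings "module", the pass names `fuse_matmul_add` and `select_implementation`, the `LIBRARY_KERNELS` table, and the `make_pipeline` helper are all hypothetical stand-ins chosen for illustration. The first pass plays the role of a graph optimization (operator fusion) and the second the role of tensor program optimization (choosing a library or codegen implementation per operator).

```python
from functools import reduce
from typing import Callable, List

# Toy "module": an ordered list of operator names.
# Purely illustrative; this is not TVM's IRModule.
Module = List[str]
Pass = Callable[[Module], Module]


def fuse_matmul_add(mod: Module) -> Module:
    """Graph-level pass: fuse each adjacent matmul + add pair into one op."""
    out: Module = []
    i = 0
    while i < len(mod):
        if mod[i] == "matmul" and i + 1 < len(mod) and mod[i + 1] == "add":
            out.append("matmul_add")
            i += 2  # skip both fused operators
        else:
            out.append(mod[i])
            i += 1
    return out


# Hypothetical set of operators for which a library kernel is assumed to exist.
LIBRARY_KERNELS = {"matmul_add"}


def select_implementation(mod: Module) -> Module:
    """Tensor-program-level pass: tag each op as library-backed or codegen."""
    return [
        f"{op}@library" if op in LIBRARY_KERNELS else f"{op}@codegen"
        for op in mod
    ]


def make_pipeline(*passes: Pass) -> Pass:
    """Compose passes left to right into a single callable pipeline."""
    def run(mod: Module) -> Module:
        return reduce(lambda m, p: p(m), passes, mod)
    return run


pipeline = make_pipeline(fuse_matmul_add, select_implementation)
original = ["matmul", "add", "relu", "matmul", "add"]
result = pipeline(original)
print(result)
```

Because each pass takes a module and returns a new one, passes can be reordered, dropped, or merged into a single stage without changing the others, which is the property the note above describes.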

  3. Build and universal deployment

    Apache TVM aims to provide a universal deployment solution that brings machine learning everywhere, in every language, with minimum runtime support. The TVM runtime can work in non-Python environments, so it works on mobile, edge devices, or even bare-metal devices. Additionally, the TVM runtime comes with native data structures and supports zero-copy exchange with the existing ecosystem (PyTorch, TensorFlow, TensorRT, etc.) through DLPack.
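The zero-copy exchange mentioned in step 3 can be illustrated with the DLPack protocol itself. The sketch below is an assumption-laden stand-in: it uses NumPy (version 1.22 or later, which provides `np.from_dlpack`) as both producer and consumer, rather than the TVM runtime or a deep learning framework, and it shows only that importing through DLPack shares the underlying buffer instead of copying it.

```python
import numpy as np

# Producer: any array object that implements __dlpack__ / __dlpack_device__.
# A NumPy array stands in here for a framework or runtime tensor.
source = np.arange(6, dtype="float32").reshape(2, 3)

# Consumer: import through the DLPack protocol; element data is not copied.
shared = np.from_dlpack(source)

# Both names now refer to the same underlying memory.
same_buffer = np.shares_memory(source, shared)

# A write through the producer is visible through the consumer.
source[0, 0] = 42.0
print(same_buffer, shared[0, 0])
```

In a real deployment the consumer would typically be another library's DLPack import function (for example `torch.from_dlpack` in PyTorch); the exact entry points on the TVM side are not shown here and should be taken from the TVM runtime documentation.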