Overview — NVIDIA cuDNN Frontend

The cuDNN library exposes open-source frontend Python and C++ API layers, which provide a simplified programming model that is sufficient for most use cases. These layers offer all of the graph functionality of the cuDNN backend while adding abstractions and utilities for ease of use. In the frontend API, you can describe multiple operations that form subgraphs through a persistent graph object. You don’t have to worry about specifying the shapes and sizes of the intermediate virtual tensors.

The Python frontend API layer and C++ frontend API layer are functionally equivalent. Therefore, you can choose which API layer to use according to your language preference.

APIs

The frontend API follows a functional style of building a graph. Operations take in input tensors and return output tensors. This also allows composition of operations.

Creating the Graph

Instantiate an object of class cudnn_frontend::graph::Graph which will house tensors and operations.

Optional graph-level attributes can be set on the object:

cudnn_frontend::graph::Graph& set_io_data_type(cudnn_frontend::DataType_t)
cudnn_frontend::graph::Graph& set_intermediate_data_type(cudnn_frontend::DataType_t)
cudnn_frontend::graph::Graph& set_compute_data_type(cudnn_frontend::DataType_t)

These attributes act as defaults in case they are not provided for constituent tensors and operations.
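As an illustration, a graph with default data types might be created as follows (a minimal sketch, assuming the open-source cudnn_frontend v1.x C++ API; the header name and enum values are taken from that project):

```cpp
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

int main() {
    // The graph object houses tensors and operations.
    fe::graph::Graph graph;

    // Graph-level defaults; constituent tensors and operations that do
    // not set their own data types fall back to these.
    graph.set_io_data_type(fe::DataType_t::HALF)
         .set_intermediate_data_type(fe::DataType_t::FLOAT)
         .set_compute_data_type(fe::DataType_t::FLOAT);
    return 0;
}
```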

Defining Tensors

You can create input tensors to provide inputs to operations within a graph. To add a tensor to a graph, use:

std::shared_ptr<cudnn_frontend::graph::Tensor_attributes> cudnn_frontend::graph::Graph::tensor(cudnn_frontend::graph::Tensor_attributes)

As the API returns a shared pointer, both the user and the frontend graph are owners of the tensor.

Tensor_attributes is a lightweight structure with a setter for each attribute.
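For example, a tensor might be added like this (a sketch assuming the v1.x C++ API; the name, dimensions, and strides are illustrative values, not requirements):

```cpp
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

int main() {
    fe::graph::Graph graph;

    // A 4-D NHWC input tensor described via chained attribute setters.
    auto X = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("X")
                              .set_dim({8, 64, 56, 56})
                              .set_stride({64 * 56 * 56, 1, 64 * 56, 64})
                              .set_data_type(fe::DataType_t::HALF));

    // X is a std::shared_ptr<fe::graph::Tensor_attributes>, co-owned by
    // this scope and by the graph.
    return 0;
}
```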

Defining Operations

Operations take in mandatory input tensors via positional arguments. Optional input tensors are provided using corresponding setters in operation attributes.

Operations return an ordered array of output tensors. Any optional outputs that are not present will have null shared pointers (nullptr).

Refer to the Operations section for more details.
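To make the calling convention concrete, here is a sketch of a forward convolution (assuming the v1.x C++ API; the tensor shapes are illustrative):

```cpp
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

int main() {
    fe::graph::Graph graph;
    graph.set_io_data_type(fe::DataType_t::HALF)
         .set_compute_data_type(fe::DataType_t::FLOAT);

    auto X = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("X")
                              .set_dim({8, 64, 56, 56})
                              .set_stride({64 * 56 * 56, 1, 64 * 56, 64}));
    auto W = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("W")
                              .set_dim({32, 64, 3, 3})
                              .set_stride({64 * 3 * 3, 1, 64 * 3, 64}));

    // Mandatory inputs (X, W) are positional; optional inputs would be
    // supplied via setters on the attributes object.
    auto conv_attrs = fe::graph::Conv_fprop_attributes()
                          .set_padding({1, 1})
                          .set_stride({1, 1})
                          .set_dilation({1, 1});
    auto Y = graph.conv_fprop(X, W, conv_attrs);

    // Mark Y as a graph output; otherwise it remains a virtual tensor.
    Y->set_output(true);
    return 0;
}
```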

Validating the Graph

The validate API ensures the API usage is sound and checks for issues such as dangling tensors. Internally, any unspecified properties, such as dimensions and strides, are inferred.

cudnn_frontend::error_t cudnn_frontend::graph::Graph::validate()

Building the Backend Graph

This method creates the cuDNN backend descriptors for all constituents of the graph.

cudnn_frontend::error_t cudnn_frontend::graph::Graph::build_operation_graph(cudnnHandle_t handle)

Creating the Execution Plan

This method internally queries the heuristics for engine configs for the given heuristic modes.

cudnn_frontend::error_t cudnn_frontend::graph::Graph::create_execution_plans(std::vector<cudnn_frontend::HeurMode_t> const& modes)

Getting the Execution Plan Count

This method returns the number of execution plans returned by cuDNN heuristics. Each plan gets an index from 0 to #plans-1, with 0 having top priority.

int64_t cudnn_frontend::graph::Graph::get_execution_plan_count() const;

Checking Graph Support

This method guarantees that executing the graph using the queried plans will succeed.

cudnn_frontend::error_t cudnn_frontend::graph::Graph::check_support(cudnnHandle_t h);

Building the Execution Plan

This function builds the execution plans queried with the create_execution_plans(...) API.

There are two flavors of this API:

cudnn_frontend::error_t cudnn_frontend::graph::Graph::build_plans(cudnnHandle_t handle, cudnn_frontend::BuildPlanPolicy_t policy, bool do_multithreaded_builds);
cudnn_frontend::error_t cudnn_frontend::graph::Graph::build_plan_at_index(cudnnHandle_t handle, int64_t plan_index);
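Putting the build steps together, the typical sequence might look like this (a sketch assuming the v1.x C++ API; the HEURISTICS_CHOICE policy asks the frontend to build the first working plan suggested by the heuristics):

```cpp
#include <cudnn_frontend.h>

#include <cstdlib>
#include <iostream>

namespace fe = cudnn_frontend;

int main() {
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    fe::graph::Graph graph;
    // ... add tensors and operations as shown in the previous sections ...

    // Every call returns an error object; abort on the first failure.
    auto check = [](fe::error_t status) {
        if (!status.is_good()) {
            std::cerr << status.get_message() << std::endl;
            std::exit(1);
        }
    };

    check(graph.validate());
    check(graph.build_operation_graph(handle));
    check(graph.create_execution_plans({fe::HeurMode_t::A}));
    check(graph.check_support(handle));

    // Flavor 1: build plans according to a policy.
    check(graph.build_plans(handle, fe::BuildPlanPolicy_t::HEURISTICS_CHOICE));

    // Flavor 2: build a single plan by index (useful for autotuning):
    // check(graph.build_plan_at_index(handle, 0));

    cudnnDestroy(handle);
    return 0;
}
```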

Filtering Plans (Optional)

You can filter plans by numerical notes, by behavioral notes, or by workspace size, to exclude plans that do not provide the desired functional behavior.

cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::select_numeric_notes(std::vector<cudnn_frontend::NumericalNote_t> const&);
cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::select_behavior_notes(std::vector<cudnn_frontend::BehaviorNote_t> const&);

cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_numeric_notes(std::vector<cudnn_frontend::NumericalNote_t> const&);
cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_behavior_notes(std::vector<cudnn_frontend::BehaviorNote_t> const&);
cudnn_frontend::graph::Graph& cudnn_frontend::graph::Plans::deselect_workspace_greater_than(int64_t const workspace);
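As a sketch, filtering is applied before building plans. This example assumes the note-selection methods are callable directly on the graph object, as in recent cudnn_frontend releases, and that the TENSOR_CORE numerical note and RUNTIME_COMPILATION behavior note enums are available:

```cpp
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

void filter_plans(fe::graph::Graph& graph) {
    // Keep only engines that use Tensor Cores.
    graph.select_numeric_notes({fe::NumericalNote_t::TENSOR_CORE});

    // Skip engines that require runtime compilation.
    graph.deselect_behavior_notes({fe::BehaviorNote_t::RUNTIME_COMPILATION});

    // Exclude any plan needing more than 64 MiB of workspace.
    graph.deselect_workspace_greater_than(64 * 1024 * 1024);
}
```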

Autotuning

Autotuning provides a way to execute the different execution plans for a given graph and measure their relative performance under runtime conditions. This generally helps validate and improve upon the results provided by the heuristics.
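A hypothetical autotuning loop might build and time every plan, then keep the fastest index (a sketch; `variant_pack` and `workspace` are assumed to be set up as in the execution section, and CUDA events are used for timing):

```cpp
#include <cudnn_frontend.h>
#include <cuda_runtime.h>

#include <cstdint>
#include <memory>
#include <unordered_map>

namespace fe = cudnn_frontend;

int64_t autotune(fe::graph::Graph& graph, cudnnHandle_t handle,
                 std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*>& variant_pack,
                 void* workspace) {
    int64_t best_index = -1;
    float best_ms = 1e30f;

    for (int64_t i = 0; i < graph.get_execution_plan_count(); ++i) {
        // Skip plans that fail to build.
        if (!graph.build_plan_at_index(handle, i).is_good()) continue;

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        graph.execute(handle, variant_pack, workspace, i);  // plan-index overload
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.f;
        cudaEventElapsedTime(&ms, start, stop);
        if (ms < best_ms) { best_ms = ms; best_index = i; }

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }
    return best_index;  // index of the fastest measured plan
}
```

In practice you would warm up and average several runs per plan before trusting the measurement.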

Executing the Graph

Executing the graph requires device pointers to all input and output tensors, as well as a user-allocated device workspace pointer.

Two flavors of execution exist, corresponding to the two flavors of the build_plans(...) API.

The first flavor assumes a candidate execution plan has already been set; the candidate plan is set internally when the plans are built with build_plans(...):

cudnn_frontend::error_t cudnn_frontend::graph::Graph::execute(cudnnHandle_t handle, std::unordered_map<std::shared_ptr<cudnn_frontend::graph::Tensor_attributes>, void*> var_pack, void* workspace);

The execute API also takes a plan index to target a specific plan. This may be used when autotuning, in conjunction with the build_plan_at_index(...) API.

cudnn_frontend::error_t cudnn_frontend::graph::Graph::execute(cudnnHandle_t handle, std::unordered_map<std::shared_ptr<cudnn_frontend::graph::Tensor_attributes>, void*> var_pack, void* workspace, int64_t plan_index);
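A sketch of the first flavor: X, W, and Y are the shared tensor handles returned while building the graph, and x_dev, w_dev, and y_dev are device buffers the caller has already allocated and filled (all names here are illustrative):

```cpp
#include <cudnn_frontend.h>
#include <cuda_runtime.h>

#include <memory>
#include <unordered_map>

namespace fe = cudnn_frontend;

fe::error_t run(fe::graph::Graph& graph, cudnnHandle_t handle,
                std::shared_ptr<fe::graph::Tensor_attributes> X, void* x_dev,
                std::shared_ptr<fe::graph::Tensor_attributes> W, void* w_dev,
                std::shared_ptr<fe::graph::Tensor_attributes> Y, void* y_dev) {
    // Workspace sized for the currently selected plan.
    void* workspace = nullptr;
    cudaMalloc(&workspace, graph.get_workspace_size());

    // Map each tensor handle to its device pointer.
    std::unordered_map<std::shared_ptr<fe::graph::Tensor_attributes>, void*>
        variant_pack = {{X, x_dev}, {W, w_dev}, {Y, y_dev}};

    auto status = graph.execute(handle, variant_pack, workspace);
    cudaFree(workspace);
    return status;
}
```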

Miscellaneous APIs

Use get_workspace_size to query the device workspace size required to execute the currently selected execution plan.

A variant also takes a plan index to query the workspace for a specific plan. This may be used when autotuning, in conjunction with the build_plan_at_index(...) API.

int64_t get_workspace_size() const;
int64_t get_workspace_size_plan_index(int64_t plan_index) const;

Use get_autotune_workspace_size to query the device workspace size required to autotune all plans.

int64_t get_autotune_workspace_size() const;

Serialization

The frontend API provides two flavors of serialization. One checkpoints after the initial graph specification (before calling validate); the other checkpoints after building the execution plan (to save on plan creation time).

The first flavor captures the user-specified input tensors and nodes in the graph. It can be used to generate a log (for debugging) or to visualize the graph being created.

The second flavor serializes a fully built graph into a binary blob of data, which avoids rebuilding the execution plan when the graph is restored.
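A sketch of the second flavor: the serialize/deserialize signatures taking a byte vector follow recent cudnn_frontend releases and are an assumption, not a guarantee for every version:

```cpp
#include <cudnn_frontend.h>

#include <cstdint>
#include <vector>

namespace fe = cudnn_frontend;

void checkpoint_and_restore(fe::graph::Graph& built_graph, cudnnHandle_t handle) {
    // Serialize after build_plans(...) so the plan need not be recreated.
    std::vector<uint8_t> blob;
    built_graph.serialize(blob);

    // ... persist `blob`, then later restore it into a fresh graph ...
    fe::graph::Graph restored;
    restored.deserialize(handle, blob);  // ready to execute
}
```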


Error Handling

The C++ API returns an error object which has an error code and an error message.

The Python API raises an exception carrying a similar error message, which can be handled in Python.
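The C++ error-checking pattern looks like this (a sketch; is_good() and get_message() are accessors on the returned error object in the v1.x frontend):

```cpp
#include <cudnn_frontend.h>

#include <iostream>

namespace fe = cudnn_frontend;

int main() {
    fe::graph::Graph graph;

    // Every C++ API call returns an error object; check it before
    // proceeding. get_message() carries the human-readable description.
    fe::error_t status = graph.validate();
    if (!status.is_good()) {
        std::cerr << "cuDNN frontend error: " << status.get_message() << "\n";
        return 1;
    }
    return 0;
}
```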