Integrate External/Custom Code - MATLAB & Simulink

This example shows how to integrate external or custom code to enhance the performance of generated code. Although MATLAB® Coder™ generates optimized code for most applications, you might have custom code optimized for your specific requirements. For example:

You have custom libraries optimized for your target environment.
You have custom libraries for functions not supported by MATLAB Coder.
You have custom libraries that meet standards set by your company.

In such cases, you can integrate your custom code with the code generated by MATLAB Coder.

This example illustrates how to integrate the function cublasSgemm from the NVIDIA® CUDA® Basic Linear Algebra Subroutines (CUBLAS) library in generated code. This function performs single-precision matrix multiplication on a Graphics Processing Unit (GPU).
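As a rough illustration of what the library call computes, cublasSgemm evaluates C = alpha*op(A)*op(B) + beta*C on column-major buffers, where each matrix pointer is followed by its leading dimension (the row count of the stored matrix when no transpose is requested). The plain-MATLAB sketch below is not part of the original example and does not use a GPU. It emulates the untransposed case on flat buffers so that the roles of m, n, k and the leading dimensions are visible. The names a, b, c, lda, ldb, and ldc are illustrative assumptions.

```matlab
% Dimensions: A is m-by-k, B is k-by-n, C is m-by-n
m = 2; k = 3; n = 4;
A = single(reshape(1:m*k, m, k));
B = single(reshape(1:k*n, k, n));

% Flat column-major buffers, as a C library would see them
a = A(:);
b = B(:);
c = zeros(m*n, 1, 'single');

% Leading dimension = number of rows of each stored (untransposed) matrix
lda = m;
ldb = k;
ldc = m;

alpha = single(1);
beta  = single(0);

% Reference triple loop: c(i,j) = alpha*sum_p a(i,p)*b(p,j) + beta*c(i,j)
for j = 1:n
    for i = 1:m
        acc = single(0);
        for p = 1:k
            acc = acc + a(i + (p-1)*lda) * b(p + (j-1)*ldb);
        end
        idx = i + (j-1)*ldc;
        c(idx) = alpha*acc + beta*c(idx);
    end
end

C = reshape(c, m, n);
disp(isequal(C, A*B));
```

With non-square inputs such as these, a wrong leading dimension would index the wrong elements, which is why a square test case alone can hide such a mistake.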

1. Define a class ExternalLib_API that derives from the class coder.ExternalDependency. ExternalLib_API defines an interface to the CUBLAS library through the following methods:

getDescriptiveName: Returns a descriptive name for ExternalLib_API to be used for error messages.
isSupportedContext: Determines if the build context supports the CUBLAS library.
updateBuildInfo: Adds header file paths and link files to the build information.
GPU_MatrixMultiply: Defines the interface to the CUBLAS library function cublasSgemm.
ExternalLib_API.m
classdef ExternalLib_API < coder.ExternalDependency
%#codegen

methods (Static)

 function bName = getDescriptiveName(~)  
     bName = 'ExternalLib_API';  
 end  
   
 function tf = isSupportedContext(ctx)  
     if  ctx.isMatlabHostTarget()  
         tf = true;  
     else  
         error('CUBLAS library not available for this target');  
     end  
 end  
   
 function updateBuildInfo(buildInfo, ctx)  
     [~, linkLibExt, ~, ~] = ctx.getStdLibInfo();  
       
     % Include header file path  
     % Include header files later using coder.cinclude  
     hdrFilePath = 'C:\My_Includes';  
     buildInfo.addIncludePaths(hdrFilePath);  
       
     % Include link files  
     linkFiles = strcat('libcublas', linkLibExt);  
     linkPath = 'C:\My_Libs';  
     linkPriority = '';  
     linkPrecompiled = true;  
     linkLinkOnly = true;  
     group = '';  
     buildInfo.addLinkObjects(linkFiles, linkPath, ...  
         linkPriority, linkPrecompiled, linkLinkOnly, group);  
       
     linkFiles = strcat('libcudart', linkLibExt);  
     buildInfo.addLinkObjects(linkFiles, linkPath, ...  
         linkPriority, linkPrecompiled, linkLinkOnly, group);  
       
 end  
   
 % API for the CUBLAS library function 'cublasSgemm'  
 function C = GPU_MatrixMultiply(A, B)  
     assert(isa(A,'single'), 'A must be single.');  
     assert(isa(B,'single'), 'B must be single.');  
       
     if(coder.target('MATLAB'))  
         C=A*B;  
     else  
           
         % Include header files  
         %     for external functions and typedefs  
         % Header path included earlier using updateBuildInfo  
         coder.cinclude('"cuda_runtime.h"');  
         coder.cinclude('"cublas_v2.h"');  
           
         % Compute dimensions of input matrices  
         m = int32(size(A, 1));  
         k = int32(size(A, 2));  
         n = int32(size(B, 2));  
           
         % Declare pointers to matrices on destination GPU  
         d_A = coder.opaque('float*');  
         d_B = coder.opaque('float*');  
         d_C = coder.opaque('float*');  
           
         % Compute memory to be allocated for matrices  
         % Single = 4 bytes  
         size_A = m*k*4;  
         size_B = k*n*4;  
         size_C = m*n*4;  
           
         % Define error variables  
         error = coder.opaque('cudaError_t');  
         cudaSuccessV = coder.opaque('cudaError_t', ...  
             'cudaSuccess');  
           
         % Assign memory on destination GPU  
         error = coder.ceval('cudaMalloc', ...  
             coder.wref(d_A), size_A);  
         assert(error == cudaSuccessV, ...  
             'cudaMalloc(A) failed');  
         error = coder.ceval('cudaMalloc', ...  
             coder.wref(d_B), size_B);  
         assert(error == cudaSuccessV, ...  
             'cudaMalloc(B) failed');  
         error = coder.ceval('cudaMalloc', ...  
             coder.wref(d_C), size_C);  
         assert(error == cudaSuccessV, ...  
             'cudaMalloc(C) failed');  
           
         % Define direction of copying  
         hostToDevice = coder.opaque('cudaMemcpyKind', ...  
             'cudaMemcpyHostToDevice');  
           
         % Copy matrices to destination GPU  
         error = coder.ceval('cudaMemcpy',  ...  
             d_A, coder.rref(A), size_A, hostToDevice);  
         assert(error == cudaSuccessV, 'cudaMemcpy(A) failed');  
           
         error = coder.ceval('cudaMemcpy',  ...  
             d_B, coder.rref(B), size_B, hostToDevice);  
         assert(error == cudaSuccessV, 'cudaMemcpy(B) failed');  
           
         % Define type and size for result  
         C = zeros(m, n, 'single');  
           
         error = coder.ceval('cudaMemcpy', ...  
             d_C, coder.rref(C), size_C, hostToDevice);  
         assert(error == cudaSuccessV, 'cudaMemcpy(C) failed');  
           
         % Define handle variables for external library  
         handle = coder.opaque('cublasHandle_t');  
         blasSuccess = coder.opaque('cublasStatus_t', ...  
             'CUBLAS_STATUS_SUCCESS');  
           
         % Initialize external library  
         ret = coder.opaque('cublasStatus_t');  
         ret = coder.ceval('cublasCreate', coder.wref(handle));  
         assert(ret == blasSuccess, 'cublasCreate failed');  
           
          
         TRANSA = coder.opaque('cublasOperation_t', ...  
             'CUBLAS_OP_N');  
         alpha = single(1);  
         beta = single(0);  
           
         % Multiply matrices on GPU  
         ret = coder.ceval('cublasSgemm', handle, ...  
             TRANSA,TRANSA,m,n,k, ...  
             coder.rref(alpha),d_A,m, ...  
             d_B,k, ...  
             coder.rref(beta),d_C,m);  % ldc = rows of C (m-by-n)
           
         assert(ret == blasSuccess, 'cublasSgemm failed');  
           
         % Copy result back to local host  
         deviceToHost = coder.opaque('cudaMemcpyKind', ...  
             'cudaMemcpyDeviceToHost');  
         error = coder.ceval('cudaMemcpy', coder.wref(C), ...  
             d_C, size_C, deviceToHost);  
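         assert(error == cudaSuccessV, 'cudaMemcpy(C) failed');  
           
         % Release GPU memory and the CUBLAS handle  
         coder.ceval('cudaFree', d_A);  
         coder.ceval('cudaFree', d_B);  
         coder.ceval('cudaFree', d_C);  
         coder.ceval('cublasDestroy', handle);  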
           
     end  
 end

end

end

2. To perform the matrix multiplication using the interface defined in the method GPU_MatrixMultiply and the build information in ExternalLib_API, include the following line in your MATLAB code:
C= ExternalLib_API.GPU_MatrixMultiply(A,B);
For instance, you can define a MATLAB function Matrix_Multiply that solely performs this matrix multiplication.
function C = Matrix_Multiply(A, B) %#codegen
C= ExternalLib_API.GPU_MatrixMultiply(A,B);

3. Define a MEX configuration object using coder.config. To use the CUBLAS libraries, set the target language for code generation to C++.
cfg=coder.config('mex');
cfg.TargetLang='C++';

4. Generate code for Matrix_Multiply using cfg as the configuration object and two 2 X 2 matrices of type single as arguments. Because cublasSgemm performs matrix multiplication for data type float, the corresponding MATLAB matrices must have type single.
codegen -config cfg Matrix_Multiply ...
-args {ones(2,'single'),ones(2,'single')}

5. Test the generated MEX function Matrix_Multiply_mex using two 2 X 2 identity matrices of type single.
Matrix_Multiply_mex(eye(2,'single'),eye(2,'single'))
The output is also a 2 X 2 identity matrix.
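Multiplying two identity matrices cannot reveal a transposed or misindexed result, so a slightly stronger check may be useful. The sketch below is an addition to the original example. It compares the MEX output against MATLAB's own product for non-symmetric 2 X 2 inputs, and it assumes that Matrix_Multiply_mex has been generated as in step 4 and is on the path. If it is not found, the script only prints a message. The tolerance is an arbitrary illustrative choice.

```matlab
% Non-symmetric inputs so that layout errors change the result
A = single([1 2; 3 4]);
B = single([5 6; 7 8]);
expected = A*B;   % [19 22; 43 50]

% exist(...,'file') returns 3 for a MEX file on the search path
if exist('Matrix_Multiply_mex', 'file') == 3
    actual = Matrix_Multiply_mex(A, B);
    tol = 10*eps('single')*max(abs(expected(:)));  % illustrative tolerance
    assert(max(abs(actual(:) - expected(:))) <= tol, ...
        'MEX result differs from A*B');
    disp('MEX output matches A*B.');
else
    disp('Matrix_Multiply_mex not found; generate it first.');
end
```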
