OCL-MLA by tuxfan

What is OCL-MLA?

OCL-MLA is exactly what its name implies: a set of mid-level abstractions that make OpenCL development easier. OCL-MLA provides a set of logical devices, configurable at compile time, that are mapped to the actual device resources on a node. This removes the usual boilerplate configuration code that many people find intimidating and tedious. Logical devices are preconfigured (think of the MPI_COMM_WORLD communicator in MPI) and are initialized with a single call to ocl_init(). OCL-MLA insulates the application developer from differences between the particular compute devices exposed by the OpenCL runtime, while still allowing an expert OpenCL administrator to choose how each physical device is configured and used. Additionally, OCL-MLA provides a convenience hash-table interface for creating and accessing OpenCL objects such as kernels, programs, and buffers by name. OCL-MLA provides both C and Fortran APIs.

Features

Compile-time logical device configuration
Hash-table interface for creating and managing OpenCL objects by name
Fortran bindings
Timer utilities
Support for multiple OpenCL platforms in a single configuration (via the ICD, the Installable Client Driver)
Convenience functions for event manipulation
Utilities for program manipulation, e.g., static compilation of input kernel source code

Example

#include <stdio.h>  // fprintf
#include <stdlib.h> // free
#include "ocl.h"    // OCL-MLA API (header name assumed; check your installation)

const size_t ELEMENTS = 32;

int main(int argc, char ** argv) {
  size_t global_size = ELEMENTS;
  size_t global_offset = 0; // NDRange starts at index 0
  size_t offset = 0;        // byte offset for the buffer read
  size_t local_size = 0;    // filled in by ocl_ndrange_hints()
  cl_event event;           // event used for timing (type assumed)

  // initialize OpenCL runtime
  ocl_init();

  // create a host-side array
  float h_array[ELEMENTS];

  // initialize host-side array
  for(size_t i=0; i<ELEMENTS; ++i) {
    h_array[i] = 0.0f;
  } // for

  // create a device-side array (READ_WRITE, assuming the kernel writes to it)
  ocl_create_buffer(OCL_PERFORMANCE_DEVICE, "array", ELEMENTS*sizeof(float),
    CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, h_array);

  // create program source from static string test_PPSTR (assumed defined elsewhere)
  char * source = NULL;
  ocl_add_from_string(test_PPSTR, &source, 0);

  // add program, then release the source string
  ocl_add_program(OCL_PERFORMANCE_DEVICE, "program", source, "-DMY_DEFINE");
  free(source);

  // add kernel "test" from "program" under the name "my test"
  ocl_add_kernel(OCL_PERFORMANCE_DEVICE, "program", "test", "my test");

  // use hints interface to decide what work-group size to use
  ocl_kernel_hints_t hints;
  size_t work_group_indices, single_indices;

  // get kernel hints from the device the kernel was added to
  ocl_kernel_hints(OCL_PERFORMANCE_DEVICE, "program", "my test", &hints);

  // heuristic for how to execute global_size work-items
  ocl_ndrange_hints(global_size, hints.max_work_group_size, 0.5, 0.5,
    &local_size, &work_group_indices, &single_indices);

  // set kernel argument 0 to the "array" buffer
  ocl_set_kernel_arg_buffer("program", "my test", "array", 0);

  // initialize event for timings
  ocl_initialize_event(&event);

  // invoke kernel
  ocl_enqueue_kernel_ndrange(OCL_PERFORMANCE_DEVICE, "program", "my test", 1,
    &global_offset, &global_size, &local_size, &event);

  // block for kernel completion
  ocl_finish(OCL_PERFORMANCE_DEVICE);

  // add a timer event for the kernel invocation
  ocl_add_timer("kernel", &event);

  // read data from device
  ocl_enqueue_read_buffer(OCL_PERFORMANCE_DEVICE, "array", 1, offset,
    ELEMENTS*sizeof(float), h_array, &event);

  // print data read from device
  for(size_t i=0; i<ELEMENTS; ++i) {
    fprintf(stderr, "%f\n", h_array[i]);
  } // for
  fprintf(stderr, "\n");

  // print timer results
  ocl_report_timer("kernel");

  // finalize OpenCL runtime
  ocl_finalize();
  return 0;
}