OpenACC Profiling Interface (GNU libgomp)

10.1 Implementation Status and Implementation-Defined Behavior

We’re implementing the OpenACC Profiling Interface as defined by the OpenACC 2.6 specification. We’re clarifying some aspects here as implementation-defined behavior, while they’re still under discussion within the OpenACC Technical Committee.

This implementation is tuned to keep the performance impact as low as possible for the (very common) case that the Profiling Interface is not enabled. This is relevant, as the Profiling Interface affects all the hot code paths (in the target code, not in the offloaded code). Users of the OpenACC Profiling Interface can be expected to understand that performance is impacted to some degree once the Profiling Interface is enabled: for example, because of the runtime (libgomp) calling into a third-party library for every event that has been registered.

We’re not yet accounting for the fact that OpenACC events may occur during event processing. We just handle one case specially, as required by CUDA 9.0 nvprof: acc_get_device_type (see acc_get_device_type – Get type of device accelerator to be used.) may be called from acc_ev_device_init_start, acc_ev_device_init_end callbacks.

We’re not yet implementing initialization via an acc_register_library function that is either statically linked in, or dynamically loaded via LD_PRELOAD. Initialization via acc_register_library functions dynamically loaded via the ACC_PROFLIB environment variable does work, as does directly calling acc_prof_register, acc_prof_unregister, acc_prof_lookup.
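As a minimal sketch of the ACC_PROFLIB path described above, the following library registers one callback from its acc_register_library entry point. It assumes GCC's acc_prof.h header is available; the names count_launch, report and launch_count are hypothetical and chosen for illustration only. The count is printed from an atexit handler rather than from a shutdown event, since not all event types are dispatched yet.

```c
#include <acc_prof.h>
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter; atomic because callbacks may run on several
   host threads.  */
static atomic_ulong launch_count;

/* Callback with the acc_prof_callback signature: count each kernel
   launch that gets enqueued.  The arguments are not inspected here.  */
void
count_launch (acc_prof_info *prof_info, acc_event_info *event_info,
              acc_api_info *api_info)
{
  (void) prof_info;
  (void) event_info;
  (void) api_info;
  atomic_fetch_add (&launch_count, 1);
}

/* Print the total when the process exits.  */
static void
report (void)
{
  fprintf (stderr, "launches enqueued: %lu\n", atomic_load (&launch_count));
}

/* Entry point that the runtime looks up in a library named by
   ACC_PROFLIB.  Only the registration function is used in this sketch.  */
void
acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
                      acc_prof_lookup_func lookup)
{
  (void) unreg;
  (void) lookup;
  assert (reg != NULL);
  reg (acc_ev_enqueue_launch_start, count_launch, acc_reg);
  atexit (report);
}
```

Assuming the file is saved as prof.c (a placeholder name), it could be built with something like `gcc -shared -fPIC prof.c -o libprof.so` and enabled by running the OpenACC program with ACC_PROFLIB=./libprof.so set in the environment.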

As there are currently no inquiry functions defined, calls to acc_prof_lookup always return NULL.

There aren’t separate start, stop events defined for the event types acc_ev_create, acc_ev_delete, acc_ev_alloc, acc_ev_free. It’s not clear if these should be triggered before or after the actual device-specific call is made. We trigger them after.

Remarks about data provided to callbacks:

acc_prof_info.event_type

It’s not clear if for nested event callbacks (for example, acc_ev_enqueue_launch_start as part of a parent compute construct), this should be set for the nested event (acc_ev_enqueue_launch_start), or if the value of the parent construct should remain (acc_ev_compute_construct_start). In this implementation, the value generally corresponds to the innermost nested event type.

acc_prof_info.device_type

acc_prof_info.thread_id

Always -1; not yet implemented.

acc_prof_info.async

acc_prof_info.async_queue

libgomp does not limit the number of asynchronous queues. This always has the same value as acc_prof_info.async.

acc_prof_info.src_file

Always NULL; not yet implemented.

acc_prof_info.func_name

Always NULL; not yet implemented.

acc_prof_info.line_no

Always -1; not yet implemented.

acc_prof_info.end_line_no

Always -1; not yet implemented.

acc_prof_info.func_line_no

Always -1; not yet implemented.

acc_prof_info.func_end_line_no

Always -1; not yet implemented.

acc_event_info.event_type, acc_event_info.*.event_type

Relating to acc_prof_info.event_type discussed above, in this implementation, this will always be the same value as acc_prof_info.event_type.

acc_event_info.*.parent_construct

acc_event_info.*.implicit

For acc_ev_alloc, acc_ev_free, acc_ev_enqueue_upload_start, acc_ev_enqueue_upload_end, acc_ev_enqueue_download_start, and acc_ev_enqueue_download_end, this will currently be 1 even for explicit usage.

acc_event_info.data_event.var_name

Always NULL; not yet implemented.

acc_event_info.data_event.host_ptr

For acc_ev_alloc and acc_ev_free, this is always NULL.

typedef union acc_api_info

… as printed in 5.2.3. Third Argument: API-Specific Information. This should obviously be typedef struct acc_api_info.

acc_api_info.device_api

Possibly not yet implemented correctly for acc_ev_compute_construct_start, acc_ev_device_init_start, acc_ev_device_init_end: will always be acc_device_api_none for these event types. For acc_ev_enter_data_start, it will be acc_device_api_none in some cases.

acc_api_info.device_type

Always the same as acc_prof_info.device_type.

acc_api_info.vendor

Always -1; not yet implemented.

acc_api_info.device_handle

Always NULL; not yet implemented.

acc_api_info.context_handle

Always NULL; not yet implemented.

acc_api_info.async_handle

Always NULL; not yet implemented.
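Because several of the fields listed above are currently always NULL or -1, a callback probably should not dereference or print them unchecked. The helper below is a rough sketch of one defensive pattern; format_location is a hypothetical name, and it takes the two values as plain arguments so that the block stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write "file:line" into buf, or "<unknown>" when no source location
   was supplied (src_file == NULL or line_no < 0).
   Returns 1 if a real location was written, 0 otherwise.  */
int
format_location (char *buf, size_t size, const char *src_file, int line_no)
{
  assert (buf != NULL && size > 0);
  if (src_file == NULL || line_no < 0)
    {
      snprintf (buf, size, "<unknown>");
      return 0;
    }
  snprintf (buf, size, "%s:%d", src_file, line_no);
  return 1;
}
```

Inside a callback this might be called as format_location (buf, sizeof buf, prof_info->src_file, prof_info->line_no), which with the current implementation would take the "<unknown>" branch.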

Remarks about certain event types:

acc_ev_device_init_start, acc_ev_device_init_end

acc_ev_enter_data_start, acc_ev_enter_data_end, acc_ev_exit_data_start, acc_ev_exit_data_end

Callbacks for the following event types will be invoked, but dispatch and information provided therein has not yet been thoroughly reviewed:

During device initialization, and finalization, respectively, callbacks for the following event types will not yet be invoked:

Callbacks for the following event types have not yet been implemented, so currently won’t be invoked:

For the following runtime library functions, not all expected callbacks will be invoked (mostly concerning implicit device initialization):

Aside from implicit device initialization, for the following runtime library functions, no callbacks will be invoked for shared-memory offloading devices (it’s not clear if they should be):