[RFC][Clang][OpenMP] How does Clang handle LLVM translation for the allocate clause in the taskgroup directive?
November 15, 2024, 12:58pm
I am currently collecting information to provide translation support for the allocate clause on the taskgroup directive in flang-new. As a reference, I analyzed Clang’s behaviour for the allocate clause on taskgroup, and I could not find any observable difference in the LLVM IR generated by Clang with and without the allocate clause.
Analyzing Clang’s behaviour
Allocate directive/clause
Example testcase for allocator clause in allocate directive
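```c
// testcase1.c
#include <omp.h>

void test()
{
  int x = 0;
  // 'allocate' is a declarative directive; the braces below are an
  // ordinary compound statement, not a region associated with it.
  #pragma omp allocate(x) allocator(omp_thread_mem_alloc)
  {
    x = 5;
  }
}

int main(void)
{
  test(); // run test() so the allocation happens at run time
  return 0;
}
```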
- The above example emits a call to the runtime function `__kmpc_alloc(int gtid, size_t size, omp_allocator_handle_t allocator)`.
- Even though the call to `__kmpc_alloc(...)` passes information about the allocator, the memory for `x` is not allocated based on that information. Instead, the default allocator is used.
- Here is the warning emitted when the above code is executed:
  `OMP: Warning #190: Allocator omp_thread_mem_alloc is not available, will use default allocator.`
- Similar behaviour is observed for `#pragma omp parallel allocate(omp_const_mem_alloc:x) reduction(+:x)` as well.
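To make the observed fallback concrete, here is a small self-contained model in C. It does not use libomp: `model_allocator_t`, `model_alloc`, `model_free` and `model_testcase1` are hypothetical names, and the assumption that only the default allocator is available simply mirrors the warning above. It also sketches, roughly, the shape of testcase1 after lowering, where `x` lives in runtime-allocated memory that is released at the end of its scope.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for OpenMP predefined allocator handles.
   The real handles are declared in omp.h; these names are illustrative. */
typedef enum {
  MODEL_DEFAULT_MEM_ALLOC,
  MODEL_THREAD_MEM_ALLOC,
  MODEL_CONST_MEM_ALLOC
} model_allocator_t;

/* Assumption: only the default allocator is backed by a memory resource,
   mirroring the "not available" warning reported above. */
static int model_allocator_available(model_allocator_t al) {
  return al == MODEL_DEFAULT_MEM_ALLOC;
}

/* Honour the requested allocator only if it is available; otherwise print
   a warning and fall back to the default allocator (plain malloc here).
   *used reports which allocator actually served the request. */
static void *model_alloc(size_t size, model_allocator_t requested,
                         model_allocator_t *used) {
  assert(used != NULL);
  model_allocator_t al = requested;
  if (!model_allocator_available(al)) {
    fprintf(stderr, "model: allocator %d is not available, "
                    "will use default allocator\n", (int)al);
    al = MODEL_DEFAULT_MEM_ALLOC;
  }
  *used = al;
  return malloc(size);
}

static void model_free(void *ptr) { free(ptr); }

/* Rough shape of testcase1 after lowering: x is obtained from the runtime
   instead of the stack and is freed when its scope ends. */
static int model_testcase1(model_allocator_t *used) {
  int *x = model_alloc(sizeof *x, MODEL_THREAD_MEM_ALLOC, used);
  if (x == NULL)
    return -1;
  *x = 0;
  *x = 5;
  int result = *x;
  model_free(x);
  return result;
}
```

In this model a request for the thread allocator still yields usable memory, but `used` reports that the default allocator served it, which is the behaviour the warning describes.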
Example testcase for align clause in allocate directive
```c
// testcase2.c
#include <omp.h>

void test()
{
  int x = 0;
  #pragma omp allocate(x) allocator(omp_const_mem_alloc) align(64)
  {
    x = 10;
  }
}

int main(void)
{
  test(); // run test() so the allocation happens at run time
  return 0;
}
```
- The above example emits a call to the runtime function `__kmpc_aligned_alloc(int gtid, size_t algn, size_t size, omp_allocator_handle_t allocator)`.
- The alignment information, `align(64)`, is passed to this runtime call and used when computing the size of the allocation. However, similar to the previous example (testcase1.c), the runtime uses the default allocator and the following warning is displayed:
  `OMP: Warning #190: Allocator omp_const_mem_alloc is not available, will use default allocator.`
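How an alignment request feeds into the allocation size can be illustrated with the common technique for building an aligned allocation on top of `malloc()`: over-allocate by `align - 1` bytes plus room for bookkeeping, then round the address up. This is a generic sketch, not libomp's implementation; `model_aligned_alloc` and `model_aligned_free` are hypothetical names, and `align` is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not libomp code: returns `size` bytes whose address
   is a multiple of `align`, built on plain malloc(). */
static void *model_aligned_alloc(size_t align, size_t size) {
  void *raw;
  assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
  if (size > SIZE_MAX - (align - 1) - sizeof raw)
    return NULL; /* the padded size would overflow */
  /* The alignment changes the size actually requested: worst-case padding
     (align - 1) plus space to remember the original malloc() pointer. */
  size_t total = size + (align - 1) + sizeof raw;
  raw = malloc(total);
  if (raw == NULL)
    return NULL;
  uintptr_t start = (uintptr_t)raw + sizeof raw;
  uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
  /* Stash the malloc() result just below the aligned address. */
  memcpy((char *)aligned - sizeof raw, &raw, sizeof raw);
  return (void *)aligned;
}

/* Recover the original malloc() pointer and release it. */
static void model_aligned_free(void *ptr) {
  void *raw;
  if (ptr == NULL)
    return;
  memcpy(&raw, (char *)ptr - sizeof raw, sizeof raw);
  free(raw);
}
```

For `align(64)` and a 4-byte `int`, this requests `4 + 63 + sizeof(void *)` bytes from `malloc()`, i.e. 75 bytes on a 64-bit target.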
Allocate clause with taskgroup directive
```c
#include <omp.h>

void test() {
  int x = 0;
  #pragma omp taskgroup allocate(omp_thread_mem_alloc:x) task_reduction(+:x)
  {
    #pragma omp task in_reduction(+:x)
    {
      x = x + 1;
    }
  }
}

int main(void) {
  test();
  return 0;
}
```
Allocate clause on taskgroup directive:
- The allocation of the private copies of the reduction variables is handled by `__kmp_allocate(size_t size)` within the runtime call `__kmpc_taskred_init(int gtid, int num, void *data)`, which initializes the task reduction.
- No allocator or alignment information is passed to `__kmp_allocate(size_t size)`; it allocates memory using `malloc()` with the default alignment.
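For illustration, the task-reduction path described above can be modelled as follows, assuming as a simplification that initialization allocates one contiguous block holding a private copy per thread from the default heap. `model_taskred_item`, `model_taskred_init`, `model_taskred_priv` and `model_taskred_fini_int_sum` are hypothetical names and do not reproduce libomp's data structures; the descriptor deliberately has no allocator or alignment field, mirroring the observation that no such information reaches `__kmpc_taskred_init`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified descriptor for one task-reduction item.
   Not libomp's internal structure; like the runtime call described above,
   it carries no allocator or alignment information. */
typedef struct {
  void *shared; /* original list item, e.g. &x */
  size_t size;  /* size in bytes of one private copy */
  void *priv;   /* nthreads private copies stored contiguously */
} model_taskred_item;

/* Allocate one private copy per thread for each item from the default
   heap. calloc zero-fills, and all-bits-zero is 0 for int, the identity
   of '+'; other reductions would need an explicit initializer. */
static int model_taskred_init(model_taskred_item *items, int num,
                              int nthreads) {
  assert(items != NULL && nthreads > 0);
  for (int i = 0; i < num; ++i) {
    items[i].priv = calloc((size_t)nthreads, items[i].size);
    if (items[i].priv == NULL) {
      /* Undo the allocations made so far before reporting failure. */
      while (i-- > 0) {
        free(items[i].priv);
        items[i].priv = NULL;
      }
      return -1;
    }
  }
  return 0;
}

/* Address of thread tid's private copy of an item. */
static void *model_taskred_priv(const model_taskred_item *item, int tid) {
  return (char *)item->priv + (size_t)tid * item->size;
}

/* End of the taskgroup: fold the int private copies into the shared
   variable with '+', then release the private storage. */
static void model_taskred_fini_int_sum(model_taskred_item *item,
                                       int nthreads) {
  int *shared = item->shared;
  for (int t = 0; t < nthreads; ++t)
    *shared += *(const int *)model_taskred_priv(item, t);
  free(item->priv);
  item->priv = NULL;
}
```

A typical call sequence is `model_taskred_init`, then each task updating `*(int *)model_taskred_priv(&item, tid)`, then `model_taskred_fini_int_sum` at the end of the taskgroup.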
I have some questions to discuss here:
- For the allocate directive with an allocator clause, or the parallel directive with an allocate clause:
  - Allocator information is passed to the runtime call `__kmpc_alloc(int gtid, size_t size, omp_allocator_handle_t allocator)`, but the actual allocation uses the default allocator.
    * Is there any reason for using the default allocator?
    * What is the plan for supporting the predefined allocators (e.g. `omp_high_bw_mem_alloc`, `omp_const_mem_alloc`, etc.)?
- For the taskgroup directive with an allocate clause:
  - No allocator or alignment information is passed to the runtime call `__kmpc_taskred_init(int gtid, int num, void *data)`, and `__kmp_allocate(size_t size)` allocates memory using `malloc()`.
    * Is this an intentional design choice for the allocate clause on taskgroup?
Refer to Flang’s RFC for more details.
Kindly provide your suggestions; they would be really helpful for our further implementation in Flang.