Memory allocation (GNU libgomp)
11.3 Memory allocation
The description below applies to:
- Explicit use of the OpenMP API routines, see Memory Management Routines.
- The allocate clause, except when the allocator modifier is a constant expression with value omp_default_mem_alloc and no align modifier has been specified. (In that case, the normal malloc allocation is used.)
- The allocate directive for variables in static memory; while the alignment is honored, the normal static memory is used.
- Using the allocate directive for automatic/stack variables, except when the allocator clause is a constant expression with value omp_default_mem_alloc and no align clause has been specified. (In that case, the normal allocation is used: stack allocation and, sometimes for Fortran, also malloc [depending on flags such as -fstack-arrays].)
- In Fortran, the allocators directive and the executable allocate directive for Fortran pointers and allocatables are supported, but files containing those directives must be compiled with -fopenmp-allocators. Additionally, all files that might explicitly or implicitly deallocate memory allocated that way must also be compiled with that option.
- The used alignment is the maximum of the value of the align clause and the alignment of the type after honoring, if present, the aligned (GNU::aligned) attribute and C’s _Alignas and C++’s alignas. However, the align clause of the allocate directive has no effect on the value of C’s _Alignof and C++’s alignof.
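The alignment rule in the last item can be sketched in plain C++. This is only an illustrative model, not libgomp code: the helper names effective_alignment and effective_alignment_for and the CacheLine type are hypothetical, and passing 0 to mean "no align clause" is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of libgomp): models the rule that the
// alignment actually used is the larger of the align clause value and the
// alignment of the type. The type alignment already reflects alignas,
// _Alignas, or the aligned attribute. Pass 0 when no align clause is given.
inline std::size_t effective_alignment(std::size_t align_clause,
                                       std::size_t type_alignment) {
  // Alignments are powers of two; 0 is only allowed for "no align clause".
  assert(type_alignment != 0 && (type_alignment & (type_alignment - 1)) == 0);
  assert((align_clause & (align_clause - 1)) == 0);
  return align_clause > type_alignment ? align_clause : type_alignment;
}

// Example type whose alignment is raised to 64 bytes by alignas.
struct alignas(64) CacheLine {
  char bytes[64];
};

// Convenience wrapper that takes the type alignment from alignof.
template <typename T>
std::size_t effective_alignment_for(std::size_t align_clause) {
  return effective_alignment(align_clause, alignof(T));
}
```

Under this model, an align value of 16 on a CacheLine variable still yields 64, whereas an align value of 32 on a char yields 32. In both cases alignof itself is unchanged, which matches the statement that the align clause does not affect _Alignof or alignof.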
GCC supports the following predefined allocators and predefined memory spaces:
Predefined allocators | Associated predefined memory spaces |
---|---|
omp_default_mem_alloc | omp_default_mem_space |
omp_large_cap_mem_alloc | omp_large_cap_mem_space |
omp_const_mem_alloc | omp_const_mem_space |
omp_high_bw_mem_alloc | omp_high_bw_mem_space |
omp_low_lat_mem_alloc | omp_low_lat_mem_space |
omp_cgroup_mem_alloc | omp_low_lat_mem_space (implementation defined) |
omp_pteam_mem_alloc | omp_low_lat_mem_space (implementation defined) |
omp_thread_mem_alloc | omp_low_lat_mem_space (implementation defined) |
ompx_gnu_pinned_mem_alloc | omp_default_mem_space (GNU extension) |
Each predefined allocator, including omp_null_allocator, has a corresponding allocator class template that meets the C++ allocator completeness requirements. These are located in the omp::allocator namespace, and in the ompx::allocator namespace for GNU extensions. This allows the allocator-aware C++ standard library containers to use OpenMP allocation routines; for instance:
std::vector<int, omp::allocator::cgroup_mem> vec;
The following allocator templates are supported:
Predefined allocators | Associated allocator template |
---|---|
omp_null_allocator | omp::allocator::null_allocator |
omp_default_mem_alloc | omp::allocator::default_mem |
omp_large_cap_mem_alloc | omp::allocator::large_cap_mem |
omp_const_mem_alloc | omp::allocator::const_mem |
omp_high_bw_mem_alloc | omp::allocator::high_bw_mem |
omp_low_lat_mem_alloc | omp::allocator::low_lat_mem |
omp_cgroup_mem_alloc | omp::allocator::cgroup_mem |
omp_pteam_mem_alloc | omp::allocator::pteam_mem |
omp_thread_mem_alloc | omp::allocator::thread_mem |
ompx_gnu_pinned_mem_alloc | ompx::allocator::gnu_pinned_mem |
The following traits are available when constructing a new allocator; if a trait is not specified or is given the value default, the default value listed below is used for that trait. The predefined allocators use the default values of each trait, except that the omp_cgroup_mem_alloc, omp_pteam_mem_alloc, and omp_thread_mem_alloc allocators have the access trait set to cgroup, pteam, and thread, respectively. For each trait, a named constant prefixed by omp_atk_ exists; for each non-numeric value, a named constant prefixed by omp_atv_ exists.
Trait | Allowed values | Default value |
---|---|---|
sync_hint | contended, uncontended, serialized, private | contended |
alignment | Positive integer being a power of two | 1 byte |
access | all, cgroup, pteam, thread | all |
pool_size | Positive integer (bytes) | See below |
fallback | default_mem_fb, null_fb, abort_fb, allocator_fb | See below |
fb_data | allocator handle | (none) |
pinned | true, false | See below |
partition | environment, nearest, blocked, interleaved | environment |
For the fallback trait, the default value is null_fb for the omp_default_mem_alloc allocator and any allocator that is associated with device memory; for all other allocators, it is default_mem_fb.
For the pinned trait, the default value is true for the predefined allocator ompx_gnu_pinned_mem_alloc (a GNU extension), and false for all others.
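The two allocator-dependent defaults described above can be summarized as a small decision sketch. It is a model for illustration only: the AllocatorInfo struct and both functions are hypothetical and do not exist in libgomp. The returned strings follow the omp_atv_ naming of the trait value constants.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of an allocator; not a libgomp type.
struct AllocatorInfo {
  bool is_default_mem_alloc;  // true for omp_default_mem_alloc
  bool uses_device_memory;    // true if associated with device memory
  bool is_gnu_pinned;         // true for ompx_gnu_pinned_mem_alloc
};

// Name of the default value of the fallback trait for this allocator.
inline std::string default_fallback(const AllocatorInfo& a) {
  // The two predefined allocators are distinct, so both flags cannot be set.
  assert(!(a.is_default_mem_alloc && a.is_gnu_pinned));
  if (a.is_default_mem_alloc || a.uses_device_memory)
    return "omp_atv_null_fb";
  return "omp_atv_default_mem_fb";
}

// Name of the default value of the pinned trait for this allocator.
inline std::string default_pinned(const AllocatorInfo& a) {
  assert(!(a.is_default_mem_alloc && a.is_gnu_pinned));
  return a.is_gnu_pinned ? "omp_atv_true" : "omp_atv_false";
}
```

One consequence worth noting: a failed allocation with omp_default_mem_alloc yields a null pointer rather than falling back to itself, while a host allocator such as omp_high_bw_mem_alloc retries in the default memory space unless its fallback trait is overridden.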
The following description applies to the initial device (the host) and largely also to non-host devices; for the latter, also see Offload-Target Specifics.
For the memory spaces, the following applies:
- omp_default_mem_space is supported
- omp_const_mem_space maps to omp_default_mem_space
- omp_low_lat_mem_space is only available on supported devices, and maps to omp_default_mem_space otherwise
- omp_large_cap_mem_space maps to omp_default_mem_space, unless the memkind library is available
- omp_high_bw_mem_space maps to omp_default_mem_space, unless the memkind library is available
On Linux systems where the memkind library (libmemkind.so.0) is available at runtime and the respective memkind kind is supported, it is used when creating memory allocators that request
- the partition trait interleaved, except when the memory space is omp_large_cap_mem_space (uses MEMKIND_HBW_INTERLEAVE)
- the memory space omp_high_bw_mem_space (uses MEMKIND_HBW_PREFERRED)
- the memory space omp_large_cap_mem_space (uses MEMKIND_DAX_KMEM_ALL or, if not available, MEMKIND_DAX_KMEM)
On Linux systems where the numa library (libnuma.so.1) is available at runtime, it is used when creating memory allocators that request
- the partition trait nearest, except when both the libmemkind library is available and the memory space is either omp_large_cap_mem_space or omp_high_bw_mem_space
Note that the numa library will round up the allocation size to a multiple of the system page size; therefore, consider using it only with large data or by sharing allocations via the pool_size trait. Furthermore, the Linux kernel does not guarantee that an allocation will always be on the nearest NUMA node nor that after reallocation the same node will be used. Note additionally that, on Linux, the default setting of the memory placement policy is to use the current node; therefore, unless the memory placement policy has been overridden, the partition trait environment (the default) will be effectively a nearest allocation.
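To see why page-size rounding matters for small requests, the arithmetic can be sketched as follows. The functions round_up_to_page and rounding_overhead are hypothetical helpers written for this illustration, not libgomp or libnuma routines, and the 4096-byte page size used in the note below is an assumption; the real value is system dependent.

```cpp
#include <cassert>
#include <cstddef>

// Rounds size up to the next multiple of page_size.
// page_size must be nonzero; a size of 0 stays 0.
inline std::size_t round_up_to_page(std::size_t size, std::size_t page_size) {
  assert(page_size > 0);
  return (size + page_size - 1) / page_size * page_size;
}

// Bytes reserved beyond what was requested, as a result of the rounding.
inline std::size_t rounding_overhead(std::size_t size, std::size_t page_size) {
  return round_up_to_page(size, page_size) - size;
}
```

With an assumed page size of 4096 bytes, a 64-byte request occupies a full page and wastes 4032 bytes, whereas a request of exactly 4096 bytes wastes nothing. This is the reason for preferring large allocations with such an allocator.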
Additional notes regarding the traits:
- The pinned trait is supported on Linux hosts, but is subject to the OS ulimit/rlimit locked memory settings. It currently uses mmap and is therefore optimized for few allocations, including large data. If the conditions for numa or memkind allocations are fulfilled, those allocators are used instead.
- The default for the pool_size trait is no pool and for every (re)allocation the associated library routine is called, which might internally use a memory pool. Currently, the same applies when a pool_size has been specified, except that once allocations exceed the pool size, the action of the fallback trait applies.
- For the partition trait, the partition part size will be the same as the requested size (i.e. interleaved or blocked has no effect), except for interleaved when the memkind library is available. Furthermore, for nearest and unless the numa library is available, the memory might not be on the same NUMA node as the thread that allocated the memory; on Linux, this is in particular the case when the memory placement policy is set to preferred.
- The access trait has no effect such that memory is always accessible by all threads. (Except on supported non-host devices.)
- The sync_hint trait has no effect.
See also: Offload-Target Specifics