gpucoder.atomicAdd - Atomically add a specified value to a variable in global or shared memory - MATLAB ([original](https://in.mathworks.com/help/gpucoder/ref/gpucoder.atomicadd.html))
Atomically add a specified value to a variable in global or shared memory
Since R2021b
Syntax

[A,oldA] = gpucoder.atomicAdd(A,B)

Description

[A,oldA] = gpucoder.atomicAdd(A,B) adds B to the value of A in global or shared memory and writes the result back into A. The original value of A is returned in oldA. The operation is atomic in the sense that the entire read-modify-write operation is guaranteed to be performed without interference from other threads. The order of the input and output arguments must match the syntax provided.
Examples
Perform a simple atomic addition operation by using the gpucoder.atomicAdd function and generate CUDA® code that calls the corresponding CUDA atomicAdd() APIs.
In one file, write an entry-point function myAtomicAdd that accepts matrix inputs a and b.
```matlab
function [a,oldA] = myAtomicAdd(a,b)
oldA = coder.nullcopy(a);
for i = 1:size(a,1)
    for j = 1:size(a,2)
        for k = 1:size(a,3)
            [a(i,j,k),oldA(i,j,k)] = gpucoder.atomicAdd(a(i,j,k),b(i,j,k));
        end
    end
end
```
To create types for matrices of single-precision values for use in code generation, use the coder.newtype function.

```matlab
A = coder.newtype('single', [30 30 20], [1 0 1]);
B = coder.newtype('single', [30 30 20], [1 0 1]);
inputArgs = {A,B};
```
To generate a CUDA library, use the codegen function.

```matlab
cfg = coder.gpuConfig('lib');
cfg.GenerateReport = true;
codegen -config cfg -args inputArgs myAtomicAdd -d myAtomicAdd
```
The generated CUDA code contains the myAtomicAdd_kernel1 kernel with calls to the atomicAdd() CUDA APIs.
```cu
//
// File: myAtomicAdd.cu
//
...
static __global__ __launch_bounds__(1024, 1) void myAtomicAdd_kernel1(
    const float b_data[], const int b_size[3], int a_size[3],
    const int oldA_size[3], const int b_a_size, const int i,
    float oldA_data[], float a_data[])
{
  unsigned long loopEnd;
  unsigned long threadId;
  ...
  for (unsigned long idx{threadId}; idx <= loopEnd; idx += threadStride) {
    unsigned long tmpIndex;
    int b_i;
    int j;
    int k;
    k = static_cast<int>(idx % (static_cast<unsigned long>(b_a_size) + 1UL));
    tmpIndex = (idx - static_cast<unsigned long>(k)) /
               (static_cast<unsigned long>(b_a_size) + 1UL);
    j = static_cast<int>(tmpIndex % 30UL);
    tmpIndex = (tmpIndex - static_cast<unsigned long>(j)) / 30UL;
    b_i = static_cast<int>(tmpIndex);
    oldA_data[(b_i + oldA_size[0] * j) + oldA_size[0] * 30 * k] =
        atomicAdd(&a_data[(b_i + a_size[0] * j) + a_size[0] * 30 * k],
                  b_data[(b_i + b_size[0] * j) + b_size[0] * 30 * k]);
  }
}
...

void myAtomicAdd(float a_data[], int a_size[3], const float b_data[],
                 const int b_size[3], float oldA_data[], int oldA_size[3])
{
  dim3 block;
  dim3 grid;
  ...
  cudaMemcpy(gpu_a_data, a_data, a_size[0] * (30 * a_size[2]) * sizeof(float),
             cudaMemcpyHostToDevice);
  myAtomicAdd_kernel1<<<grid, block>>>(gpu_b_data, *gpu_b_size, *gpu_a_size,
                                       *gpu_oldA_size, b_a_size, i,
                                       gpu_oldA_data, gpu_a_data);
  oldA_data_dirtyOnGpu = true;
  cudaMemcpy(a_data, gpu_a_data, a_size[0] * (30 * a_size[2]) * sizeof(float),
             cudaMemcpyDeviceToHost);
  ...
}
```
Input Arguments
Operands, specified as scalars, vectors, matrices, or multidimensional arrays. Inputs A and B must satisfy the following requirements:

- Have the same data type.
- Have the same size or have sizes that are compatible. For example, A is an M-by-N matrix and B is a scalar or 1-by-N row vector.
- Require a CUDA device with a minimum compute capability of 6.0 when the data type is double.
Data Types: double | single | int32 | uint32 | uint64
Version History
Introduced in R2021b
See Also
Functions
- gpucoder.atomicCAS | gpucoder.atomicDec | gpucoder.atomicExch | gpucoder.atomicInc | gpucoder.atomicMax | gpucoder.atomicMin | gpucoder.atomicSub