gpucoder.atomicAdd

Atomically add a specified value to a variable in global or shared memory

Since R2021b

Syntax

[A,oldA] = gpucoder.atomicAdd(A,B)

Description

[A,oldA] = gpucoder.atomicAdd(A,B) adds B to the value of A in global or shared memory and writes the result back into A. The operation is atomic in the sense that the entire read-modify-write operation is guaranteed to be performed without interference from other threads. The order of the input and output arguments must match the syntax shown.
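For illustration, a minimal scalar sketch of the call pattern (the values here are hypothetical, not from the original page):

a = single(5);
b = single(3);
[a,oldA] = gpucoder.atomicAdd(a,b);
% a is now 8; oldA is 5, the value a held before the atomic update

The atomicity guarantee matters when the call appears inside a loop that GPU Coder maps onto many concurrent CUDA threads, as in the example below.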


Examples


Perform a simple atomic addition operation by using the gpucoder.atomicAdd function and generate CUDA® code that calls the corresponding CUDA atomicAdd() APIs.

In one file, write an entry-point function myAtomicAdd that accepts matrix inputs a and b.

function [a,oldA] = myAtomicAdd(a,b)

oldA = coder.nullcopy(a);

for i = 1:size(a,1)
    for j = 1:size(a,2)
        for k = 1:size(a,3)
            [a(i,j,k),oldA(i,j,k)] = gpucoder.atomicAdd(a(i,j,k),b(i,j,k));
        end
    end
end
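Assuming gpucoder.atomicAdd also executes in MATLAB simulation (an assumption; otherwise exercise the function through the generated code), a quick check with small, arbitrarily chosen inputs shows the expected behavior:

a0 = ones(2,2,2,'single');
b0 = 2*ones(2,2,2,'single');
[a1,oldA1] = myAtomicAdd(a0,b0);
% Expected: a1 is all 3s (a0 + b0); oldA1 is all 1s (the original a0)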

To create types for matrices of singles for use in code generation, use the coder.newtype function. The [1 0 1] argument marks the first and third dimensions as variable-size, with [30 30 20] as the upper bounds.

A = coder.newtype('single', [30 30 20], [1 0 1]);
B = coder.newtype('single', [30 30 20], [1 0 1]);
inputArgs = {A,B};

To generate a CUDA library, use the codegen function.

cfg = coder.gpuConfig('lib');
cfg.GenerateReport = true;

codegen -config cfg -args inputArgs myAtomicAdd -d myAtomicAdd

The generated CUDA code contains the myAtomicAdd_kernel1 kernel with calls to the CUDA atomicAdd() API.

//
// File: myAtomicAdd.cu
//
...
static __global__ __launch_bounds__(1024, 1) void myAtomicAdd_kernel1(
    const float b_data[], const int b_size[3], int a_size[3],
    const int oldA_size[3], const int b_a_size, const int i,
    float oldA_data[], float a_data[])
{
  unsigned long loopEnd;
  unsigned long threadId;
  ...
  for (unsigned long idx{threadId}; idx <= loopEnd; idx += threadStride) {
    unsigned long tmpIndex;
    int b_i;
    int j;
    int k;
    k = static_cast<int>(idx % (static_cast<unsigned long>(b_a_size) + 1UL));
    tmpIndex = (idx - static_cast<unsigned long>(k)) /
               (static_cast<unsigned long>(b_a_size) + 1UL);
    j = static_cast<int>(tmpIndex % 30UL);
    tmpIndex = (tmpIndex - static_cast<unsigned long>(j)) / 30UL;
    b_i = static_cast<int>(tmpIndex);
    oldA_data[(b_i + oldA_size[0] * j) + oldA_size[0] * 30 * k] =
        atomicAdd(&a_data[(b_i + a_size[0] * j) + a_size[0] * 30 * k],
                  b_data[(b_i + b_size[0] * j) + b_size[0] * 30 * k]);
  }
}
...

void myAtomicAdd(float a_data[], int a_size[3], const float b_data[],
                 const int b_size[3], float oldA_data[], int oldA_size[3])
{
  dim3 block;
  dim3 grid;
  ...
  cudaMemcpy(gpu_a_data, a_data, a_size[0] * (30 * a_size[2]) * sizeof(float),
             cudaMemcpyHostToDevice);
  myAtomicAdd_kernel1<<<grid, block>>>(gpu_b_data, *gpu_b_size, *gpu_a_size,
                                       *gpu_oldA_size, b_a_size, i,
                                       gpu_oldA_data, gpu_a_data);
  oldA_data_dirtyOnGpu = true;
  cudaMemcpy(a_data, gpu_a_data, a_size[0] * (30 * a_size[2]) * sizeof(float),
             cudaMemcpyDeviceToHost);
  ...
}
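The generated kernel relies on a key property of the CUDA atomicAdd() API: the call adds its second argument to the destination atomically and returns the value the destination held before the addition, which is what populates oldA_data above. The following standalone sketch (not generated by GPU Coder; all names are illustrative) demonstrates this:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void atomicAddDemo(float *a, const float *b, float *oldA, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // atomicAdd returns the pre-update value of a[i]
        oldA[i] = atomicAdd(&a[i], b[i]);
    }
}

int main() {
    const int n = 8;
    float hA[n], hB[n], hOld[n];
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dOld;
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));
    cudaMalloc(&dOld, n * sizeof(float));
    cudaMemcpy(dA, hA, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, n * sizeof(float), cudaMemcpyHostToDevice);

    atomicAddDemo<<<1, n>>>(dA, dB, dOld, n);

    cudaMemcpy(hA, dA, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hOld, dOld, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("a[0] = %.1f (new), oldA[0] = %.1f (old)\n", hA[0], hOld[0]);  // 3.0 and 1.0
    cudaFree(dA); cudaFree(dB); cudaFree(dOld);
    return 0;
}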

Input Arguments


Operands, specified as scalars, vectors, matrices, or multidimensional arrays. Inputs A and B must be of one of the supported data types.

Data Types: double | single | int32 | uint32 | uint64

Version History

Introduced in R2021b
