Writing First CUDA Program

Last Updated : 14 Feb, 2026

A CUDA program is "heterogeneous": it consists of code that runs on two different processors, the Host (CPU) and the Device (NVIDIA GPU). The CUDA programming model extends C++ with a small set of keywords and syntax for defining and launching parallel work on the GPU. To coordinate the two processors, a typical .cu source file follows a common structure.

Structure of a CUDA Program

1. Header Files

A CUDA source file can include standard C/C++ headers such as <stdio.h> for basic input/output or <math.h> for mathematical functions. For lower-level control of the GPU hardware, we can also include:

#include <cuda.h>

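**Explanation:** The cuda.h header provides access to the CUDA Driver API for low-level device management. Runtime API calls such as cudaDeviceSynchronize() do not require it, because nvcc includes cuda_runtime.h automatically when it compiles a .cu file.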
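As a rough illustration of what the Driver API in cuda.h looks like, the minimal sketch below initializes the driver and asks how many CUDA devices are present. It uses only cuInit() and cuDeviceGetCount(); the error handling style and the printed messages are assumptions made for this example.

```cuda
#include <cstdio>
#include <cuda.h>   // CUDA Driver API

int main() {
    // The Driver API must be initialized explicitly before any other call.
    CUresult status = cuInit(0);
    if (status != CUDA_SUCCESS) {
        fprintf(stderr, "cuInit failed with code %d\n", (int)status);
        return 1;
    }

    // Query how many CUDA-capable devices the driver can see.
    int deviceCount = 0;
    status = cuDeviceGetCount(&deviceCount);
    if (status != CUDA_SUCCESS) {
        fprintf(stderr, "cuDeviceGetCount failed with code %d\n", (int)status);
        return 1;
    }

    printf("CUDA devices found: %d\n", deviceCount);
    return 0;
}
```

Because this program calls the Driver API directly, it has to be linked against the driver library, for example by passing -lcuda to nvcc.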

2. Kernel Definition (GPU Code)

The Kernel is a special function designed to run on the GPU. It contains the logic that will be executed in parallel across many threads.

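```cpp
__global__ void myKernel() {
    // This code executes on the GPU
    printf("Hello from the GPU!\n");
}
```

**Explanation:** The `__global__` qualifier marks myKernel() as a kernel, that is, a function that is called from host code and executes on the device. A kernel must have a void return type. printf() can be called from device code, which is why <stdio.h> is included in the complete program.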

3. Main Function (CPU Code)

The main() function is the entry point of the program that runs on the CPU. It handles the logical flow, manages memory and tells the GPU when to start working.

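```cpp
int main() {
    // 1. Launch the kernel on the GPU
    myKernel<<<1, 1>>>();

    // 2. Synchronize to wait for the GPU to finish
    cudaDeviceSynchronize();

    return 0;
}
```

**Explanation:** The <<<1, 1>>> execution configuration launches the kernel with 1 block containing 1 thread. A kernel launch is asynchronous: control returns to the CPU immediately, so cudaDeviceSynchronize() is called to block the host until the GPU has finished its work.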
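The launch above uses a single thread, so it does not yet show any parallelism. The following minimal sketch launches the same kernel body on several threads and has each one print its own index. The kernel name indexKernel and the configuration of 2 blocks of 4 threads are illustrative assumptions, not part of the original example.

```cuda
#include <cstdio>

// Hypothetical kernel: every launched thread runs this same function body.
__global__ void indexKernel() {
    // Built-in variables identify the block and the thread within the block.
    int globalId = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block %d, thread %d, global id %d\n",
           blockIdx.x, threadIdx.x, globalId);
}

int main() {
    // Assumed configuration: 2 blocks of 4 threads each, 8 threads in total.
    indexKernel<<<2, 4>>>();

    // Wait for the kernel so its printf output is flushed before the program exits.
    cudaDeviceSynchronize();

    return 0;
}
```

Each of the 8 threads prints one line. The order of the lines can differ between runs, because blocks are not scheduled in a fixed order.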

This basic "Hello World" example demonstrates the interaction between the CPU and the GPU by launching a single thread to print a message.

```cpp
%%cuda
#include <stdio.h>

__global__ void simpleKernel() {
    printf("Hello world\n");
}

int main() {
    simpleKernel<<<1, 1>>>();

    cudaDeviceSynchronize();

    return 0;
}
```

**Output:**

Hello world

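**Explanation:** simpleKernel<<<1, 1>>>() launches one block with one thread, so the message is printed exactly once. cudaDeviceSynchronize() makes the host wait for the kernel to finish, which also flushes the device-side printf() buffer; without it, the program could exit before the message appears. The leading %%cuda line is a notebook cell magic (provided, for example, by the nvcc4jupyter extension in Jupyter or Colab) and must be omitted when the code is saved as a .cu file and compiled directly with nvcc.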
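The example above ignores the status codes that the CUDA runtime returns, so a failed launch would go unnoticed. One common way to surface such failures is sketched below, using cudaGetLastError(), cudaDeviceSynchronize() and cudaGetErrorString() from the Runtime API. The helper name checkCuda and its messages are hypothetical and exist only for this illustration.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: stop with a readable message if a runtime call failed.
static void checkCuda(cudaError_t status, const char *what) {
    if (status != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
        exit(EXIT_FAILURE);
    }
}

__global__ void simpleKernel() {
    printf("Hello world\n");
}

int main() {
    simpleKernel<<<1, 1>>>();

    // A kernel launch returns no status itself; cudaGetLastError() reports
    // problems with the launch, such as an invalid configuration.
    checkCuda(cudaGetLastError(), "kernel launch");

    // Errors that occur while the kernel runs surface at the next
    // synchronizing call, so its return value is checked as well.
    checkCuda(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    return 0;
}
```

On a machine with a working GPU and driver this prints the same "Hello world" line as before. On a machine without a usable device, it reports the error instead of exiting silently.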