Using Parallelism (Cython 3.1.0 documentation)

Note

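This page uses two different syntax variants: the Cython-specific cdef syntax and the pure Python syntax, which writes static Cython type declarations as Python type hints and cython.* calls. The examples below are shown in the pure Python syntax.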

Cython supports native parallelism through the cython.parallel module. To use this kind of parallelism, the GIL must be released (see Releasing the GIL). OpenMP is currently the only supported backend; more backends may be supported later.

Note

Functionality in this module may only be used from the main thread or parallel regions due to OpenMP restrictions.

cython.parallel.prange([start,] stop[, step][, nogil=False][, use_threads_if=CONDITION][, schedule=None[, chunksize=None]][, num_threads=None])

This function can be used for parallel loops. OpenMP automatically starts a thread pool and distributes the work according to the schedule used.

Thread-locality and reductions are automatically inferred for variables.

If you assign to a variable in a prange block, it becomes lastprivate, meaning that the variable will contain the value from the last iteration. If you use an in-place operator on a variable, it becomes a reduction, meaning that the values from the thread-local copies of the variable will be reduced with the operator and assigned to the original variable after the loop. The index variable is always lastprivate. Variables assigned to in a parallel with block will be private and unusable after the block, as there is no concept of a sequentially last value.
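To make the inference rules concrete, here is a sequential plain-Python model of a loop body that contains `total += i` (a reduction) and `last = i * i` (lastprivate). It is a sketch under stated assumptions, not Cython's generated code. The helper name `simulate_prange` is hypothetical. Iterations are split into contiguous blocks for a hypothetical team of `num_threads` threads, and each block's private copy of `total` starts at 0, the identity of `+`.

```python
def simulate_prange(n, num_threads=4):
    """Sequentially model how prange treats `total += i` and `last = i * i`.

    Hypothetical helper: iterations 0..n-1 are split into contiguous
    blocks, one block per simulated thread (similar to a static schedule).
    """
    # Split range(n) into num_threads contiguous blocks of near-equal size.
    base, extra = divmod(n, num_threads)
    blocks = []
    start = 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)
        blocks.append(range(start, start + size))
        start += size

    partial_sums = []   # one thread-local copy of `total` per thread
    last_values = {}    # value of `last` produced by each iteration
    for block in blocks:
        local_total = 0                 # reduction copies start at 0
        for i in block:
            local_total += i            # in-place operator -> reduction
            last_values[i] = i * i      # plain assignment -> lastprivate
        partial_sums.append(local_total)

    total = sum(partial_sums)           # combine the copies with +
    # lastprivate: keep the value from the sequentially last iteration, n - 1.
    last = last_values[n - 1] if n > 0 else None
    return total, last
```

Because `+` is associative, the combined total does not depend on how the iterations were split. `simulate_prange(30)` and `simulate_prange(30, num_threads=7)` both give `(435, 841)`.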

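Parameters:

start, stop, step: same meaning as the corresponding arguments of range(). step must not be 0.

nogil: prange can only be used with the GIL released. If nogil is true, the loop is wrapped in a nogil section.

use_threads_if: the loop runs in multiple threads only if CONDITION evaluates to true; otherwise it runs sequentially. Sequential execution can be the better choice when the cost of spawning threads outweighs the benefit of running in parallel, e.g. for small data sets.

schedule: passed to OpenMP; one of 'static', 'dynamic', 'guided' or 'runtime'.

chunksize: the number of iterations per chunk when dividing the iterations among threads. It is optional and only valid for the static, dynamic and guided schedules.

num_threads: how many threads the team should consist of. If not given, OpenMP decides, typically using the number of cores available on the machine.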
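The schedule and chunksize arguments decide which thread runs which iterations. As one illustration, the OpenMP specification defines schedule(static, chunk_size) as follows: the iterations are cut into chunks of chunk_size, and the chunks are assigned to the threads round-robin in thread-number order. The plain-Python sketch below only computes that assignment so it can be inspected. `static_chunks` is a hypothetical helper and does not call OpenMP.

```python
def static_chunks(n, num_threads, chunksize):
    """Return {thread_id: [iterations]} for a static schedule with chunksize.

    Models OpenMP's schedule(static, chunksize): chunks of `chunksize`
    consecutive iterations go to threads 0, 1, ..., num_threads - 1 in
    round-robin order. The last chunk may be shorter.
    """
    if chunksize < 1:
        raise ValueError("chunksize must be positive")
    assignment = {t: [] for t in range(num_threads)}
    for chunk_index, chunk_start in enumerate(range(0, n, chunksize)):
        thread_id = chunk_index % num_threads
        chunk_end = min(chunk_start + chunksize, n)
        assignment[thread_id].extend(range(chunk_start, chunk_end))
    return assignment
```

For instance, `static_chunks(10, 3, 2)` assigns the iterations as follows:

- thread 0 runs 0, 1, 6, 7
- thread 1 runs 2, 3, 8, 9
- thread 2 runs 4, 5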

Example with a reduction:

import cython
from cython.parallel import prange

i = cython.declare(cython.int)
n = cython.declare(cython.int, 30)
sum = cython.declare(cython.int, 0)

for i in prange(n, nogil=True):
    sum += i

print(sum)  # 435 (0 + 1 + ... + 29)

Example with a typed memoryview (e.g. a NumPy array):

import cython
from cython.parallel import prange

def func(x: cython.double[:], alpha: cython.double):
    i: cython.Py_ssize_t

    for i in prange(x.shape[0], nogil=True):
        x[i] = alpha * x[i]

Example with conditional parallelism:

import cython
from cython.parallel import prange

def psum(n: cython.int):
    i: cython.int
    sum: cython.int = 0

    for i in prange(n, nogil=True, use_threads_if=n>1000):
        sum += i

    return sum

psum(30)     # Executed sequentially
psum(10000)  # Executed in parallel

cython.parallel.parallel(num_threads=None, use_threads_if=CONDITION)

This directive can be used as part of a with statement to execute code sequences in parallel. It is currently useful for setting up thread-local buffers used by a prange. A prange contained in the block is a worksharing loop that does not start a new parallel region of its own. Its iterations are divided among the threads already executing the block, so any variable assigned to in the parallel section is also private to the prange. Variables that are private in the parallel block are unavailable after the block.

Example with thread-local buffers:

import cython
from cython.parallel import parallel, prange
from cython.cimports.libc.stdlib import abort, malloc, free

@cython.nogil
@cython.cfunc
def func(buf: cython.p_int) -> cython.void:
    pass
    # ...

idx = cython.declare(cython.Py_ssize_t)
i = cython.declare(cython.Py_ssize_t)
j = cython.declare(cython.Py_ssize_t)
n = cython.declare(cython.Py_ssize_t, 100)
local_buf = cython.declare(cython.p_int)
size = cython.declare(cython.size_t, 10)

with cython.nogil, parallel():
    # each thread allocates its own buffer
    local_buf = cython.cast(cython.p_int, malloc(cython.sizeof(cython.int) * size))
    if local_buf is cython.NULL:
        abort()

    # populate our local buffer in a sequential loop
    for i in range(size):
        local_buf[i] = i * 2

    # share the work using the thread-local buffer(s)
    for j in prange(n, schedule='guided'):
        func(local_buf)

    free(local_buf)
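The thread-local buffer example follows a two-step pattern. First, private state is set up once per thread. Then the loop iterations are shared among those same threads. The pattern can be mimicked with ordinary Python threads for experimentation. This is a hypothetical model with several limits:

- `run_team` is an illustrative name.
- The work is shared round-robin rather than with schedule='guided'.
- No OpenMP or GIL release is involved, so the model shows the structure but not the performance.

```python
import threading


def run_team(n, num_threads=4, size=10):
    """Model a parallel() block that uses a thread-local buffer.

    Hypothetical plain-Python sketch: each thread builds its own buffer,
    then processes its share of the n iterations with it.
    Returns {thread_id: [iterations processed by that thread]}.
    """
    processed = [None] * num_threads  # one result slot per thread

    def body(tid):
        # Private buffer, playing the role of the malloc'ed local_buf.
        local_buf = [i * 2 for i in range(size)]
        mine = []
        # Simplified work sharing: thread tid takes iterations
        # tid, tid + num_threads, ... (the documented example uses
        # schedule='guided', which this sketch does not model).
        for j in range(tid, n, num_threads):
            # Stand-in for func(local_buf): no lock is needed because
            # no other thread can see this buffer.
            local_buf[j % size] += j
            mine.append(j)
        processed[tid] = mine
        # local_buf goes out of scope here, like free(local_buf).

    threads = [threading.Thread(target=body, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(enumerate(processed))
```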

Support for sections in parallel blocks, which would distribute separate code sections among threads, may be added later.

cython.parallel.threadid()

Returns the id of the thread. For n threads, the ids will range from 0 to n-1.
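Because the ids run from 0 to n-1, a thread id can index an array with one slot per thread. Each thread then accumulates into its own slot without locking, and the slots are combined after the parallel work ends. The sketch below models that pattern with plain Python threads. `per_thread_partial_sums` is a hypothetical helper, and Python's `threading` stands in for the OpenMP team.

```python
import threading


def per_thread_partial_sums(values, num_threads=4):
    """Sum `values` using one accumulator slot per thread id.

    Hypothetical plain-Python model of the threadid() pattern: ids run
    from 0 to num_threads - 1, so each thread owns partials[tid] and no
    lock is needed. The slots are combined after all threads finish.
    """
    partials = [0] * num_threads

    def worker(tid):
        # Each thread walks a strided share of the data.
        for k in range(tid, len(values), num_threads):
            partials[tid] += values[k]

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials), partials
```

With `list(range(10))` and three threads, the partial sums are `[18, 12, 15]` and the total is 45.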

Compiling

To actually use the OpenMP support, you need to tell the C or C++ compiler to enable OpenMP. For gcc this can be done as follows in a setup.py:

from setuptools import Extension, setup
from Cython.Build import cythonize

ext_modules = [
    Extension(
        "hello",
        ["hello.py"],
        extra_compile_args=['-fopenmp'],
        extra_link_args=['-fopenmp'],
    )
]

setup(
    name='hello-parallel-world',
    ext_modules=cythonize(ext_modules),
)

For the Microsoft Visual C++ compiler, use '/openmp' instead of '-fopenmp' for the 'extra_compile_args' option. Don’t add any OpenMP flags to the 'extra_link_args' option.
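A single setup.py may need to work with both gcc-style compilers and MSVC. In that case, the flag choice described above can be captured in a small function. This is a hypothetical helper: `openmp_flags` is not part of setuptools or Cython. It assumes the compiler is identified by a distutils-style compiler type string such as "msvc" or "unix". It also treats every non-MSVC compiler as accepting -fopenmp, which does not hold for every toolchain; Apple's clang, for example, needs extra setup.

```python
def openmp_flags(compiler_type):
    """Return (extra_compile_args, extra_link_args) that enable OpenMP.

    Hypothetical helper: MSVC takes /openmp at compile time only, while
    gcc-style compilers take -fopenmp for both compiling and linking.
    """
    if compiler_type == "msvc":
        return ["/openmp"], []
    return ["-fopenmp"], ["-fopenmp"]
```

One place to call it is a `build_ext` subclass. Inside `build_extensions()`, `self.compiler.compiler_type` is available before the extensions are compiled.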

Breaking out of loops

The parallel with and prange blocks support the statements break, continue and return in nogil mode. Additionally, it is valid to use a with gil block inside these blocks, and have exceptions propagate from them. However, because the blocks use OpenMP, they cannot simply be exited, so the exiting procedure is best-effort. For prange() this means that, after the first break, return or exception, the loop body is skipped for every subsequent iteration in every thread. If several different values could be returned, it is undefined which one is returned, because the iterations run in no particular order:

import cython
from cython.parallel import prange

@cython.exceptval(-1)
@cython.cfunc
def func(n: cython.Py_ssize_t) -> cython.int:
    i: cython.Py_ssize_t

    for i in prange(n, nogil=True):
        if i == 8:
            with cython.gil:
                raise Exception()
        elif i == 4:
            break
        elif i == 2:
            return i

In the example above it is undefined whether an exception is raised, whether the loop simply breaks, or whether it returns 2.
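The nondeterminism can be modeled by treating one run as a single order in which the iterations happen to execute. Whichever of iterations 2, 4 and 8 comes first decides the outcome, and later iterations are skipped. This sequential sketch ignores true concurrency, where several threads may reach an exit at nearly the same time. `first_exit` and `sample_outcomes` are hypothetical names.

```python
import random


def first_exit(order):
    """Model prange's best-effort exit for the func example.

    `order` is the sequence in which iterations happen to run. The first
    iteration that raises, breaks or returns decides the outcome; every
    later iteration is skipped.
    """
    for i in order:
        if i == 8:
            return "exception"
        elif i == 4:
            return "break"
        elif i == 2:
            return "return 2"
    return "completed"


def sample_outcomes(n=10, trials=1000, seed=0):
    """Collect the outcomes seen over many random iteration orders."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)
        seen.add(first_exit(order))
    return seen
```

Shuffling the order many times produces all three outcomes, consistent with none of them being guaranteed.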

Using OpenMP Functions

OpenMP functions can be used by cimporting openmp:

import cython
from cython.parallel import parallel
from cython.cimports.openmp import omp_set_dynamic, omp_get_num_threads

num_threads = cython.declare(cython.int)

omp_set_dynamic(1)
with cython.nogil, parallel():
    num_threads = omp_get_num_threads()
    # ...
