Cython Best Practices, Conventions and Knowledge

This document gives tips for developing Cython code in scikit-learn.

Tips for developing with Cython in scikit-learn

Tips to ease development

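If you plan to use OpenMP while prototyping in a Jupyter notebook, pass the extra compiler and linker arguments to the Cython magic.

For GCC and for clang:

%%cython --compile-args=-fopenmp --link-args=-fopenmp

For Microsoft's compilers:

%%cython --compile-args=/openmp --link-args=/openmp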

To compile an individual Cython extension, you might want to add this alias to your shell configuration file.

alias cythonX="cython -X language_level=3 -X boundscheck=False -X wraparound=False -X initializedcheck=False -X nonecheck=False -X cdivision=True"

This generates source.c as if you had recompiled scikit-learn entirely.

cythonX source.pyx

Adding the --annotate option also generates an HTML report (source.html) for source.c.

cythonX --annotate source.pyx
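As a rough illustration of what the directives in the alias change, here is a minimal sketch of a hypothetical source.pyx that sets the same directives in a header comment instead of on the command line. The function name last_over is made up for this example and is not part of scikit-learn.

```cython
# cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, nonecheck=False, cdivision=True

# Hypothetical example file; `last_over` is an illustrative name only.


def last_over(const double[::1] x, double denom):
    """Return the last element of `x` divided by `denom`."""
    cdef Py_ssize_t n = x.shape[0]

    if n == 0:
        raise ValueError("x must not be empty")

    # wraparound=False: x[-1] is no longer interpreted relative to the end,
    # so the last element is addressed explicitly with n - 1.
    # boundscheck=False: the index is not validated, which is why the
    # n == 0 guard above matters.
    # cdivision=True: C division semantics, so no ZeroDivisionError is
    # raised when denom == 0.
    return x[n - 1] / denom
```

With these checks disabled, mistakes such as an out-of-bounds index are no longer reported as Python exceptions, so it can help to prototype with the checks enabled first.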

Tips for performance

Using OpenMP

Since scikit-learn can be built without OpenMP, it’s necessary to protect each direct call to OpenMP.

The _openmp_helpers module, available in sklearn/utils/_openmp_helpers.pyx, provides protected versions of the OpenMP routines. To use OpenMP routines, they must be cimported from this module and not from the OpenMP library directly:

from sklearn.utils._openmp_helpers cimport omp_get_max_threads
max_threads = omp_get_max_threads()

The parallel loop, prange, is already protected by Cython and can be used directly from cython.parallel.
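The two points above can be combined in a short sketch: a protected OpenMP routine cimported from _openmp_helpers, and prange used directly. The function parallel_sum is a hypothetical example rather than scikit-learn code, and it assumes the extension is built against an installed scikit-learn so that the cimport resolves.

```cython
# Hypothetical example, not part of scikit-learn.
from cython.parallel import prange
from sklearn.utils._openmp_helpers cimport omp_get_max_threads


def parallel_sum(const double[::1] values):
    """Sum a contiguous float64 array with a parallel loop."""
    cdef:
        Py_ssize_t i
        Py_ssize_t n = values.shape[0]
        double total = 0.0
        # Protected routine: cimported from _openmp_helpers rather than
        # from the OpenMP library directly.
        int n_threads = omp_get_max_threads()

    # The in-place `+=` inside prange is treated by Cython as a reduction.
    for i in prange(n, nogil=True, schedule="static", num_threads=n_threads):
        total += values[i]

    return total
```

When the extension is compiled without OpenMP flags, the prange loop simply runs sequentially.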

Types

Cython code requires explicit types. This is one of the reasons you get a performance boost. To avoid code duplication, the most commonly used types are defined in one central place, sklearn/utils/_typedefs.pxd. Ideally, start by having a look there and cimport the types you need, for example:

from sklearn.utils._typedefs cimport float32_t, float64_t
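As a minimal sketch of how cimported types are then used, the helper below declares its arguments, loop index and accumulator with types from _typedefs. The names _dot and dot are hypothetical, and the example assumes that float64_t and intp_t are both declared in sklearn/utils/_typedefs.pxd.

```cython
# Hypothetical example, not part of scikit-learn.
# Assumes float64_t and intp_t are declared in sklearn/utils/_typedefs.pxd.
from sklearn.utils._typedefs cimport float64_t, intp_t


cdef float64_t _dot(
    const float64_t[::1] a,
    const float64_t[::1] b,
) noexcept nogil:
    """Dot product of two contiguous float64 arrays of equal length."""
    cdef:
        intp_t i
        float64_t result = 0.0

    # Every variable in the loop has a C type, so no Python objects are
    # involved and the loop can run without the GIL.
    for i in range(a.shape[0]):
        result += a[i] * b[i]

    return result


def dot(const float64_t[::1] a, const float64_t[::1] b):
    """Python-visible wrapper that validates shapes before the C loop."""
    if a.shape[0] != b.shape[0]:
        raise ValueError("a and b must have the same length")
    return _dot(a, b)
```

Keeping the shape check in the Python-visible wrapper leaves the cdef helper free of exception handling.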