Defining extension modules

A C extension for CPython is a shared library (for example, a .so file on Linux, .pyd DLL on Windows), which is loadable into the Python process (for example, it is compiled with compatible compiler settings), and which exports an initialization function.

To be importable by default (that is, by importlib.machinery.ExtensionFileLoader), the shared library must be available on sys.path, and must be named after the module name plus an extension listed in importlib.machinery.EXTENSION_SUFFIXES.
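As a rough illustration of the naming rule above, the following sketch lists the file names that would match a module on the running interpreter. The helper candidate_filenames and the module name "spam" are hypothetical, chosen only for this example; the actual suffixes depend on the platform and Python build.

```python
import importlib.machinery


def candidate_filenames(module_name: str) -> list[str]:
    """Return module_name combined with each suffix in EXTENSION_SUFFIXES."""
    return [module_name + suffix
            for suffix in importlib.machinery.EXTENSION_SUFFIXES]


# "spam" is a placeholder module name; the printed names vary by platform.
candidates = candidate_filenames("spam")
for filename in candidates:
    print(filename)
```

A shared library with one of these names, placed in a directory on sys.path, is what the default loader would look for.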

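Normally, the initialization function returns a module definition initialized using PyModuleDef_Init(). This allows splitting the creation process into several phases: before any substantial code is executed, Python can determine which capabilities the module supports, and it can adjust the environment or refuse to load an incompatible extension; by default, Python itself creates the module object and sets initial attributes such as __package__ and __loader__; afterwards, the module object is initialized using extension-specific code.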

This is called multi-phase initialization to distinguish it from the legacy (but still supported) single-phase initialization scheme, where the initialization function returns a fully constructed module. See the single-phase-initialization section below for details.

Changed in version 3.5: Added support for multi-phase initialization (PEP 489).

Multiple module instances

By default, extension modules are not singletons. For example, if the sys.modules entry is removed and the module is re-imported, a new module object is created, and typically populated with fresh method and type objects. The old module is subject to normal garbage collection. This mirrors the behavior of pure-Python modules.
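The re-import behavior can be observed from Python. The sketch below uses a throwaway pure-Python module as a stand-in, since the paragraph above states that extension modules mirror it; the module name demo_reimport and the class Thing are hypothetical.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny stand-in module that defines one type.
    pathlib.Path(tmp, "demo_reimport.py").write_text("class Thing:\n    pass\n")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        first = importlib.import_module("demo_reimport")
        # Removing the sys.modules entry forces a fresh import next time.
        del sys.modules["demo_reimport"]
        second = importlib.import_module("demo_reimport")
    finally:
        # Undo the changes made to the import state.
        sys.path.remove(tmp)
        sys.modules.pop("demo_reimport", None)

distinct_modules = first is not second
distinct_types = first.Thing is not second.Thing
print(distinct_modules, distinct_types)  # True True
```

Both the module object and the type it defines are new on the second import, which is why state shared between instances needs care.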

Additional module instances may be created in sub-interpreters or after Python runtime reinitialization (Py_Finalize() and Py_Initialize()). In these cases, sharing Python objects between module instances would likely cause crashes or undefined behavior.

To avoid such issues, each instance of an extension module should be isolated: changes to one instance should not implicitly affect the others, and all state owned by the module, including references to Python objects, should be specific to a particular module instance. See Isolating Extension Modules for more details and a practical guide.

A simpler way to avoid these issues is raising an error on repeated initialization.

All modules are expected to support sub-interpreters, or otherwise explicitly signal a lack of support. This is usually achieved by isolation or blocking repeated initialization, as above. A module may also be limited to the main interpreter using the Py_mod_multiple_interpreters slot.

Initialization function

The initialization function defined by an extension module has the following signature:

PyObject *PyInit_modulename(void)

Its name should be PyInit_<name>, with <name> replaced by the name of the module.

For modules with ASCII-only names, the function must be named exactly this way. When using multi-phase initialization, non-ASCII module names are allowed. In this case, the initialization function name is PyInitU_<name>, with <name> encoded using Python’s punycode encoding with hyphens replaced by underscores. In Python:

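def initfunc_name(name):
    try:
        # ASCII-only names are used unchanged
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        # Non-ASCII names: punycode, with hyphens replaced by underscores
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyInit' + suffix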
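To see what this rule yields in practice, the sketch below restates the helper so that the block runs on its own and applies it to one ASCII and one non-ASCII name. Both module names are hypothetical. The final check illustrates one likely reason for replacing hyphens: the result has to be usable as a C identifier.

```python
def initfunc_name(name: str) -> bytes:
    """Return the init-function symbol for a module name (rule described above)."""
    try:
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        # Non-ASCII: punycode-encode, then swap hyphens for underscores.
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyInit' + suffix


ascii_symbol = initfunc_name('spam')    # hypothetical ASCII module name
unicode_symbol = initfunc_name('café')  # hypothetical non-ASCII module name

print(ascii_symbol)    # b'PyInit_spam'
print(unicode_symbol)  # begins with b'PyInitU_'

# The symbol contains only ASCII letters, digits and underscores.
symbol_is_identifier = unicode_symbol.decode('ascii').isidentifier()
```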

It is recommended to define the initialization function using a helper macro:

PyMODINIT_FUNC

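Declare an extension module initialization function. This macro specifies the PyObject* return type, adds any special linkage declarations required by the platform, and, for C++, declares the function as extern "C".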

For example, a module called spam would be defined like this:

static struct PyModuleDef spam_module = {
    .m_base = PyModuleDef_HEAD_INIT,
    .m_name = "spam",
    ...
};

PyMODINIT_FUNC
PyInit_spam(void)
{
    return PyModuleDef_Init(&spam_module);
}

It is possible to export multiple modules from a single shared library by defining multiple initialization functions. However, importing them requires using symbolic links or a custom importer, because by default only the function corresponding to the filename is found. See the Multiple modules in one library section in PEP 489 for details.

The initialization function is typically the only non-static item defined in the module’s C source.

Multi-phase initialization

Normally, the initialization function (PyInit_modulename) returns a PyModuleDef instance with non-NULL m_slots. Before it is returned, the PyModuleDef instance must be initialized using the following function:

PyObject *PyModuleDef_Init(PyModuleDef *def)

Part of the Stable ABI since version 3.5.

Ensure a module definition is a properly initialized Python object that correctly reports its type and a reference count.

Return def cast to PyObject*, or NULL if an error occurred.

Calling this function is required for Multi-phase initialization. It should not be used in other contexts.

Note that Python assumes that PyModuleDef structures are statically allocated. This function may return either a new reference or a borrowed one; this reference must not be released.

Added in version 3.5.

Legacy single-phase initialization

Attention

Single-phase initialization is a legacy mechanism to initialize extension modules, with known drawbacks and design flaws. Extension module authors are encouraged to use multi-phase initialization instead.

In single-phase initialization, the initialization function (PyInit_modulename) should create, populate and return a module object. This is typically done using PyModule_Create() and functions like PyModule_AddObjectRef().

Single-phase initialization differs from the default in the following ways: