Running subinterpreters in multiple threads succeeds in Python 3.12 but not 3.13

I have the following C++ code that launches 4 threads. Each thread calls Py_NewInterpreterFromConfig and runs a function from a Python script named gettid.py:

#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <string>
#include <filesystem>
#include "unistd.h"
#include "Python.h"

void execute_python_code(
  PyInterpreterConfig config, const std::string& module_name, const std::string& func_name, std::mutex& print_mutex) {
  const auto currentDir = std::filesystem::current_path();
  // PyGILState_STATE gstate;
  // interpreter 1
  PyThreadState *tstate1 = NULL;
  // gstate = PyGILState_Ensure();
  PyStatus status1 = Py_NewInterpreterFromConfig(&tstate1, &config);
  if (PyStatus_Exception(status1)) {
    Py_ExitStatusException(status1);
    std::cout << "Failed\n";
    return;
  }
  PyThreadState_Swap(tstate1);
  PyObject *sysPath = PySys_GetObject("path");
  PyList_Append(sysPath, PyUnicode_FromString(currentDir.c_str()));
  PyObject* myModule = PyImport_ImportModule(module_name.c_str());
  PyErr_Print();
  PyObject* myFunction = PyObject_GetAttrString(myModule, func_name.c_str());
  PyObject* res = PyObject_CallObject(myFunction, NULL);
  // PyGILState_Release(gstate);
  if (res) {
    const int x = (int)PyLong_AsLong(res);
    {
      std::lock_guard<std::mutex> guard(print_mutex);
      std::cerr << "C++ thread id = " << gettid() << ", python result = " << x << std::endl;
    }
  }
  // destroy interpreter 1
  PyThreadState_Swap(tstate1);
  Py_EndInterpreter(tstate1);
}

int main(int argc, char* argv[]) {
  std::mutex print_mutex;
  std::string pymodule;
  std::string pyfuncname;
  if (argc == 3) {
    pyfuncname = std::string(argv[2]);
    argc--;
  } else {
    pyfuncname = "gettid";
  }
  if (argc == 2) {
    pymodule = std::string(argv[1]);
    argc--;
  } else {
    pymodule = "gettid";
  }
  Py_InitializeEx(0);
  PyInterpreterConfig config = {
    .use_main_obmalloc = 0,
    .allow_fork = 1,
    .allow_exec = 1,
    .allow_threads = 1,
    .allow_daemon_threads = 0,
    .check_multi_interp_extensions = 1,
    .gil = PyInterpreterConfig_OWN_GIL,
  };
  const int num_threads = 4;
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back(
      execute_python_code, config, pymodule, pyfuncname, std::ref(print_mutex));
  }
  for (int i = 0; i < num_threads; ++i) {
    threads[i].join();
  }
  int status_exit = Py_FinalizeEx();
  std::cout << "status_exit: " << status_exit << std::endl;
  return 0;
}

The Python script gettid.py simply returns the native thread ID:

#!/usr/bin/env python3
import threading
def gettid():
    return threading.get_native_id()

With Python 3.12, I can compile the code and run it successfully:

C++ thread id = 37174, python result = 37174
C++ thread id = 37175, python result = 37175
C++ thread id = 37177, python result = 37177
C++ thread id = 37176, python result = 37176
status_exit: 0

With Python 3.13, the C++ code seems to get stuck in Py_NewInterpreterFromConfig. Any ideas?

da-woods (Da Woods) May 7, 2025, 4:57am 2

If nothing else, the documentation for Py_NewInterpreterFromConfig says that it must be called with the GIL held.

HanatoK (HanatoK) May 7, 2025, 4:05pm 3

Thanks! If so, what is the correct way to run different sub-interpreters in different threads? I have tried calling Py_NewInterpreterFromConfig from the main thread and then calling PyThreadState_Swap in the new threads, but PyThreadState_Swap still seems to deadlock.

da-woods (Da Woods) May 7, 2025, 4:39pm 4

Mysteriously, if you look at the two places in the Python codebase where it’s used, they don’t actually appear to hold the GIL (because they’ve just swapped the thread state out for NULL).

Which suggests that you can’t trust the documentation (or me) on this - sorry!

I suspect loosely copying their use is the way to go. But not sure I’m a very reliable source on this.

CAM-Gerlach (C.A.M. Gerlach) May 8, 2025, 9:46am 5

As the world’s foremost expert on subinterpreters, perhaps @eric.snow might be able to weigh in on this.

HanatoK (HanatoK) May 9, 2025, 4:01pm 6

I applied the following patch, which seems to work in both 3.12 and 3.13 (with or without free-threading), but I still don’t know if this is the correct solution:

--- main2_old.cpp       2025-05-09 10:56:24.533657095 -0500
+++ main2.cpp   2025-05-09 10:55:48.327840691 -0500
@@ -68,6 +68,7 @@
   };
   const int num_threads = 4;
   std::vector<std::thread> threads;
+  auto* save_state = PyThreadState_Swap(NULL);
   for (int i = 0; i < num_threads; ++i) {
     threads.emplace_back(
       execute_python_code, config, pymodule, pyfuncname, std::ref(print_mutex));
@@ -75,6 +76,7 @@
   for (int i = 0; i < num_threads; ++i) {
     threads[i].join();
   }
+  PyThreadState_Swap(save_state);
   int status_exit = Py_FinalizeEx();
   std::cout << "status_exit: " << status_exit << std::endl;
   return 0;
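
The patch above pairs PyThreadState_Swap(NULL) with a later PyThreadState_Swap(save_state) by hand, so an early return or an exception between the two calls would skip the restore. One possible way to tie them together is a small scope guard. The following is only a sketch: ScopedDetach, FakeState, fake_swap and detach_roundtrip_ok are hypothetical names and not part of the CPython API, and the guard is templated on the swap function so that it can be exercised here with a stand-in instead of a real interpreter. It does not settle whether detaching the main thread state is the officially recommended approach, which is still an open question in this thread.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper, NOT part of the CPython API.
// SwapFn is any callable shaped like PyThreadState_Swap: it installs the
// given state as the current one and returns the previously current state.
template <typename State, typename SwapFn>
class ScopedDetach {
public:
  // Detach on construction: install a null state and remember the old one.
  explicit ScopedDetach(SwapFn swap)
      : swap_(std::move(swap)), saved_(swap_(nullptr)) {}

  // Re-attach the remembered state when the scope ends.
  ~ScopedDetach() { swap_(saved_); }

  ScopedDetach(const ScopedDetach&) = delete;
  ScopedDetach& operator=(const ScopedDetach&) = delete;

  State* saved() const { return saved_; }

private:
  SwapFn swap_;   // declared first so it is initialized before saved_
  State* saved_;
};

// Stand-ins (assumptions for illustration only) that mimic the swap contract
// so the guard can be checked without linking against Python.
struct FakeState { int id; };
static FakeState* g_current = nullptr;

FakeState* fake_swap(FakeState* next) {
  FakeState* prev = g_current;
  g_current = next;
  return prev;
}

// Returns true if the guard detaches inside the scope and restores afterwards.
bool detach_roundtrip_ok() {
  FakeState main_state{1};
  g_current = &main_state;
  assert(g_current == &main_state);

  bool detached_inside = false;
  {
    ScopedDetach<FakeState, FakeState* (*)(FakeState*)> guard(&fake_swap);
    // Inside the scope no state is current; the old one is kept by the guard.
    detached_inside = (g_current == nullptr) && (guard.saved() == &main_state);
  }
  const bool restored = (g_current == &main_state);

  g_current = nullptr;  // do not leave a pointer to a dead local behind
  return detached_inside && restored;
}
```

In the program from the first post, the guard would presumably be instantiated with PyThreadState and &PyThreadState_Swap inside main(), in a block that encloses both the thread-creation loop and the join loop and that closes before Py_FinalizeEx() is called, so that the main thread state is current again at finalization, as in the patch.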