[Python-Dev] New Python Initialization API (original) (raw)

Steve Dower steve.dower at python.org
Sat Mar 30 20:48:56 EDT 2019


Here is my first review of https://www.python.org/dev/peps/pep-0587/ and in general I think it's very good. There are some things I'd like to consider changing before we embed them permanently in a public API, and as I'm still keen to have the ability to write Python code for configuration (to replace getpath.c, for example) I have a bias towards making that work more smoothly with these changes when we get to it.

I think my biggest point (about halfway down) is that I'd rather use argv/environ/etc. to initialize PyConfig and then only use the config for initializing the runtime. That way it's more transparent for users and more difficult for us to add options that embedders can't access.

The appendix is excellent, by the way. Very useful detail to have written down.

Cheers, Steve

PyWideCharList is a list of wchart* strings.

I always forget whether "const" is valid in C99, but if it is, can we make this a list of const strings?

I also prefer a name like PyWideStringList, since that's what it is (the other places we use WideChar in the C API refer only to a single string, as far as I'm aware).

PyInitError is a structure to store an error message or an exit code for the Python Initialization.

I love this struct! Currently it's private, but I wonder whether it's worth making it public as PyError (or PyErrorInfo)? We obviously can't replace all uses of int as a return value throughout the API, but I think it has uses elsewhere and we may as well protect against having to rename in the future.

* exitcode (int): if greater or equal to zero, argument passed to exit()

Windows is likely to need/want negative exit codes, as many system errors are represented as 0x80000000|(source of error)|(specific code).

* usererr (int): if non-zero, the error is caused by the user configuration, otherwise it's an internal Python error.

Maybe we could just encode this as "positive exitcode is user error, negative is internal error"? I'm pretty sure struct return values are passed by reference in most C calling conventions, so the size of the struct isn't a big deal, but without a proper bool type it may look like this is a second error code (like errno/winerror in a few places).

PyPreConfig structure is used to pre-initialize Python:

* Set the memory allocator * Configure the LCCTYPE locale * Set the UTF-8 mode

I think we should have the isolated flag in here - oh wait, we do - I think we should have the isolated/use_environment options listed in this list :)

Functions to pre-initialize Python:

* PyInitError PyPreInitialize(const PyPreConfig *config) * ``PyInitError PyPreInitializeFromArgs( const PyPreConfig *config, int argc, char *argv)`` _ PyInitError PyPreInitializeFromWideArgs( const PyPreConfig_ *config, int argc, wchar_t **argv)

I hope to one day be able to support multiple runtimes per process - can we have an opaque PyRuntime object exposed publicly now and passed into these functions?

(FWIW, I think we're a long way from being able to support multiple runtimes simultaneously, so the initial implementation here would be to have a PyRuntime_Create() that returns our global one once and then errors until it's finalised. The migration path is probably to enable switching of the current runtime via a dedicated function (i.e. one active at a time, probably with thread local storage), since we have no "context" parameter for C API functions, and obviously there are still complexities such as poorly written extension modules that nonetheless can be dealt with in embedding scenarios by simply not using them. This doesn't seem like an unrealistic future, unless we add a whole lot of new APIs now that can't allow it :) )

PyPreConfig fields:

* coerceclocalewarn: if non-zero, emit a warning if the C locale is coerced. * coerceclocale: if equals to 2, coerce the C locale; if equals to 1, read the LCCTYPE to decide if it should be coerced.

Can we use another value for coerce_c_locale to determine whether to warn or not? Save a field.

* legacywindowsfsencoding (Windows only): if non-zero, set the Python filesystem encoding to "mbcs". * utf8mode: if non-zero, enable the UTF-8 mode

Why not just set the encodings here? The "PreInitializeFromArgs" functions can override it based on the other variables we have, and embedders have a more obvious question to answer than "do I want legacy behaviour in my app".

Obviously we are not ready to import most encodings after pre initialization, but I think that's okay. Embedders who set something outside the range of what can be used without importing encodings will get an error to that effect if we try.

In fact, I'd be totally okay with letting embedders specify their own function pointer here to do encoding/decoding between Unicode and the OS preferred encoding. That would let them use any approach they like. Similarly for otherwise-unprintable messages (On an earlier project when I was embedding Python into Windows Store apps - back when their API was more heavily restricted - this would have been very helpful.)

Example of Python initialization enabling the isolated mode::

PyConfig config = PyConfigINIT; config.isolated = 1;

Haven't we already used extenal values by this point that should have been isolated? I'd rather have the isolation up front. Or better yet, make isolation the default unless you call one of the "FromArgs" functions, and then we don't actually need the config setting at all.

PyConfig fields:

Before I start discussing individual groups of fields, I would really like to see the following significant change here (because it'll help keep us honest and prevent us breaking embedders in the future).

Currently you have three functions, that take a PyConfig and optionally also use the environment/argv to figure out the settings:

* PyInitError PyInitializeFromConfig(const PyConfig *config) * ``PyInitError PyInitializeFromArgs(const PyConfig *config, int argc, char *argv)`` _ PyInitError PyInitializeFromWideArgs(const PyConfig *config, int_ argc, wchar_t **argv)

I would much prefer to see this flipped around, so that there is one initialize function taking PyConfig, and two functions that will fill out the PyConfig based on the environment:

(note two of the "const"s are gone)

This means that callers who want to behave like Python will request an equivalent config and then use it. Those callers who want to be nearly like Python can change things in between. And the precedence rules get simpler because Py_SetConfig* just overwrites anything:

PyConfig config = PyConfig_INIT; Py_SetConfigFromWideArgs(&config, argc, argv); /* optionally change any settings */ Py_InitializeFromConfig(&config);

We could even split out PyMainConfig here and have another function to collect the settings to pass to Py_RunMain() (such as the script or module name, things to print on exit, etc.).

Our python.c then uses this configuration, so it gets a few lines longer than at present. But it becomes a more useful example for people who want a nearly-like-Python version, and also ensures that any new configuration we support will be available to embedders.

So with that said, here's what I think about the fields:

* argv: sys.argv * baseexecprefix: sys.baseexecprefix * baseprefix: sys.baseprefix * execprefix: sys.execprefix * executable: sys.executable * prefix: sys.prefix * xoptions: sys.xoptions

I like all of these, as they nicely map to their sys members.

* modulesearchpathenv: PYTHONPATH environment variale value * modulesearchpaths, usemodulesearchpaths: sys.path

Why not just have "path" to mirror sys.path? If we go to a Py_SetConfig approach then all the logic for inferring the path is in there, and if we don't then the FromArgs() function will either totally override or ignore it. Either way, no embedder needs to set these individually.

* home: Python home * programname: Program name * program: argv[0] or an empty string * usersitedirectory: if non-zero, add user site directory to sys.path

Similarly, these play a role when inferring the regular Python sys.path, but are not something you should need to use when embedding.

* dllpath (Windows only): Windows DLL path

I'd have to look up exactly how this is used, but I don't think it's configurable (certainly not by this point where it's already been loaded).

* filesystemencoding: Filesystem encoding, sys.getfilesystemencoding() * filesystemerrors: Filesystem encoding errors, sys.getfilesystemencodeerrors()

See above. If we need these earlier in initialization (and I agree we probably do), then let's just put them in the pre-initialize config.

* dumprefs: if non-zero, display all objects still alive at exit * inspect: enter interactive mode after executing a script or a command * interactive: interactive mode * mallocstats: if non-zero, dump memory allocation statistics at exit * quiet: quiet mode (ex: don't display the copyright and version messages even in interactive mode) * showalloccount: show allocation counts at exit * showrefcount: show total reference count at exit * siteimport: import the site module at startup? * skipsourcefirstline: skip the first line of the source

These all seem to be flags specific to regular Python and not embedders (providing we have ways for embedders to achieve the same thing through their own API calls, but even if we don't, I'd still rather not codify them as public runtime API). Having them be an optional flag for Py_RunMain() or a PyMainConfig struct would keep them closer to where they belong.

* faulthandler: if non-zero, call faulthandler.enable() * installsignalhandlers: install signal handlers? * tracemalloc: if non-zero, call tracemalloc.start(value)

Probably it's not a good idea to install signal handlers too early, but I think the other two should be able to start any time between pre-init and actual init, no? So make PyFaultHandler_Init() and PyTraceMalloc_Init() public and say "if you want them, call them".

* runcommand: -c COMMAND argument * runfilename: python3 SCRIPT argument * runmodule: python3 -m MODULE argument

I think these should be in a separate configuration struct for Py_RunMain(), probably along with the section above too.

(The other fields all seem fine to me.)

This PEP adds a new PyUnixMain() function which takes command line arguments as bytes::

int PyUnixMain(int argc, char **argv)

I was part of the discussion about this, so I have some more context than what's in the PEP.

Given your next example shows this function would be about six lines long, why do we want to add it? Better to deprecate Py_Main() completely and just copy/paste those six lines from elsewhere (which also helps in the case where you need to do one little thing more and end up having to find those six lines anyway, as in the same example).

Open Questions ==============

* Do we need to add an API for import inittab?

I don't think so. Once embedders have a three-step initialization (pre-init, init, runmain) there are two valid places they can call their own init functions.

* What about the stable ABI? Should we add a version into PyPreConfig and PyConfig structures somehow? The Windows API is known for its ABI stability and it stores the structure size into the structure directly. Do the same?

Yeah, I think so. We already have the Py*_INIT macros, so we could just use a version number that we increment manually, and that will let us optionally ignore added members (or deprecated members that have not been physically removed).

(Though I'd rather make it easier for applications to include a local copy of Python rather than have to safely deal with whatever is installed on the system. But that's a Windows-ism, so making both approaches work is important :) )

* The PEP 432 stores PYTHONCASEOK into the config. Do we need to add something for that into PyConfig? How would it be exposed at the Python level for importlib? Passed as an argument to importlib.bootstrap.setup() maybe? It can be added later if needed.

Could we convert it into an xoption? It's very rarely used, to my knowledge.

* python.pth (Windows only)

I'd like to make this for all platforms, though my understanding was that since Python is not really relocatable on other OS's it isn't so important. And if the rest of this change happens it'll be easier to implement anyway :)



More information about the Python-Dev mailing list