[Python-Dev] Impact of Namedtuple on startup time
Antoine Pitrou solipsis at pitrou.net
Mon Jul 17 08:43:19 EDT 2017
Hello,
The cost of creating a namedtuple class has been identified as a contributor to Python startup time. This affects not only the Python core and the stdlib, but also any third-party library creating namedtuple classes (there are many of them). An issue was created for this: https://bugs.python.org/issue28638
Raymond decided to close the issue because:
1) the proposed resolution makes the "_source" attribute empty (or, at least, something else than it currently is). Raymond claims the "_source" attribute is an essential feature of namedtuples.
2) optimizing startup cost is supposedly not worth the effort.
To this, I will counter-argue:
As for 1), a search for "namedtuple" and "_source" in a code search engine (*) brings only false positives of different kinds:
- clones of the CPython repo
- copies of the namedtuple class instantiation source code with slight tweaks (not reading the _source attribute of an existing namedtuple)
- modules using namedtuples and also using a "_source" attribute on unrelated objects
(*) https://searchcode.com/?q=namedtuple+_source
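For readers unfamiliar with the attribute under discussion, here is a minimal sketch of what "reading the _source attribute of an existing namedtuple" would look like. The class name Point and its fields are made up for illustration. The attribute is looked up with getattr() because it is an assumption that the running interpreter still provides it (it is present on Python 3.6, but may be absent on other versions).

```python
from collections import namedtuple

# Hypothetical example class; any namedtuple would do.
Point = namedtuple("Point", ["x", "y"])

# On Python 3.6, `_source` holds the generated class source as a string.
# getattr() with a default keeps the snippet runnable where it is absent.
source = getattr(Point, "_source", None)

if source is None:
    print("this interpreter's namedtuple classes have no _source attribute")
else:
    # Show only the first few lines of the generated source.
    print("\n".join(source.splitlines()[:5]))
```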
As for 2), startup time is actually a very important consideration nowadays, both for small scripts and for interactive use, given the now very widespread use of Jupyter Notebooks. A 1 ms cost when importing a single module can translate into a large slowdown when your library imports (directly or indirectly) hundreds of modules, many of which may create their own namedtuple classes.
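A rough way to measure the per-class cost on your own machine is sketched below. The helper name make_class, the field names and the figure of 300 classes are all illustrative assumptions, not measurements from the issue. The result depends heavily on the Python version and hardware.

```python
import timeit
from collections import namedtuple


def make_class():
    # Builds a brand-new namedtuple class, as a module would at import time.
    return namedtuple("Point", ["x", "y", "z"])


n = 200
seconds = timeit.timeit(make_class, number=n)
per_class_ms = seconds / n * 1000
print(f"{per_class_ms:.3f} ms per namedtuple class")

# Hypothetical scenario: an import tree that creates 300 such classes.
hypothetical_classes = 300
total_ms = per_class_ms * hypothetical_classes
print(f"{total_ms:.1f} ms for {hypothetical_classes} classes")
```

Note that this times only class creation, not the rest of the import machinery, so it gives a lower bound on what namedtuple adds to an import.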
Nick pointed out that one alternative is to make the C-written "struct sequence" class user-visible.
My opinion is that, while better than nothing, this would complicate things by exposing two very similar primitives in the stdlib, without there being a clear choice for users. Should I use the well-known namedtuple? Should I use the new-ish "struct sequence", with similar characteristics and better performance, but worse compatibility (now I have to write fallback code for Python versions where the "struct sequence" isn't exposed)?
Not to mention that all third-party libraries would have to be migrated to the newly-exposed "struct sequence" + compatibility fallback code...
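To make the compatibility burden concrete, here is a sketch of the kind of fallback code every library would end up carrying. The name structseq and its location in collections are purely hypothetical: no such public API exists, and it is an assumption that an exposed "struct sequence" factory would accept the same arguments as namedtuple.

```python
from collections import namedtuple

try:
    # Hypothetical name: stands in for a user-visible "struct sequence"
    # factory. No such public API exists, so the import fails.
    from collections import structseq as make_record
except ImportError:
    # Fallback for Python versions where it isn't exposed.
    make_record = namedtuple

# Assumes the hypothetical factory shares namedtuple's call signature.
Point = make_record("Point", ["x", "y"])
p = Point(1, 2)
print(p.x, p.y)
```

Every module defining such classes would need this try/except block, or would have to import the resulting factory from a shared compatibility module.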
So my take is:
1) Usage of "_source" in open source code (as per the search above) seems non-existent.
2) If the primary intent of "_source" is to showcase how to write a tuple subclass, well, why not write a recipe or tutorial somewhere? The Python stdlib is generally not a place where we reify tutorials or educational snippets as public APIs.
3) The well-known namedtuple would really benefit from a performance boost, without asking all maintainers of dependent code (that's a ton) to migrate to a new idiom + compatibility fallback.
Regards
Antoine.