[Python-Dev] Store startup modules as C structures for 20%+ startup speed improvement?

Fabio Zadrozny fabiofz at gmail.com
Tue Sep 18 14:46:11 EDT 2018


On Tue, Sep 18, 2018 at 2:57 PM, Carl Shapiro <carl.shapiro at gmail.com> wrote:

On Tue, Sep 18, 2018 at 5:55 AM, Fabio Zadrozny <fabiofz at gmail.com> wrote:

During the import process, Python can already deal with folders and .zip files in sys.path... now, instead of having special handling for a new concept with a custom command line, etc, why not just say that this is a special file (e.g.: files with a .pyfrozen extension) and make importlib be able to deal with it when it's on sys.path (that way there could be multiple of those and there should be no need to turn it on/off, custom command line, etc)?

That is an interesting idea but it might not be easy to work into this design. The improvement in start-up time comes from eliminating the overheads of filesystem I/O, memory allocation, and un-marshaling bytecode. Having this data on the filesystem would reintroduce the cost of filesystem I/O and it would add a load-time relocation to the equation so the overall performance benefits would be greatly lessened.

Another question: doesn't importlib already provide hooks for external contributors which could address that use case? (so, this could initially be available as a third party library for maturing outside of CPython and then when it's deemed to be mature it could be integrated into CPython -- not that this can't happen on Python 3.8 timeframe, but it'd be useful checking its use against the current Python version and measuring benefits with real world code).

This may be possible but, for the same reasons I outline above, it would certainly come at the expense of performance. I think many people are interested in a better .pyc format but our goals are much more modest. We are actually trying to not introduce a whole new way to externalize .py data in CPython. Rather, we think of this as just making the existing frozen module capability much faster so its use can be broadened to making start-up performance better. The user visible part, the command line interface to bypass the frozen module, would be a nice-to-have for developers but is something we could live without.
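For readers unfamiliar with the existing frozen module capability mentioned above, the short sketch below asks the standard importlib.machinery.FrozenImporter whether a name is in the interpreter's frozen table. It only illustrates the mechanism that already exists; it says nothing about how the proposed C-structure format works. The second module name is a made-up placeholder chosen so that the lookup fails.

```python
from importlib.machinery import FrozenImporter

# "_frozen_importlib" is the import system's own bootstrap module, which
# CPython ships frozen into the interpreter binary.
frozen_spec = FrozenImporter.find_spec("_frozen_importlib")

# A name that is not in the frozen table is simply not found (None).
# "no_such_frozen_module_xyz" is a hypothetical placeholder name.
missing_spec = FrozenImporter.find_spec("no_such_frozen_module_xyz")

print(frozen_spec.origin)
print(missing_spec)
```

Frozen modules are located without consulting sys.path at all, which is why they avoid the filesystem I/O described above.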

Just to make sure we're on the same page, the approach I'm talking about would still be having a DLL, not a better .pyc format, so, during the import a custom importer would open that DLL once and provide modules from it -- do you think this would be much more overhead than what's proposed now?

I guess it may be a bit slower because it'd have to obey the existing import capabilities, but that shouldn't mean more time is spent on I/O, memory allocation, or un-marshaling bytecode (although it may be that I misunderstood the approach or the current import capabilities don't provide the proper API for that).
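To make the sys.path idea concrete, here is a minimal sketch of a path hook that recognises entries ending in .pyfrozen and serves modules from them. Everything named here (PyFrozenFinder, build_container, startup.pyfrozen, hello_frozen) is hypothetical, and the container is only a stand-in: it is a marshalled dict of code objects rather than a DLL, so it does not avoid un-marshaling and is not a measure of the performance being discussed. It only shows that the existing sys.path_hooks API can open a container once and provide several modules from it.

```python
import importlib
import importlib.util
import marshal
import os
import sys
import tempfile


class PyFrozenFinder:
    """Hypothetical path-entry finder and loader for '.pyfrozen' containers."""

    def __init__(self, path):
        # A path hook must raise ImportError for entries it does not handle,
        # so that the next hook (zip files, directories) gets a chance.
        if not (path.endswith(".pyfrozen") and os.path.isfile(path)):
            raise ImportError("not a .pyfrozen container", path=path)
        self.path = path
        # The container is opened and read once, when the path entry is
        # first used. Here it is a marshalled {module name: code object} dict.
        with open(path, "rb") as fh:
            self._code = marshal.load(fh)

    def find_spec(self, fullname, target=None):
        if fullname not in self._code:
            return None  # let later sys.path entries try
        return importlib.util.spec_from_loader(fullname, self, origin=self.path)

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        exec(self._code[module.__name__], module.__dict__)


def build_container(path, sources):
    """Compile {name: source} and write it as a stand-in container file."""
    code = {name: compile(src, f"<{name}>", "exec") for name, src in sources.items()}
    with open(path, "wb") as fh:
        marshal.dump(code, fh)


# Demo: build a container, register the hook, and import from it.
container = os.path.join(tempfile.mkdtemp(), "startup.pyfrozen")
build_container(container, {"hello_frozen": "GREETING = 'hi from the container'\n"})

sys.path_hooks.insert(0, PyFrozenFinder)
sys.path.insert(0, container)

hello_frozen = importlib.import_module("hello_frozen")
print(hello_frozen.GREETING)
```

Because the finder is cached in sys.path_importer_cache after the first lookup, the container is read once per process, and several such containers could sit on sys.path side by side without any command line switch.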


