[Python-Dev] Store startup modules as C structures for 20%+ startup speed improvement?

Paul Moore p.f.moore at gmail.com
Sat Sep 15 05:53:20 EDT 2018


On Fri, 14 Sep 2018 at 23:28, Neil Schemenauer <nas-python at arctrix.com> wrote:

> On 2018-09-14, Larry Hastings wrote:
> > [..] adding the stat calls back in costs you half the startup. So
> > any mechanism where we're talking to the disk at all simply
> > isn't going to be as fast.
>
> Okay, so if we use hundreds of small .pyc files scattered all over
> the disk, that's bad? Who would have thunk it. ;-P
>
> We could have a new format, .pya (compiled python archive) that has
> data for many .pyc files in it. In normal runs you would have one
> or just a handful of these things (e.g. one for stdlib, one for your
> app and all the packages it uses). Then you mmap these just once and
> rely on OS page faults to bring in the data as you need it. The .pya
> would have a hash table at the start or end that tells you the offset
> for each module.
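To make the quoted proposal concrete, here is a minimal sketch of what such an archive could look like. Nothing here is an existing CPython format: the file layout, the names write_pya and PyaArchive, and the trailing 8-byte index offset are all assumptions made for illustration. It packs marshalled code objects into one file, puts the name-to-offset table at the end, and maps the file once with mmap.

```python
import marshal
import mmap
import os
import struct
import tempfile

# Hypothetical layout (an assumption, not a real CPython format):
#   [marshalled code objects ...][marshalled index dict][8-byte index offset]
TRAILER = struct.Struct("<Q")


def write_pya(path, sources):
    """Pack {module_name: source_text} into a single archive file."""
    index = {}
    with open(path, "wb") as f:
        for name, source in sources.items():
            blob = marshal.dumps(compile(source, name, "exec"))
            index[name] = (f.tell(), len(blob))  # offset and size per module
            f.write(blob)
        index_offset = f.tell()
        f.write(marshal.dumps(index))
        f.write(TRAILER.pack(index_offset))


class PyaArchive:
    """Map the archive once; look modules up through the trailing index."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        end = len(self._map) - TRAILER.size
        (index_offset,) = TRAILER.unpack(self._map[end:])
        self._index = marshal.loads(self._map[index_offset:end])

    def load(self, name):
        offset, size = self._index[name]
        # Slicing copies only this module's bytes; the OS pages them in
        # from disk the first time they are touched.
        code = marshal.loads(self._map[offset:offset + size])
        namespace = {"__name__": name}
        exec(code, namespace)
        return namespace

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.pya")
        write_pya(path, {"greet": "def hello():\n    return 'hi'\n"})
        archive = PyaArchive(path)
        try:
            return archive.load("greet")["hello"]()
        finally:
            archive.close()
```

A real implementation would also need to hook into the import system and validate the archive against the source files; the sketch only shows the single-mmap, offset-table lookup part of the idea.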

Isn't that essentially what putting the stdlib in a zipfile does? (See the Windows embedded distribution for an example). It probably uses normal I/O rather than mmap, but maybe adding a "use mmap" flag to the zipfile module would be a more general enhancement that zipimport could use for free.
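For reference, importing from a zipfile already works today by putting the archive on sys.path, which is handled by zipimport. The following is a small sketch of that mechanism only; the module name zipped_demo_mod and the helper names are made up for the example, and it does not use mmap, since the "use mmap" flag suggested above is not an existing zipfile option.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(zip_path, module_name):
    """Import module_name from a zip archive via the normal import system."""
    sys.path.insert(0, zip_path)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(zip_path)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = os.path.join(tmp, "bundle.zip")
        # Hypothetical module stored in the archive for illustration.
        with zipfile.ZipFile(zip_path, "w") as zf:
            zf.writestr("zipped_demo_mod.py", "def answer():\n    return 42\n")
        mod = import_from_zip(zip_path, "zipped_demo_mod")
        # The loader type shows that zipimport served the module.
        return mod.answer(), type(mod.__loader__).__name__
```

Here the archive's central directory plays the role of the offset table: one file is opened and the member is located by name instead of probing many paths on disk.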

Paul


