[Python-Dev] Planning on removing cache invalidation for file finders (original) (raw)

PJ Eby pje at telecommunity.com
Mon Mar 4 21:52:14 CET 2013


On Sun, Mar 3, 2013 at 12:31 PM, Brett Cannon <brett at python.org> wrote:

But how about this as a compromise over introducing writemodule(): invalidatecaches() can take a path for something to specifically invalidate. The path can then be passed to the invalidatecaches() on sys.metapath. In the case of PathFinder it would take that path, try to find the directory in sys.pathimportercache, and then invalidate the most specific finder for that path (if there is one that has any directory prefix match).

Lots of little details to specify (e.g. absolute path forced anywhere in case a relative path is passed in by sys.path is all absolute paths? How do we know something is a file if it has not been written yet?), but this would prevent importlib from subsuming file writing specifically for source files and minimize performance overhead of invalidating all caches for a single file.

ISTR that when we were first discussing caching, I'd proposed a TTL-based workaround for the timestamp granularity problem, and it was mooted because somebody already proposed and implemented a similar idea. But my approach -- or at least the one I have in mind now -- would provide an "eventual consistency" guarantee, while still allowing fast startup times.

However I think the experience with this heuristic so far shows that the real problem isn't that the heuristic doesn't work for the normal case; it works fine for that. Instead, what happens is that it doesn't work when you generate modules.

And that problem can be fixed without even invalidating the caches: it can be fixed by doing some extra work when writing a module - e.g. by making sure the directory mtime changes again after the module is written.

For example, create the module under a temporary name, verify that the directory mtime is different than it was before, then keep renaming it to different temporary names until the mtime changes again, then rename it to the final name. (This would be very fast on some platforms, much slower on others, but the OS itself would tell you when it had worked.) A utility routine to "write_module()" or "write_package()" would be easier to find than advice that says to invalidate the cache under thus-and-such conditions, as it would show up in searches for writing modules dynamically or creating modules dynamically, where you could only search for info about the cache if you knew the cache existed.



More information about the Python-Dev mailing list