[Python-Dev] Planning on removing cache invalidation for file finders

Mon Mar 4 21:52:14 CET 2013

On Sun, Mar 3, 2013 at 12:31 PM, Brett Cannon <brett at python.org> wrote:
> But how about this as a compromise over introducing write_module():
> invalidate_caches() can take a path for something to specifically
> invalidate. The path can then be passed to the invalidate_caches() on
> sys.meta_path. In the case of PathFinder it would take that path, try to
> find the directory in sys.path_importer_cache, and then invalidate the most
> specific finder for that path (if there is one that has any directory prefix
> match).
>
> Lots of little details to specify (e.g. absolute path forced anywhere in
> case a relative path is passed in by sys.path is all absolute paths? How do
> we know something is a file if it has not been written yet?), but this would
> prevent importlib from subsuming file writing specifically for source files
> and minimize performance overhead of invalidating all caches for a single
> file.

ISTR that when we were first discussing caching, I'd proposed a
TTL-based workaround for the timestamp granularity problem, and it was
mooted because somebody already proposed and implemented a similar
idea.  But my approach -- or at least the one I have in mind now --
would provide an "eventual consistency" guarantee, while still
allowing fast startup times.

However I think the experience with this heuristic so far shows that
the real problem isn't that the heuristic doesn't work for the normal
case; it works fine for that.  Instead, what happens is that *it
doesn't work when you generate modules*.

And *that* problem can be fixed without even invalidating the caches:
it can be fixed by doing some extra work when writing a module - e.g.
by making sure the directory mtime changes again after the module is
written.

For example, create the module under a temporary name, verify that the
directory mtime is different than it was before, then keep renaming it
to different temporary names until the mtime changes again, then
rename it to the final name.  (This would be very fast on some
platforms, much slower on others, but the OS itself would tell you
when it had worked.)  A utility routine to "write_module()" or
"write_package()" would be easier to find than advice that says to
invalidate the cache under thus-and-such conditions, as it would show
up in searches for writing modules dynamically or creating modules
dynamically, where you could only search for info about the cache if
you knew the cache existed.