Caching: Access a local file, but ensure it is up-to-date from a remote URL

Mon Oct 13 02:36:47 EDT 2014

Howdy all,

I'm hoping that the problem I currently have is one already solved,
either in the Python standard library, or with some well-tested obvious
code.

A program I'm working on needs to access a set of files locally; they're
just normal files.

But those files are local cached copies of documents available at remote
URLs — each file has a canonical URL for that file's content.

I'd like an API for ‘get_file_from_cache’ that looks something like::

    file_urls = {
            "foo.txt": "http://example.org/spam/",
            "bar.data": "https://example.net/beans/flonk.xml",
            }

    for (filename, url) in file_urls.items():
        infile = get_file_from_cache(filename, canonical=url)
        do_stuff_with(infile.read())

* If the local file's modification timestamp is not significantly
  earlier than the Last-Modified timestamp for the document at the
  corresponding URL, ‘get_file_from_cache’ just returns the file object
  without changing the file.

* The local file might be out of date (its modification timestamp may be
  significantly older than the Last-Modified timestamp from the
  corresponding URL). In that case, ‘get_file_from_cache’ should first
  read the document's contents into the file, then return the file
  object.

* The local file may not yet exist. In that case, ‘get_file_from_cache’
  should first read the document content from the corresponding URL,
  create the local file, and then return the file object.

* The remote URL may not be available for some reason. In that case,
  ‘get_file_from_cache’ should simply return the file object, or if that
  can't be done, raise an error.

So this is something similar to an HTTP object cache. Except where those
are usually URL-focussed with the local files a hidden implementation
detail, I want an API that focusses on the local files, with the remote
requests a hidden implementation detail.

Does anything like this exist in the Python library, or as simple code
using it? With or without the specifics of HTTP and URLs, is there some
generic caching recipe already implemented with the standard library?

This local file cache (ignoring the spcifics of URLs and network access)
seems like exactly the kind of thing that is easy to get wrong in
countless ways, and so should have a single obvious implementation
available.

Am I in luck? What do you advise?

-- 
 \       “If consumers even know there's a DRM, what it is, and how it |
  `\     works, we've already failed.” —Peter Lee, Disney corporation, |
_o__)                                                             2005 |
Ben Finney