Caching: Access a local file, but ensure it is up-to-date from a remote URL

Mon Oct 13 03:51:24 EDT 2014

On Mon, Oct 13, 2014 at 5:36 PM, Ben Finney <ben+python at benfinney.id.au> wrote:
> So this is something similar to an HTTP object cache. Except where those
> are usually URL-focussed with the local files a hidden implementation
> detail, I want an API that focusses on the local files, with the remote
> requests a hidden implementation detail.

Potential issue: You may need some metadata storage as well as the
actual files. Or can you just ignore the Vary header and the like,
and pretend that this URL represents a single blob of data no matter
what? I'm also dubious about relying on FS timestamps for critical
data, as it's very easy to bump the timestamp to current, which would
make your program think that the contents are fresh; but if that's
truly the only metadata needed, that might be safe to accept.
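
A minimal sketch of the metadata-storage idea, assuming the server
sends ETag and/or Last-Modified and that a JSON sidecar file next to
the local file is acceptable. The names meta_path, load_meta,
conditional_headers and refresh are made up for illustration, and it
deliberately ignores Vary, Cache-Control and friends:

```python
import json
import os
import urllib.error
import urllib.request


def meta_path(local_path):
    # Hypothetical convention: validators live in a sidecar file.
    return local_path + ".meta.json"


def load_meta(local_path):
    # A missing or corrupt sidecar just means "no validators known".
    try:
        with open(meta_path(local_path)) as f:
            return json.load(f)
    except (FileNotFoundError, ValueError):
        return {}


def conditional_headers(meta):
    # Turn stored validators into conditional-request headers.
    headers = {}
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    return headers


def refresh(local_path, url):
    # Only send validators if the local copy actually exists.
    meta = load_meta(local_path) if os.path.exists(local_path) else {}
    req = urllib.request.Request(url, headers=conditional_headers(meta))
    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
            new_meta = {
                "url": url,
                "etag": resp.headers.get("ETag"),
                "last_modified": resp.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as e:
        if e.code == 304:
            # Not modified: the local copy is still current.
            return local_path
        raise
    with open(local_path, "wb") as f:
        f.write(body)
    with open(meta_path(local_path), "w") as f:
        json.dump(new_meta, f)
    return local_path
```

Freshness here comes from the validators the server handed out, not
from the file's mtime, so bumping the timestamp doesn't fool it.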

One way you could possibly do this is to pick up a URL-based cache
(even something stand-alone like Squid), and then create symlinks from
your canonically-named local files to the implementation-detail
storage space for the cache. Then you probe the URL through the cache
and return its contents. That guarantees that you're playing nicely
with the rules of HTTP (particularly if you have chained proxies,
proxy authentication, etc.; if you're deploying this to arbitrary
locations, that might be an issue), but at the expense of complexity.
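
The symlink half of that could look roughly like this. It's a sketch
that assumes the cache keeps each object as a plain file at a path you
can discover, which may well not hold for a given cache (check how
yours lays out its store), and that you're on a POSIX filesystem. The
name point_at_cache is hypothetical:

```python
import os


def point_at_cache(local_path, cache_entry):
    # Build the new symlink under a temporary name, then rename it
    # over the canonical name so readers never see a missing file.
    tmp = local_path + ".tmp-link"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(os.path.abspath(cache_entry), tmp)
    os.replace(tmp, local_path)
```

Callers only ever open the canonical name; which cache entry it
points at stays an implementation detail.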

ChrisA