[Catalog-sig] PyPI mirrors are all up to date

martin at v.loewis.de martin at v.loewis.de
Tue Apr 17 15:45:38 CEST 2012


> So you were updating a directory but serving another directory?
>
> But then updating the right last-modified page people were seeing?

It probably would have updated an unpublished last-modified as well.

In a similar issue on App Engine, I had duplicate File objects in the
database, and always served the one that GQL happened to return first.
In that case, both last-modified and the checksum might update correctly,
yet the wrong file might still get served.
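
To illustrate that failure mode, here is a minimal sketch in Python. It is
not the actual App Engine code: the in-memory `records` list stands in for
the datastore, and `update` and `serve` are hypothetical helpers. The
assumption is that an update touches one of the duplicates while the query
used for serving returns the other one first.

```python
import hashlib

# Hypothetical stand-in for the datastore: two records for the same file.
records = [
    {"name": "pkg-1.0.tar.gz", "data": b"old contents"},  # stale duplicate
    {"name": "pkg-1.0.tar.gz", "data": b"old contents"},
]

def update(name, data):
    """Update the last matching record; return the checksum that gets published."""
    matches = [r for r in records if r["name"] == name]
    matches[-1]["data"] = data
    return hashlib.md5(data).hexdigest()

def serve(name):
    """Serve whichever matching record the query happens to return first."""
    return next(r["data"] for r in records if r["name"] == name)

published = update("pkg-1.0.tar.gz", b"new contents")
served = hashlib.md5(serve("pkg-1.0.tar.gz")).hexdigest()

# The published checksum is correct for the new release...
checksum_ok = published == hashlib.md5(b"new contents").hexdigest()
# ...but the bytes that actually go out do not match it.
served_ok = served == published
```

In this sketch the published metadata is right and the download is wrong,
and nothing on the mirror side notices.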

> I am not sure why we're having this discussion since it's
> implementation details, but it's fun :)

I'm still trying to prove my claim that it's not feasible to increase
the trustworthiness of a mirror by computing some kind of checksum.
If the mirror has some systematic or random error, it may well be that
the checksum is as expected, yet the mirror is inconsistent.
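
As a rough sketch of the systematic case from the start of this thread
(syncing one directory while serving another), consider the following
Python. The directory names, the file name and the `md5_of` helper are
made up for illustration; the mirror's self-check reads the tree it
synced, not the tree the web server exports.

```python
import hashlib
import tempfile
from pathlib import Path

def md5_of(path):
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

with tempfile.TemporaryDirectory() as root:
    synced = Path(root) / "synced"  # directory the mirror script updates
    served = Path(root) / "served"  # directory the web server exports
    synced.mkdir()
    served.mkdir()
    (synced / "pkg-1.0.tar.gz").write_bytes(b"current release")
    (served / "pkg-1.0.tar.gz").write_bytes(b"stale release")

    # Checksum as known on the master (hypothetical value for this sketch).
    master_md5 = hashlib.md5(b"current release").hexdigest()

    # The mirror's own check passes, because it reads the synced tree.
    self_check_passes = md5_of(synced / "pkg-1.0.tar.gz") == master_md5
    # A client downloading from the served tree still gets the stale file.
    client_gets_current = md5_of(served / "pkg-1.0.tar.gz") == master_md5
```

Here the checksum is as expected and the mirror is still inconsistent,
because the check and the download go through different paths.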

> If there's interest I can write a multiprocess-based script that
> keeps an md5 database up-to-date

That's beside the point. The question is whether doing so would practically
help to improve the consistency, and I believe the answer is: no. It may help
to increase people's trust (which is a subjective matter), which may be
worthwhile in itself, but may also backfire if they download inconsistent files
despite the mirror giving "proof" that it is consistent.

Regards,
Martin



