[Catalog-sig] PyPI mirrors are all up to date

martin at v.loewis.de
Tue Apr 17 11:57:11 CEST 2012


>>> If you calculate a checksum over all mirrored files, you can guarantee
>>> that the bits are the same
>>> on both sides, no?
>> How exactly would you calculate that checksum?
> By calculating a grand hash over all the per-file hashes.

In this case, the checksum would not be a reliable indication that the
files are actually up to date. For example, a mirror may keep updating
files in the wrong location (not the location that is then used to
serve the files), so that the files being served come from a stale copy.
This is not theoretical; it actually happened in my mirror setup at one
time.
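For reference, here is a minimal sketch of the "grand hash" proposed in the quoted text. It assumes GNU coreutils and findutils (`md5sum`, `sort -z`, `xargs -0 -r`), and the function name `grand_hash` is made up for illustration. It computes the MD5 of every regular file under a directory, lists the results in a fixed byte-wise path order, and then takes the MD5 of that list.

```bash
#!/usr/bin/env bash
# Sketch of a "grand hash" over a mirror tree: MD5 each regular file,
# then MD5 the resulting "<md5>  ./path" list. Assumes GNU coreutils
# and findutils (sort -z, xargs -0 -r, md5sum).

grand_hash() {
  local dir=$1
  # Work from inside DIR so the listed paths are relative and comparable
  # between the master tree and a mirror tree; sort them byte-wise so the
  # directory order on disk does not affect the result.
  ( cd "$dir" &&
    find . -type f -print0 |
      LC_ALL=C sort -z |
      xargs -0 -r md5sum |
      md5sum | cut -d' ' -f1 )
}
```

Note that this vouches only for the directory it is pointed at. As described above, a matching value says nothing about which copy the web server actually serves. It also reads every file on each run.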

>> That could take a few hours per change.
> Why is that? You don't calculate the checksum of a file you already
> have twice.
>
> Even if you do, it's very fast to call md5.
>
> Try it:
>
> $ find mirror | xargs md5
>
> This takes a few seconds at most on the whole mirror.

I tried it, and on my mirror, it took 27 minutes and 7 seconds.
So not exactly hours, but not "a few seconds" either.
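The quoted idea of not hashing a file twice could look like the following sketch. It keeps a cache of per-file digests keyed on size and modification time (the same "quick check" rsync uses by default) and re-hashes only files that are new or whose size or mtime changed. The names `update_cache` and `cached_grand_hash` and the cache format are hypothetical. The sketch assumes bash 4 or later, GNU find (`-printf`) and coreutils, paths without tabs or newlines, and a cache file stored outside the mirror tree.

```bash
#!/usr/bin/env bash
# Sketch: incremental per-file MD5 cache plus a grand hash over it.
# Hypothetical cache format, one line per file, tab-separated:
#   <size> TAB <mtime> TAB <md5> TAB <./relative/path>
# Assumes bash >= 4, GNU find/coreutils, and paths without tabs or newlines.

update_cache() {
  local dir=$1 cache=$2 out size mtime digest path entry
  local -A old=()

  # Load previously computed digests, indexed by path.
  if [[ -f $cache ]]; then
    while IFS=$'\t' read -r size mtime digest path; do
      old["$path"]=$size$'\t'$mtime$'\t'$digest
    done < "$cache"
  fi

  out=$(mktemp) || return 1
  (
    cd "$dir" || exit 1
    # Walk the tree once; deleted files simply drop out of the new cache.
    while IFS=$'\t' read -r -d '' size mtime path; do
      entry=${old["$path"]-}
      if [[ $entry == "$size"$'\t'"$mtime"$'\t'* ]]; then
        digest=${entry##*$'\t'}                      # unchanged: reuse digest
      else
        digest=$(md5sum < "$path" | cut -d' ' -f1)   # new or changed: hash it
      fi
      printf '%s\t%s\t%s\t%s\n' "$size" "$mtime" "$digest" "$path"
    done < <(find . -type f -printf '%s\t%T@\t%p\0')
  ) > "$out" || { rm -f "$out"; return 1; }

  # Sort by path so the grand hash does not depend on directory order.
  LC_ALL=C sort -t $'\t' -k4 "$out" > "$cache"
  rm -f "$out"
}

# Grand hash over the cached (digest, path) pairs.
cached_grand_hash() {
  cut -f3,4 "$1" | md5sum | cut -d' ' -f1
}
```

The first run still reads every file, as in the full pass timed above. Later runs read only files whose size or mtime changed. The trade-off is that a file rewritten in place with the same size and a preserved mtime keeps its stale cached digest.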

Regards,
Martin



