[Catalog-sig] b.pypi up-to-date again

"Martin v. Löwis" martin at v.loewis.de
Thu Jul 14 21:17:05 CEST 2011


I fixed the AppEngine PyPI mirror (b.pypi.python.org), and it's
up-to-date again.

In case you are interested, here is the list of problems:

- integration of download stats broke on 2010-01-17.
  The implementation creates a Download record every time somebody
  downloads a file from the mirror, and integrates these records once
  a day into a daily report (deleting the Download objects and creating
  a Stats object). It combines all Download records except those from
  the current day, and deletes the ones it has combined in chunks of
  100, since integrating all of them at once would exceed the 30s
  request limit in AppEngine.
  For Jan 17, integration completed (i.e. GAE reported that no further
  Download objects were left); however, the next day, another Download
  object for Jan 17 showed up, breaking the integration job. I fixed it
  by post-dating that record to the next day that hadn't been
  integrated yet.

- mirroring broke when a file was deleted on PyPI after the mirror
  learned of its existence, but before it got mirrored. Mirroring
  works as follows:
  a) the GAE app asks PyPI to upload the file (downloading it from
     the master doesn't work, because the response would exceed the
     maximum response size of the URL fetcher);
  b) PyPI asks GAE to create an upload URL, and asks which file
     to upload;
  c) PyPI uploads the file to the blob store;
  d) GAE tells the GAE app that the upload is complete.
  In this scenario, step c) failed because the file wasn't there
  anymore, so PyPI just ignored the request. The GAE app's cron job
  then restarted the upload, which failed again, and so on every
  5 minutes since March 2010.
  I fixed it by having PyPI upload a null-byte file, along with a
  POST flag indicating that the file is to be deleted.

- restarting the mirroring then failed, because fetching the changelog
  since March took about 20s, exceeding GAE's 5s limit for XML-RPC
  calls. I fixed it by introducing another RPC, changed_packages.
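
For readers who want to picture the daily download-stats integration from the first item, here is a minimal in-memory sketch in Python. It is not the mirror's actual code: plain lists and dicts stand in for the datastore, a Counter per day stands in for a Stats object, and the names integrate_chunk and postdate_stragglers are hypothetical. It shows the chunk-of-100 loop (each call models one run that must stay under the time limit) and the manual fix of post-dating a straggler record.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta

CHUNK_SIZE = 100  # chunk size from the post, to stay under the request time limit


@dataclass
class Download:
    """One download event (hypothetical stand-in for a Download record)."""
    day: date
    filename: str


def integrate_chunk(downloads, stats, today, chunk_size=CHUNK_SIZE):
    """Fold at most chunk_size records from days before `today` into
    per-day counters (stand-ins for Stats objects) and delete them.
    Returns True while older records remain, i.e. another run is needed."""
    pending = [d for d in downloads if d.day < today]
    for d in pending[:chunk_size]:
        stats.setdefault(d.day, Counter())[d.filename] += 1
        downloads.remove(d)
    return len(pending) > chunk_size


def postdate_stragglers(downloads, integrated_days):
    """Move records whose day was already integrated to the next day
    that has not been integrated yet (the fix described in the post)."""
    for d in downloads:
        while d.day in integrated_days:
            d.day += timedelta(days=1)


if __name__ == "__main__":
    downloads = [Download(date(2010, 1, 16), "pkg-1.0.tar.gz") for _ in range(250)]
    stats = {}
    runs = 1
    while integrate_chunk(downloads, stats, today=date(2010, 1, 17)):
        runs += 1
    print(runs, stats[date(2010, 1, 16)]["pkg-1.0.tar.gz"], len(downloads))  # 3 250 0
```

Records from the current day are never touched, so a late record can only cause trouble for a day that was already integrated; moving it forward puts it into a day that will still be processed.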
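
The push-style mirroring protocol from the second item, and why a deleted file caused endless retries, can be sketched as a small simulation. Everything here is hypothetical: Mirror plays the GAE app, Master plays PyPI, and steps b) and c) (upload URL plus blob-store upload) are collapsed into one direct method call. The post speaks of a "null-byte file"; an empty payload is assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    """Hypothetical stand-in for the GAE app."""
    pending: set = field(default_factory=set)  # files known but not yet mirrored
    blobs: dict = field(default_factory=dict)  # filename -> bytes (the blob store)

    def cron_tick(self, master):
        # Step a): ask the master to push every file still pending.
        for name in sorted(self.pending):
            master.push(name, self)

    def upload_complete(self, name, payload, deleted=False):
        # Step d): the blob store reports a finished upload.
        if not deleted:
            self.blobs[name] = payload
        self.pending.discard(name)


@dataclass
class Master:
    """Hypothetical stand-in for PyPI; `fixed` toggles the fix."""
    files: dict = field(default_factory=dict)
    fixed: bool = True

    def push(self, name, mirror):
        # Steps b) and c), collapsed: get an upload URL and upload the file.
        if name in self.files:
            mirror.upload_complete(name, self.files[name])
        elif self.fixed:
            # The fix: upload an empty placeholder plus a "deleted" flag,
            # so the mirror stops asking for this file.
            mirror.upload_complete(name, b"", deleted=True)
        # Before the fix (fixed=False), a missing file was silently
        # skipped, so it stayed pending and the next cron run retried it.


if __name__ == "__main__":
    master = Master(files={"a-1.0.tar.gz": b"data"})
    mirror = Mirror(pending={"a-1.0.tar.gz", "b-2.0.tar.gz"})
    mirror.cron_tick(master)
    print(mirror.pending, mirror.blobs)
```

With fixed=False, the deleted file stays in pending forever, which mirrors the retry every 5 minutes described above.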
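
The post does not say how changed_packages works or exactly why it is cheaper than reading the changelog. One plausible reading, assumed here, is that it returns each changed package name once, so the mirror re-syncs per package instead of walking every changelog entry since March. The sketch below is a hypothetical in-memory stand-in, not PyPI's actual RPC; the (name, version, timestamp, action) entry shape and the resync helper are assumptions.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed shape of one changelog entry: (name, version, timestamp, action).
Entry = Tuple[str, str, int, str]


def changed_packages(log: Iterable[Entry], since: int) -> List[str]:
    """Hypothetical server-side stand-in: names of packages with a
    changelog entry newer than `since`, each listed once, in order."""
    names = {}  # dict keys keep first-seen order and drop duplicates
    for name, _version, timestamp, _action in log:
        if timestamp > since:
            names.setdefault(name, None)
    return list(names)


def resync(log: Iterable[Entry], since: int, refresh: Callable[[str], None]) -> int:
    """Hypothetical mirror-side loop: refresh each changed package once."""
    changed = changed_packages(log, since)
    for name in changed:
        refresh(name)
    return len(changed)


if __name__ == "__main__":
    log = [
        ("foo", "1.0", 100, "new release"),
        ("bar", "0.1", 150, "create"),
        ("foo", "1.1", 200, "add source file"),
        ("baz", "2.0", 50, "remove"),
    ]
    print(changed_packages(log, since=90))  # ['foo', 'bar']
```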

It then took three days to catch up, but should now stay up-to-date.

Regards,
Martin

