[Distutils] changelog / CDN inconsistency (was: Re: Good news everyone, PyPI is behind a CDN)

holger krekel holger at merlinux.eu
Mon May 27 14:08:46 CEST 2013


Hi Noah, Donald, (CC also Richard, Christian),

I just checked with a test package and think we might have a cache
consistency / changelog API problem.  It took me a while, but here is
the basic issue: I uploaded a test package and the changelog API reported
the change, but when I then fetch the package's simple page, the new release
file shows up only some of the time.

Tools like bandersnatch, pep381 and devpi-server (and probably others)
use PyPI's changelog API to determine if there are changes.  It seems
those changes are signalled faster than they become consistently accessible
through the CDN.  This can lead to inconsistent mirrors: a mirror that
syncs on the change event may fetch a stale page, and once the CDN finally
serves the new files there is no further change event to trigger another
sync.  Such mirrors are run in-house by companies, so I think it's a real
problem.
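
To illustrate one possible tool-side mitigation, here is a minimal Python
sketch: after a changelog event for a project, keep re-fetching its simple
page until the expected release file is listed.  The function name, the
`fetch_simple_page` callable and the retry policy are assumptions made for
illustration, not what bandersnatch, pep381 or devpi-server actually do.

```python
import time


def wait_until_visible(project, filename, fetch_simple_page,
                       retries=5, delay=30.0, sleep=time.sleep):
    """Return True once `filename` is listed on the project's simple page.

    `fetch_simple_page` is a hypothetical callable that returns the HTML of
    the project's simple page as seen through the CDN.  The substring check
    is deliberately crude; a real tool would parse the links.
    """
    for attempt in range(retries):
        if filename in fetch_simple_page(project):
            return True
        # Page is still stale: wait before asking the CDN again.
        if attempt < retries - 1:
            sleep(delay)
    return False
```

Since different CDN nodes may answer successive requests, a single fresh
response does not prove the page is consistent everywhere, so this only
narrows the window rather than closing it.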

Even without mirroring there can be problems because installs are not
directly repeatable: "pip install XYZ>=2.0" can give you first 2.0.1,
then 2.0.0 a minute later.  I had hoped that a particular IP address
would see things consistently.
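
A small Python sketch of why this happens: the same requirement resolves
differently depending on which view of the simple page answers.  The
version lists and the simplified "pick the highest matching version" rule
are assumptions for illustration, not pip's actual resolution code.

```python
def parse_version(text):
    """Turn a dotted numeric version like '2.0.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def best_match(available, minimum):
    """Return the highest version in `available` that is >= `minimum`."""
    floor = parse_version(minimum)
    candidates = [v for v in available if parse_version(v) >= floor]
    return max(candidates, key=parse_version) if candidates else None


# Hypothetical views of the same simple page from two cache nodes.
fresh_view = ["1.9", "2.0.0", "2.0.1"]   # already lists the new release
stale_view = ["1.9", "2.0.0"]            # still the old page

first_install = best_match(fresh_view, "2.0")
second_install = best_match(stale_view, "2.0")
```

Here `first_install` is "2.0.1" and `second_install` is "2.0.0", although
nothing changed on PyPI between the two requests.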

I am not familiar with Fastly's caching properties -- can they notify
us once a page/file is consistently up-to-date everywhere?
Or can the cache be globally invalidated for a particular page/file?
Any other ideas?
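
For reference, a sketch of what per-URL invalidation could look like,
assuming the CDN accepts an HTTP PURGE request for a single URL.  Whether
PyPI's Fastly setup would permit this, and from whom, is an assumption to
be confirmed against Fastly's documentation; the helper name is
hypothetical and the request is only built here, not sent.

```python
import urllib.request


def build_purge_request(url):
    """Build (but do not send) an HTTP PURGE request for one cached URL."""
    # The `method` argument of Request is available since Python 3.3.
    return urllib.request.Request(url, method="PURGE")


# Example URL for the simple page of the placeholder project XYZ.
req = build_purge_request("https://pypi.python.org/simple/XYZ/")
# urllib.request.urlopen(req) would actually send it; omitted on purpose.
```

If something like this were issued by PyPI right after an upload, the
changelog event and the cached simple page would be less likely to
disagree.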

If customizing the Fastly setup is not possible, or as a short-term
measure in any case, is there (or could there be) a special location
provided by pypi.python.org which the above tools could use to get at the
actual non-cached data?  We could then mitigate the problem by updating
the respective tools.  That would at least solve the problem for one of
my customers, I think.

best,
holger


On Sun, May 26, 2013 at 10:34 -0700, Noah Kantrowitz wrote:
> </farnsworth>
> 
> but seriously, at long last today it was my honor to throw the DNS switch to move PyPI to the Fastly caching CDN. I would like to thank Donald Stufft for doing much of the heavy lifting on the PyPI side, and to Fastly for graciously offering to host us. What does this mean for everyone? Well the biggest change is PyPI should get a whole lot faster. There are two major downsides however. There will now be a delay of several minutes in some cases between updating a package and having it be installable, and download counts will now be even more incorrect than they were before. The PyPI admins are discussing what to do about download counts long-term, but for now we all feel that the performance and availability benefits outweigh the loss. If anyone has any questions, or hears anything about issues with PyPI please don't hesitate to contact me.
> 
> --Noah
> 





