[Catalog-sig] Deprecate External Links

Monty Taylor mordred at inaugust.com
Wed Feb 27 19:43:09 CET 2013



On 02/27/2013 01:32 PM, Giovanni Bajo wrote:
> On 27 Feb 2013, at 19:23, Donald Stufft
> <donald.stufft at gmail.com> wrote:
> 
>> On Wednesday, February 27, 2013 at 12:44 PM, Donald Stufft wrote:
>>>>
>>>> Why not first have good infrastructure and capacity with
>>>> pypi.python.org, so that people *want* to
>>>> move their files there?
>>> PyPI has had very good uptime since the move to OSL. I don't have
>>> numbers handy but I believe I can get them.
>> I got the numbers! Since almost a year ago (this was set up at the
>> last US PyCon):
>>
>> Uptime: 99.99%
>> Downtime: 6h 58m
>> Number of Downtimes: 126
>>
>> I want to stress again that even if that were a poor number, adding
>> more points of failure only decreases the expected uptime, or at best
>> does nothing.
> 
> In fact, adding a caching CDN in front of PyPI (instead of the current
> mirror protocol) would probably bring the uptime close to 100% for
> people downloading packages via pip.
> 
> I'm +1 on dropping the current (complicated) mirror system and external
> links, and in favor of centralizing everything into PyPI, plus a
> third-party CDN / hosting service. In fact, Python is a big-enough
> brand name that we could even get a CDN service almost for free in
> exchange for an acknowledgment of the CDN company being used.
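
To put rough numbers on Giovanni's and Donald's point: an install that
has to reach both PyPI and an external host only succeeds when both are
up at once, so (assuming independent failures, with percentages picked
purely for illustration) the expected availability is the product of
the two:

  # Back-of-the-envelope availability math (illustrative numbers, not
  # measurements): pip needs BOTH the PyPI index AND the external host.
  pypi = 0.9999        # PyPI alone
  external = 0.999     # a typical external file host

  both = pypi * external
  print(f"PyPI only:         {pypi:.4%}")   # 99.9900%
  print(f"PyPI + external:   {both:.4%}")   # 99.8900%

  # Expected extra downtime per year from chaining in the external host:
  print(f"Extra downtime/yr: {(pypi - both) * 365 * 24:.1f} hours")

Chaining in an external host multiplies availabilities down; a cache in
front of PyPI can keep serving packages even when the origin blips,
which is why it moves the number the other way.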

I can absolutely promise PyPI access to a CDN if we were to drop
external links. I will also promise as much storage space as is needed
to host every single Python package in existence from at least two
giant public clouds. Today. Right now. Gimme a day and I betcha I can
get you space on 3 more clouds. It seems other people have made similar
offers. Space and bandwidth for this are not a problem at all.

To double-down on points already made:

I run a very large build system for a project that does automatic
testing and gating of every commit. Our project is all in Python, and it
has a bunch of developers working on it who are new to Python (their
employer has told them they're hacking in Python now, so they are).

Some things I have learned from this:

a) NOBODY knows that PyPI links out to files hosted externally, and
almost to a person they are not pleased about it when I tell them. It
goes something like this:

  me: You know, pip install foo doesn't actually even download foo from
PyPI, it downloads it from SourceForge.
  them: WHAT? That's ridiculous.

b) Because of the above, people's expectations are subverted. It may
come as a shock, but "It's on PyPI" is good enough for a lot of people.
Except - it's not on PyPI, they just don't know it.

c) Since external links are external, it's not just downloading the
package itself, but also listing the available package versions, that
has to hit the external link. This makes everything exceptionally slow.
Think you've helped things by mirroring /simple? NOPE. Your local
mirror is going to get bypassed and you're still going to hit
SourceForge, where screen-scraping of HTML is going to happen - all
just to figure out that you do, in fact, have the latest version of the
package.
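
To make (c) concrete, here's a rough sketch of the round trips a
pip-style installer ends up doing for an externally-hosted package. The
package name "foo" and the helper names are hypothetical, and real pip
logic is more involved, but the shape is the same:

  # Rough sketch of the version-discovery dance for an externally-hosted
  # package (hypothetical name "foo"; real pip is more involved).
  from html.parser import HTMLParser
  from urllib.parse import urljoin
  from urllib.request import urlopen

  class LinkCollector(HTMLParser):
      """Collect every <a href> on a page, resolved against a base URL."""
      def __init__(self, base):
          super().__init__()
          self.base = base
          self.links = []

      def handle_starttag(self, tag, attrs):
          if tag == "a":
              for name, value in attrs:
                  if name == "href" and value:
                      self.links.append(urljoin(self.base, value))

  def collect_links(url):
      collector = LinkCollector(url)
      collector.feed(urlopen(url).read().decode("utf-8", "replace"))
      return collector.links

  # Round trip 1: the index page on PyPI.
  index_links = collect_links("https://pypi.python.org/simple/foo/")

  # Round trips 2..N: every link pointing off PyPI forces another
  # fetch-and-scrape against the external host, just to enumerate the
  # candidate versions.
  candidates = []
  for link in index_links:
      if "pypi.python.org" in link:
          candidates.append(link)                 # hosted on PyPI: done
      else:
          candidates.extend(collect_links(link))  # chase the external page

Every one of those extra fetches is a chance to block on a slow or down
external host - even when the answer turns out to be that you already
had the latest version.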

I know there are good historical reasons for the design ... but those
reasons have long outlived their usefulness. If there is anything I can
do to help kill external links, please let me know.

