[Catalog-sig] [PSF-Board] Troubled by changes to PyPI usage agreement

Tarek Ziadé ziade.tarek at gmail.com
Thu Jan 21 17:49:43 CET 2010


On Thu, Jan 21, 2010 at 5:29 PM, M.-A. Lemburg <mal at egenix.com> wrote:
[..]
>
> Sure, we could do all those things, but such a process will
> cause a lot of admin overhead on part of the PSF.

Which process? Non-web mirroring requires no effort or work from the PSF.

The only effort required is technical, and I'd say it's about 70% done
at this point.

> Using a content delivery system we'd avoid such administration
> work. The PSF would have to sign agreements with 10-20 mirror
> providers, it wouldn't have to setup a monitoring system, keep
> checking the mirror web content, etc.

What is a content delivery system here? Do you mean that the PSF would
run the mirrors itself? If so, how would that work technically? How
would it be different?

Let me state it differently: what if each mirror maintainer were a PSF
member? Would that address the legal/admin issues?

>
> Moreover, there would also be mirrors in parts of the world
> that are currently not well covered by Pythonistas and thus
> less likely to get a local mirror server setup.

This is just a matter of having a server with an IP in that part of the
world. In reality, as long as the main areas (US, Europe, Australia,
etc.) are served, this fits our needs. Some people will probably have to
go through several nodes to reach a mirror, but we can't have a server
in every major city.

So in any case, we are improving the situation, not making it worse.


> How to arrange all this is really a PSF question more than
> anything else.
>
> Also note that using a static file layout would make the
> whole synchronization mechanism a lot easier - not only
> for content delivery networks, but also for dedicated
> volunteer run mirrors. There are lots of mirror scripts
> out there that work with rsync or FTP, so no need to reinvent
> the wheel.
>

Those scripts already exist and are in use in the tools that mirror
PyPI. They use HTTP calls rather than rsync, but that's about the only
difference.
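
For illustration, a minimal sketch of that kind of HTTP-based mirroring,
assuming the /simple/ index layout at http://pypi.python.org/simple/
(not the actual tools in use, which also handle incremental updates,
link filtering and errors):

    import os
    import re
    import urllib.request
    from urllib.parse import urljoin

    SIMPLE = "http://pypi.python.org/simple/"  # assumed index URL

    def mirror_project(name, dest="mirror"):
        """Fetch a project's /simple/ page and download the linked files."""
        base = SIMPLE + name + "/"
        page = urllib.request.urlopen(base).read().decode("utf-8")
        os.makedirs(os.path.join(dest, name), exist_ok=True)
        # crude link extraction; real tools use a proper HTML parser and
        # skip non-archive links
        for href in re.findall(r'href="([^"#]+)', page):
            url = urljoin(base, href)
            filename = url.rsplit("/", 1)[-1]
            with open(os.path.join(dest, name, filename), "wb") as out:
                out.write(urllib.request.urlopen(url).read())

    # e.g. mirror_project("some-project")   # hypothetical project name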

> AFAICTL, all the data on PyPI is static and can be rendered
> as files in a directory dump. A simple cronjob could take
> care of this every few minutes or so and extract the data
> to a local directory which is then made accessible to
> mirrors.

People are already running rsync-like mirrors, but that gives you quite
an incomplete mirror.

The whole point of the work I've been doing with Martin (partially
reflected in PEP 381) is to keep download statistics for each archive,
no matter which mirror was used to download the file. That's quite
valuable information.

Regards
Tarek

-- 
Tarek Ziadé | http://ziade.org

