[Distutils] Python people want CPAN and how the latter came about

Fri Dec 25 09:00:41 CET 2009

Greetings Lennart,

On 12/24/2009 10:27 PM, Lennart Regebro wrote:
> On Fri, Dec 25, 2009 at 05:39, Sridhar Ratnakumar
> <sridharr at activestate.com>  wrote:
>> Is it because of this benefit to package authors that we are withholding the
>> implementation of a simple archive that would: 1) simplify the tools to not
>> rely on ad hoc web scraping
>
> There are better ways to do that.

May I ask, what would they be?

>> 2) reduce the downtime for users by rsync/ftp mirroring
>
> This is true, but the idea to upload them by robots is preferable in
> my opinion. Again it's a difference between trying to force other
> people to behave to your expectations vs trying to make it easier for
> others to behave to your expectations.
>
>> 3) have package sources mirrored so project owners do not have to
>> worry about downtime of their servers.
>
> That's *their* problem. If they don't want to upload, then they don't
> want to upload.

The original proposal retains the existing behavior for package releases
that are already registered or uploaded (such as Twisted), so existing
systems will continue to work. The suggested upload rules would apply
only to new requests (creation/register). That would gradually improve
the quality of PyPI, like that of other packaging systems, by
encouraging authors to generate a reasonably good sdist (setup.py +
PKG-INFO) and upload it, which in turn enables the move towards a
static archive that can easily be mirrored. Given all that, I fail to
see what good is achieved by retaining the status quo.

If I want to use a web service, I obviously have to adhere to its
rules and policies. Nobody is forcing me to use it.

I assume in good faith that package authors will be happy to adapt to
the new system, for the benefit of everyone. I will be happy to be
proven otherwise. (Speculation is useless; how about we actually ask
the package authors themselves?)

>> 4) enable proliferation of third-party tools like CPAN?
>
> That won't help.

Why not? Can you think of anything other than a CPAN-like archive
that would help the proliferation of mirror sites and third-party sites?
I ask because I personally went through significant hurdles to set up a
daily PyPI mirror-like area. I just don't see how someone merely
interested in writing a third-party service, or in setting up a mirror of
PyPI, would be inclined to face similar hurdles; they would *most likely*
give up. Because I went through these hurdles, I was able to appreciate
CPAN's design while reading about it [cpan.org/misc/ZCAN.html].

>> Nope, it matters not whether the metadata can be retrieved via a simple HTTP
>> GET or XmlRpc.
>
> OK. Then you have two proposals: 1. Require uploading, which is a bad
> idea and 2. Making it easier to mirror the metadata, which seems
> reasonable, assuming it's currently hard. :)

Here's one idea (example only):

$ tar zxf foo-0.1.tar.gz
$ cp foo-0.1/PKG-INFO foo-0.1.tar.gz.PKG-INFO
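
The same idea can be sketched in Python without unpacking the whole tarball. This is only an illustration: the function name `write_pkg_info_sidecar` is made up here, and it assumes a .tar.gz sdist whose PKG-INFO sits directly under a single top-level directory (as `setup.py sdist` produces).

```python
import tarfile
from pathlib import Path
from typing import Optional


def write_pkg_info_sidecar(sdist_path: str) -> Optional[Path]:
    """Copy <topdir>/PKG-INFO out of a .tar.gz sdist into a sidecar file.

    Hypothetical helper: the sidecar is named <sdist filename>.PKG-INFO and
    is written next to the sdist. Returns the sidecar path, or None when the
    sdist carries no top-level PKG-INFO.
    """
    sdist = Path(sdist_path)
    sidecar = sdist.with_name(sdist.name + ".PKG-INFO")
    with tarfile.open(sdist, "r:gz") as tar:
        for member in tar:
            parts = member.name.split("/")
            # Accept only <topdir>/PKG-INFO, not copies nested in *.egg-info.
            if member.isfile() and len(parts) == 2 and parts[1] == "PKG-INFO":
                sidecar.write_bytes(tar.extractfile(member).read())
                return sidecar
    return None
```

A mirror could then serve foo-0.1.tar.gz.PKG-INFO as a plain static file beside the tarball, so clients read the metadata with one small GET.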

>> Metadata is definitely needed. Otherwise, I'd have to extract the tarball of
>> each and every release of a particular package, in order to even find their
>> version number (it is unreliable to parse the filename to get version
>> number).
>
> Yes, but it's not particularly unreliable to compare the filename to
> see if it had been handled before. You don't even need to parse the
> version number for most services that work on the tarballs.

It is indeed unreliable to rely on filenames to get package versions
(unless that sdist is generated by the `setup.py sdist` command). As
I've mentioned elsewhere, some packages have weird filenames (e.g.
"latest.zip", "foo.py"); some others have a '.dev' suffix in the filename
while setup.py:version (hence PKG-INFO) does not have the '.dev' suffix.
There are several other issues that I cannot recall right now.

I am not speculating: I've actually experimented with the PyPI index,
mirroring it, handling the metadata in packages, and building it.
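
The filename problem can be shown with a small Python sketch. Both function names are hypothetical, and the filename heuristic is deliberately naive (text after the last '-' once a known extension is stripped); it is not what any particular tool does. PKG-INFO uses RFC 822-style headers, so the standard library's email parser can read it.

```python
from email.parser import Parser


def guess_version_from_filename(filename):
    """Naive guess: whatever follows the last '-' after stripping the extension."""
    stem = filename
    for ext in (".tar.gz", ".tgz", ".zip"):
        if stem.endswith(ext):
            stem = stem[: -len(ext)]
            break
    _name, sep, version = stem.rpartition("-")
    return version if sep else None


def version_from_pkg_info(text):
    """Read (Name, Version) from the RFC 822-style headers of a PKG-INFO file."""
    msg = Parser().parsestr(text, headersonly=True)
    return msg.get("Name"), msg.get("Version")


# Made-up example data matching the cases described above.
pkg_info = "Metadata-Version: 1.0\nName: foo\nVersion: 0.1\n"
print(guess_version_from_filename("foo-0.1.dev.tar.gz"))  # 0.1.dev
print(guess_version_from_filename("latest.zip"))          # None
print(version_from_pkg_info(pkg_info))                    # ('foo', '0.1')
```

The filename yields "0.1.dev" or nothing at all, while PKG-INFO gives the version that setup.py actually declared.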

>> As for the sdists, the following tools would need it: testing service,
>> quality ratings, thirdparty package managers (enstaller, PyPM) .. and not to
>> mention the various mirror sites.
>
> Yes, but since they have the source package, and will have to unpack
> it and build the packages anyway, they also have the metadata.

It is not that simple. The PyPM backend, for instance, is not monolithic
in the sense of doing only a sequential build of packages. It first loads
the dependency graph (for which metadata, i.e. PKG-INFO/requires.txt, is
required) from our internal mirror over the network. It is expensive to
extract each and every tarball on each build machine. After loading the
dependency graph, the backend compares it with the existing repository,
and new builds happen. This runs every day.
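
To make the point concrete, here is a minimal sketch of building a dependency graph from metadata alone. It is not PyPM's actual code: the function names, the in-memory `metadata` dict, and the decision to ignore `[extra]` sections are all assumptions made for illustration.

```python
import re


def parse_requires(text):
    """Return project names from the unconditional part of a requires.txt."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            break  # extras sections follow; this sketch ignores them
        # Keep only the project name, dropping version specifiers like ">=3.3".
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0).lower())
    return names


def build_graph(metadata):
    """Map each package name to its dependencies.

    `metadata` is a hypothetical dict of package name -> requires.txt text,
    e.g. fetched from small metadata files on a mirror.
    """
    return {name.lower(): parse_requires(text) for name, text in metadata.items()}


# Made-up example input.
graph = build_graph({
    "Foo": "zope.interface>=3.3\nsetuptools\n\n[ssl]\npyOpenSSL\n",
    "Bar": "foo\n",
})
print(graph)  # {'foo': ['zope.interface', 'setuptools'], 'bar': ['foo']}
```

Nothing here touches a tarball, which is why having the metadata available separately matters for a backend that runs on many build machines.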

Certain packages even lack metadata in their source distributions
(e.g. no PKG-INFO in Twisted's sdist), which is another issue altogether.

Further, I can imagine search.cpan.org (which is not hosted by cpan.org 
folks) using only the metadata without touching the source distributions.

-srid