[Distutils] Python people want CPAN and how the latter came about
Sridhar Ratnakumar
sridharr at activestate.com
Fri Dec 25 09:00:41 CET 2009
Greetings Lennart,
On 12/24/2009 10:27 PM, Lennart Regebro wrote:
> On Fri, Dec 25, 2009 at 05:39, Sridhar Ratnakumar
> <sridharr at activestate.com> wrote:
>> Is it because of this benefit to package authors that we are withholding the
>> implementation of a simple archive that would: 1) simplify the tools to not
>> rely on ad hoc web scraping
>
> There are better ways to do that.
May I ask, what would they be?
>> 2) reduce the downtime for users by rsync/ftp mirroring
>
> This is true, but the idea to upload them by robots is preferable in
> my opinion. Again it's a difference between trying to force other
> people to behave to your expectations vs trying to make it easier for
> others to behave to your expectations.
>
>> 3) have package sources mirrored so project owners do not have to
>> worry about downtime of their servers.
>
> That's *their* problem. If they don't want to upload, then they don't
> want to upload.
The original proposal retains the existing behavior for already
registered/uploaded package releases (such as Twisted), so existing
systems will continue to work. The suggested upload rules would apply
only to new requests (creation/register). The aim is to gradually
improve the quality of PyPI to match that of other packaging systems by
encouraging authors to generate a reasonably good sdist (setup.py +
PKG-INFO) and upload it, which in turn enables the move towards a
static archive that can easily be mirrored. Given that, I fail to see
just what good is achieved by retaining the status quo.
If I want to use a web service, I obviously have to adhere to its
rules and policies. Nobody is forcing me to do so.
I assume in good faith that package authors will be happy to adapt to
the new system .. for the benefit of everyone. I will be happy to be
proven otherwise. (Speculation is useless; how about we actually ask
the package authors themselves?)
>> 4) enable proliferation of third-party tools like CPAN?
>
> That won't help.
Why not? Can you think of anything other than a CPAN-like archive that
would help the proliferation of mirror sites and third-party sites?
I ask because I personally went through significant hurdles to set up a
daily PyPI mirror-like area. I just don't see how someone merely
interested in writing a third-party service, or setting up a mirror of
PyPI, would be *inclined* to face similar hurdles rather than give up.
Because I went through these hurdles, I was able to appreciate CPAN's
design while reading about it [cpan.org/misc/ZCAN.html].
>> Nope, it matters not whether the metadata can be retrieved via a simple HTTP
>> GET or XmlRpc.
>
> OK. Then you have two proposals: 1. Require uploading, which is a bad
> idea and 2. Making it easier to mirror the metadata, which seems
> reasonable, assuming it's currently hard. :)
Here's one idea (example only): publish each sdist's PKG-INFO as a
separate file next to the tarball:
$ tar zxf foo-0.1.tar.gz
$ cp foo-0.1/PKG-INFO foo-0.1.tar.gz.PKG-INFO
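The same idea can be sketched in Python without unpacking the whole
tarball to disk. This is only an illustration: the function names
(extract_pkg_info, write_sidecar) are hypothetical, and it assumes a
.tar.gz sdist whose PKG-INFO sits directly under a single top-level
directory, as in the foo-0.1 example above.

```python
import tarfile


def extract_pkg_info(sdist_path):
    """Return the PKG-INFO text from a .tar.gz sdist, or None if absent."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            # Only accept <top-level-dir>/PKG-INFO, e.g. foo-0.1/PKG-INFO.
            if member.isfile() and len(parts) == 2 and parts[1] == "PKG-INFO":
                return tar.extractfile(member).read().decode("utf-8")
    return None


def write_sidecar(sdist_path):
    """Write <sdist>.PKG-INFO next to the tarball; return its path or None."""
    text = extract_pkg_info(sdist_path)
    if text is None:
        # Some sdists ship without PKG-INFO; nothing to publish.
        return None
    sidecar = sdist_path + ".PKG-INFO"
    with open(sidecar, "w", encoding="utf-8") as f:
        f.write(text)
    return sidecar
```

A mirror could then fetch only the small *.PKG-INFO files when all it
needs is the metadata.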
>> Metadata is definitely needed. Otherwise, I'd have to extract the tarball of
>> each and every release of a particular package, in order to even find their
>> version number (it is unreliable to parse the filename to get version
>> number).
>
> Yes, but it's not particularly unreliable to compare the filename to
> see if it had been handled before. You don't even need to parse the
> version number for most services that work on the tarballs.
It is indeed unreliable to rely on filenames to get package versions
(unless that sdist is generated by the `setup.py sdist` command). As
I've mentioned elsewhere, some packages have weird filenames (e.g.
"latest.zip", "foo.py"); others have a '.dev' suffix in the filename
while setup.py:version (hence PKG-INFO) does not have the '.dev' suffix.
There are several other issues that I cannot recall right now.
I am not speculating: I've actually experimented with the PyPI index,
mirroring it, handling the metadata in packages, and building it.
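To make the difference concrete, here is a rough sketch comparing the
two ways of getting a version. The helper names and the filename
pattern are my own assumptions for illustration (this is not what any
existing tool does); PKG-INFO uses RFC 822-style headers, so the
standard email parser can read it.

```python
import re
from email.parser import Parser


def version_from_pkg_info(text):
    """Read the Version header from PKG-INFO text (RFC 822-style headers)."""
    return Parser().parsestr(text).get("Version")


def version_from_filename(filename):
    """Guess the version from a 'name-version.ext' filename, else None.

    The pattern is an illustrative guess; real filenames are messier.
    """
    m = re.match(
        r"^(?P<name>.+?)-(?P<version>\d[^-]*?)(\.tar\.gz|\.tgz|\.zip)$",
        filename,
    )
    return m.group("version") if m else None
```

With this sketch, "latest.zip" and "foo.py" yield no version at all, and
"foo-0.1.dev.tar.gz" yields "0.1.dev" even when PKG-INFO says "0.1".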
>> As for the sdists, the following tools would need it: testing service,
>> quality ratings, thirdparty package managers (enstaller, PyPM) .. and not to
>> mention the various mirror sites.
>
> Yes, but since they have the source package, and will have to unpack
> it and build the packages anyway, they also have the metadata.
It is not that simple. The PyPM backend, for instance, is not monolithic
in the sense of doing only a sequential build of packages. It first loads
the dependency graph (for which metadata - PKG-INFO/requires.txt - is
required) from our internal mirror over the network. It is expensive to
extract each and every tarball on each build machine. After loading the
dependency graph, it compares it with the existing repository, and new
builds happen every day.
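As a rough illustration of why standalone metadata is enough for this
step, the sketch below builds a dependency order from requires.txt-style
text and lists what still needs building. It is not PyPM's actual code;
all names are hypothetical, and it assumes requires.txt lists
unconditional requirements first, followed by [extra] sections.

```python
import re


def parse_requires(text):
    """Return project names from the unconditional part of requires.txt."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # Extras sections follow; this sketch ignores them.
            break
        if not line or line.startswith("#"):
            continue
        # Drop version specifiers, extras and markers: keep the bare name.
        names.append(re.split(r"[<>=!~\s;\[]", line, maxsplit=1)[0])
    return names


def build_order(requires):
    """Order packages so that dependencies come before their dependents."""
    order, seen = [], set()

    def visit(name, stack=()):
        if name in seen:
            return
        if name in stack:
            raise ValueError("dependency cycle at %r" % name)
        for dep in requires.get(name, []):
            visit(dep, stack + (name,))
        seen.add(name)
        order.append(name)

    for name in sorted(requires):
        visit(name)
    return order


def pending_builds(requires, already_built):
    """Packages in dependency order that the repository does not have yet."""
    return [n for n in build_order(requires) if n not in already_built]
```

Nothing here needs the tarballs themselves, only the metadata files
fetched from the mirror.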
Certain packages even lack metadata (eg: no PKG-INFO in Twisted's sdist)
in their source distributions .. which is another issue altogether.
Further, I can imagine search.cpan.org (which is not hosted by cpan.org
folks) using only the metadata without touching the source distributions.
-srid