[Distutils] [Catalog-sig] Specification for package indexes?

Phillip J. Eby pje at telecommunity.com
Thu Jul 6 18:56:03 CEST 2006


At 01:03 AM 7/7/2006 +1000, richardjones at optusnet.com.au wrote:
> > Phillip J. Eby <pje at telecommunity.com> wrote:
> > Why not?  ;)
>
>That was actually what I was afraid the reasoning was ;)
>
>I guess I just go all wobbly in the knees at the thought of having to 
>maintain a "screen scraping" interface.

You don't need to -- at least not in the long term.  Once setuptools 0.7 
supports the XML-RPC interface, it won't need the other scraping tricks to 
read PyPI.  Those tricks would stay in only for people who run their own 
package indexes; they would not constrain further development of PyPI itself.

Please keep in mind that easy_install makes *extremely* minimal assumptions 
about PyPI's interface:

1. It assumes that baseURL/projectname will get to the current version of 
projectname, or a page with a list of projectname's active versions

2. It assumes that links within PyPI of the form 
baseURL/something1/something2 are links to version 'something2' of a 
project named 'something1'

3. It assumes that going to baseURL directly will result in a page with 
links to all available packages in the form described in #2.

4. It assumes that if baseURL/projectname returns a page containing the 
text "Index of Packages</title>", it is a list of links of the form 
described in #2.

5. It looks for and follows the first links following the strings "<th>Home 
Page" and "<th>Download URL" in a project page.

6. It makes assumptions about how to find MD5 data on a PyPI page, but if 
it fails to do so, it simply won't check the MD5 of downloads.
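
To make assumptions 2, 4, and 5 concrete, here is a rough Python sketch of 
the checks described above.  It is not easy_install's actual code: the 
function names and the example.org URLs are made up for illustration, and 
only the marker strings are taken from the list.

```python
import re
from urllib.parse import unquote

# Hypothetical helpers; none of these names come from setuptools.
HREF = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.I)


def parse_index_link(base_url, link):
    """Assumption 2: treat baseURL/something1/something2 as
    (project, version); return None for any other link."""
    if not base_url.endswith("/"):
        base_url += "/"
    if not link.startswith(base_url):
        return None
    parts = [unquote(p) for p in link[len(base_url):].split("/")]
    if parts and parts[-1] == "":
        parts.pop()  # tolerate a trailing slash
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    return None


def is_package_listing(page):
    """Assumption 4: a page containing this text is a list of links."""
    return "Index of Packages</title>" in page


def first_link_after(page, marker):
    """Assumption 5: return the first href that follows `marker`
    (for example "<th>Home Page"), or None if there is none."""
    pos = page.find(marker)
    if pos == -1:
        return None
    match = HREF.search(page, pos)
    return match.group(1) if match else None
```

Because first_link_after() is a plain string search, it neither knows nor 
cares what HTML structure surrounds the marker text.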

Also note that even with an XML-RPC interface, easy_install will *still* 
need to read an HTML page to gather links, because it's valid for people to 
provide links in their long_description using reStructuredText.  It's just 
that assumptions 1, 3, and 4 (and maybe 5) would not be necessary.

Also note that in a pinch, you can put the strings easy_install is looking 
for inside HTML comments.  Easy_install really isn't that bright.  ;)

However, if you can provide *all* of this data via the API (including an 
html-formatted long description), then the screen scraping can go away as 
far as PyPI is concerned.


>Funnily enough, Johannes Gijsbers, Andrew Dalke and I were talking about 
>this very issue last night. I proposed that we detect the user-agent of 
>the setuptools client, and in response send back really minimalist HTML 
>(no surrounding page template). Probably overkill, but this may have been 
>after we'd had beer :)

There's a simpler solution that could be implemented: adding a 
'rel="easy-install"' attribute to links that easy_install should 
follow.  Currently, those links are the project's home page URL, download 
URL, and the links to specific versions that show up when you go to a 
project that has multiple active versions.  Adding it to these, and *only* 
these links, would give easy_install enough information to do the right 
thing.  However, support would have to wait for setuptools 0.7 anyhow, so 
there's little reason to do this.
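
For illustration only, a client could pick out such links along these 
lines.  This is a sketch using Python's standard html.parser module; the 
rel="easy-install" attribute is just the proposal above, not something PyPI 
or setuptools is known to implement, and the class and function names are 
hypothetical.

```python
from html.parser import HTMLParser


class RelLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose rel attribute
    contains the given token (hypothetical helper)."""

    def __init__(self, rel="easy-install"):
        super().__init__()
        self.rel = rel
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated tokens
        rels = (attrs.get("rel") or "").split()
        href = attrs.get("href")
        if href and self.rel in rels:
            self.links.append(href)


def easy_install_links(page):
    """Return the links on `page` marked with rel="easy-install"."""
    collector = RelLinkCollector()
    collector.feed(page)
    collector.close()
    return collector.links
```

With this approach, links in the surrounding page template are ignored 
without the client having to know anything about the page layout.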

Hm.  I just tried to make multiple versions of PEAK active, and it seems 
like you can't get the page that lists multiple versions any more.  No 
wonder some people have been having problems downloading older versions of 
certain packages.  :(

How are people supposed to get to older package versions now?  That is, 
what's the point of being able to have multiple active versions if you 
can't find them?  Is this an intended change, or a bug?


>Could you provide a clear list of all the specific changes you wish for us 
>to make at the Sprint?

I've provided a list above of what changes I want you *not* to make.  How's 
that? ;)


> > Nonetheless, there are various aspects of easy_install's behavior and
> > performance that could be significantly improved by using XML-RPC, so I
> > definitely want it to do that in 0.7.  I'm just wary of removing the
> > existing behavior until it's clear that it's unnecessary for it to.
>
>Oh - another thing that occurred to me -- does setuptools auto update itself?

What do you mean?  You can run "easy_install -u setuptools" to upgrade to 
the latest release at any time.  But it doesn't go out looking for updates 
on its own.


