[Catalog-sig] [Distutils] Specification for package indexes?

Phillip J. Eby pje at telecommunity.com
Sat Jul 8 03:12:03 CEST 2006


At 04:45 PM 7/7/2006 -0400, Jim Fulton wrote:

>On Jul 7, 2006, at 4:20 PM, Phillip J. Eby wrote:
>
>>At 02:52 PM 7/7/2006 -0400, Jim Fulton wrote:
>...
>>>Perhaps someone should propose an API and we'll see. :)
>>
>>I thought I already did.  :)  Here it is again:
>>
>>baseURL/ should return a page containing href links to projects
>>baseURL/projectname should return a page containing href links to
>>version pages
>>baseURL/projectname/version should return a page with download
>>links (ideally with MD5 info)
>>Links are found via href="" attributes
>>URLs' trailing path components are used to identify distributions.
>
>Hm. I hadn't seen this before. Perhaps I'm missing some messages from
>this thread.
>
>By "download links", do you mean links to distributions?

Yes.


>Or to links
>to pages containing links to distributions.

No, these would be either "index pages" or "external links".


>Can the links to projects, links to version pages, or download links
>point off site?

Download links can be anywhere, since they are identified from the tail of 
the URL.  The links to project or version pages are defined by the URL 
hierarchy of the API.
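
To illustrate "identified from the tail of the URL", here is a minimal
sketch in Python.  It is not easy_install's actual code: the function name
and the list of file suffixes are assumptions made for the example, since
nothing above enumerates which suffixes count as distributions.

```python
import posixpath
from urllib.parse import urlparse

# Assumed suffixes for illustration only; the real set is not specified here.
DIST_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg", ".exe")


def distribution_filename(url):
    """Return the trailing path component if it looks like a distribution.

    Only the URL path is inspected, so the host can be anywhere.
    Returns None when the tail does not end in a known suffix.
    """
    tail = posixpath.basename(urlparse(url).path)
    if tail.lower().endswith(DIST_SUFFIXES):
        return tail
    return None
```

Because only the path's last component is examined, any query string or
fragment on the link is ignored, and a link such as
http://www.zope.org/Products/ZODB3.6 is not treated as a download.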


>Can any of these pages contain other links?

All of them can contain download links.  Index pages can link to other 
index pages.  Links from index pages to anything else are ignored, unless 
we allow "external links", in which case a method of identifying them is 
required.
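
The rules above can be sketched as a small link classifier.  This is only
an illustration under stated assumptions: the names (HrefCollector,
classify_links), the suffix list, and the "starts with the base URL" test
for index pages are all invented for the example and are not taken from
easy_install.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed suffixes for illustration only.
DIST_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg", ".exe")


class HrefCollector(HTMLParser):
    """Collect the value of every href="" attribute on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def classify_links(page_url, html, base_url):
    """Sort a page's links into download, index, and ignored groups."""
    parser = HrefCollector()
    parser.feed(html)
    result = {"download": [], "index": [], "ignored": []}
    for href in parser.hrefs:
        url = urljoin(page_url, href)  # resolve relative links
        tail = posixpath.basename(urlparse(url).path)
        if tail.lower().endswith(DIST_SUFFIXES):
            result["download"].append(url)  # may live on any host
        elif url.startswith(base_url):
            result["index"].append(url)  # inside the index's URL hierarchy
        else:
            result["ignored"].append(url)  # not followed without a marker
    return result
```

Download links are checked first, so a distribution hosted under the
index's own URL space still counts as a download rather than an index page.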

Currently, easy_install uses only two kinds of external links: home page 
and "download URL".  These are identified via HTML snippets that PyPI 
uses.  This is one of only two pieces of "screen scraping" (as opposed to 
URL inspection and link detection) that easy_install has.  (The other is 
used to distinguish a page that lists links to projects from an actual 
project page, as PyPI can sometimes display the former at a URL that is 
nominally for the latter.)


>>This is a sufficient API to allow querying packages for downloading
>>purposes, as long as all download links are found in the index's
>>pages.  Additional information is only needed to allow following
>>external links to *other index pages*.
>
>so, for example:
>
>   http://www.python.org/pypi/ZODB3/3.6.0
>
>Has a link to http://www.zope.org/Products/ZODB3.6.
>Is this a download link? Or an off-site index link. I'm having a
>little trouble
>following the jargon.

It's an "external link", and thus only followed if it's seen to be the 
"home page" or "download URL" on a package version page.



>>Sure.  I'm just saying we only need something beyond href="" links
>>if they are intended to be followed by tools looking for package
>>links.
>>
>>The reason this is necessary, is that it's not sufficient to just
>>follow links that point outside the package index; PyPI has links
>>on its pages that go to other parts of python.org, so there needs
>>to be something that distinguishes "links that might help find
>>downloads".  Links that *are* downloads are detected via URL content.
>
>Right. That's why I think the hrefs we care about should be marked
>with class
>attributes or some such.

Yes, as long as we care about supporting the external links.  I'm not 
certain we do, at least for the "third-party index" case.
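
For concreteness, marking followable hrefs with a class attribute might
look like the sketch below.  The class name "package-index-follow" is
purely hypothetical; no such marker has been agreed on in this thread, and
the collector shown is an illustration rather than an existing tool.

```python
from html.parser import HTMLParser

# Hypothetical marker class; nothing in this thread defines an actual name.
FOLLOW_CLASS = "package-index-follow"


class MarkedLinkCollector(HTMLParser):
    """Collect hrefs only from <a> tags carrying the marker class."""

    def __init__(self):
        super().__init__()
        self.marked = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if href and FOLLOW_CLASS in classes:
            self.marked.append(href)


def marked_links(html):
    """Return the external links a tool would be allowed to follow."""
    collector = MarkedLinkCollector()
    collector.feed(html)
    return collector.marked
```

With a marker like this, a tool could follow the home page link on a
version page while skipping unmarked links to other parts of python.org.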


