[Catalog-sig] [Distutils] Specification for package indexes?

Jim Fulton jim at zope.com
Sat Jul 8 13:38:06 CEST 2006


On Jul 7, 2006, at 9:12 PM, Phillip J. Eby wrote:

> At 04:45 PM 7/7/2006 -0400, Jim Fulton wrote:
...
>> By "download links", do you mean links to distributions?
>
> Yes.
>
>
>> Or links to pages containing links to distributions?
>
> No, these would be either "index pages", or "external links"

Which seems to be an important use case now.

>
>> Can the links to projects, links to version pages, or download links
>> point off site?
>
> Download links can be anywhere, since they are identified from the  
> tail of the URL.  The links to project or version pages are defined  
> by the URL hierarchy of the API.

Hm.  Why does it matter?  I understand that you want to be able to go to
index_url/project first, but I don't see that it matters where versions
actually are.
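
As a rough illustration of identifying download links "from the tail of
the URL", here is a minimal Python sketch.  The suffix list and the
function name are assumptions made for this example; the thread does not
enumerate which file names count, and this is not easy_install's actual
code.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical list of distribution file suffixes (an assumption for
# illustration; the thread does not say which ones are recognized).
DIST_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg", ".exe")


def is_download_link(url):
    """Return True if the last path segment of `url` looks like a distribution file."""
    path = urlparse(url).path
    filename = posixpath.basename(path).lower()
    # str.endswith accepts a tuple and matches any of its entries.
    return filename.endswith(DIST_SUFFIXES)
```

With a check like this the host name is irrelevant, which is why such
links can live anywhere: only the file name at the end of the path is
inspected.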

For that matter, I could see value in a minimal index that just pointed
to external project pages.  In that case, going to index_url/project
might even be allowed to redirect to an offsite project page.  Of course,
this couldn't be implemented with a static server, but it could still be
a valuable option.
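
To make the redirect idea concrete, a client could build
index_url/project and then compare the host it finally lands on with the
index's own host.  The sketch below assumes hypothetical helper names
(project_url, is_offsite) and leaves out the HTTP request itself; it only
shows the URL handling.

```python
from urllib.parse import urljoin, urlparse


def project_url(index_url, project):
    """Build index_url/project, tolerating a trailing slash on index_url."""
    return urljoin(index_url.rstrip("/") + "/", project)


def is_offsite(index_url, final_url):
    """Return True if `final_url` is served from a different host than the index."""
    index_host = urlparse(index_url).netloc.lower()
    final_host = urlparse(final_url).netloc.lower()
    return index_host != final_host
```

A tool would fetch project_url(...), note the URL it ends up at after
any redirects, and use is_offsite(...) to decide whether it has been sent
to an external project page.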



>
>> Can any of these pages contain other links?
>
> All of them can contain download links.  Index pages can link to  
> other index pages.  Index pages linked to anything else are  
> ignored, unless we allow "external links", in which case a method  
> of identifying them is required.

I think we want external links.  We have them now.  In fact, I think
there is value in a project index that has no distributions or even
version information but provides a central place to find project pages.

Note that, in a separate discussion, you pointed out that some
considered it bad form to put interim project releases on PyPI.  If PyPI
could have links to remote project pages, then those sites could have
different policies as needed by a project.


> Currently, easy_install uses only two kinds of external  
> links: home page and "download URL".  These are identified via HTML  
> snippets that PyPI uses.  This is one of only two pieces of "screen  
> scraping" (as opposed to URL inspection and link detection) that  
> easy_install has.  (The other is used to distinguish between a page  
> that lists links to projects, from an actual project page, as  
> sometimes PyPI can display the former at a URL that is nominally  
> for the latter.)



>
>>> This is a sufficient API to allow querying packages for downloading
>>> purposes, as long as all download links are found in the index's
>>> pages.  Additional information is only needed to allow following
>>> external links to *other index pages*.
>>
>> so, for example:
>>
>>   http://www.python.org/pypi/ZODB3/3.6.0
>>
>> Has a link to http://www.zope.org/Products/ZODB3.6.
>> Is this a download link, or an off-site index link?  I'm having a
>> little trouble following the jargon.
>
> It's an "external link", and thus only followed if it's seen to be  
> the "home page" or "download URL" on a package version page.

Right, which is currently identified by sniffing the surrounding HTML.

>
>>> Sure.  I'm just saying we only need something beyond href="" links
>>> if they are intended to be followed by tools looking for package
>>> links.
>>>
>>> The reason this is necessary, is that it's not sufficient to just
>>> follow links that point outside the package index; PyPI has links
>>> on its pages that go to other parts of python.org, so there needs
>>> to be something that distinguishes "links that might help find
>>> downloads".  Links that *are* downloads are detected via URL  
>>> content.
>>
>> Right. That's why I think the hrefs we care about should be marked
>> with class attributes or some such.
>
> Yes, as long as we care about supporting the external links.  I'm  
> not certain we do, at least for the "third-party index" case.

I think we do.  I'm pretty sure we do for PyPI, and I sure as heck don't
want a different API for PyPI and for other indexes.  I'd really like to
see a single index API.

I would *like* to see the possibility of allowing off-site (off-index)
projects, although I could live without this.
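
For what marking hrefs "with class attributes or some such" might look
like to a tool, here is a minimal Python sketch.  The class name
"package-link" is purely hypothetical; nothing in this thread settles on
a marker, and PyPI does not emit one today.

```python
from html.parser import HTMLParser

# Hypothetical marker class; an assumption for illustration only.
MARKER_CLASS = "package-link"


class MarkedLinkParser(HTMLParser):
    """Collect hrefs from <a> tags whose class attribute includes MARKER_CLASS."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Attribute values may be None when written without a value.
        classes = (attr.get("class") or "").split()
        href = attr.get("href")
        if href and MARKER_CLASS in classes:
            self.links.append(href)


def marked_links(html):
    """Return the hrefs of all marked links in `html`, in document order."""
    parser = MarkedLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Unmarked links, such as navigation to other parts of python.org, are
skipped without any need to inspect the surrounding HTML.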

I have to say again that all of these details can get quite confusing.
Maybe I'm alone in being confused by this, but I don't think so.  I've
spent a lot of time on and off over the last few months trying to
leverage setuptools and now PyPI, and while I've had a lot of success, it
has been harder than I think it should be.  I think that this is an
impediment to greater adoption of and benefit from setuptools.  I think
we need to do a good job of documenting and explaining this API.  I also
think we need to write up some best practices or rationale to guide
people toward better use of setuptools and PyPI together.  I'm happy to
help with this once we have agreement and once I understand what we
agree to. :)

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org




