[Distutils] [Catalog-sig] Specification for package indexes?
Phillip J. Eby
pje at telecommunity.com
Fri Jul 7 18:18:32 CEST 2006
At 06:55 AM 7/7/2006 -0400, Jim Fulton wrote:
> From a design perspective:
>
>a. screen scraping is bad
As long as you define "screen scraping" as "dependency on visible
characteristics of HTML", then I agree. easy_install shouldn't be relying
on the visible bits of HTML that it currently uses to scope out PyPI.
Relying on a particular URL layout is not screen-scraping, though, and
using the URL layout as part of the API has some good properties for ease
of implementation in static form. So does using href attributes to obtain
link information.
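To make the distinction concrete, here is a minimal sketch of a client that depends only on a URL convention and on href values, never on the visible text of the page. The `<index>/<ProjectName>/` layout and the names `project_page_url` and `extract_hrefs` are illustrative assumptions, not easy_install's actual code.

```python
import re
from urllib.parse import quote, urljoin

def project_page_url(index_url: str, project: str) -> str:
    """Return a project's page URL under an ASSUMED <index>/<project>/ layout."""
    if not index_url.endswith("/"):
        index_url += "/"  # urljoin would otherwise drop the last path segment
    return urljoin(index_url, quote(project) + "/")

# Match href values whether double-quoted, single-quoted, or unquoted.
HREF = re.compile(r"""href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""", re.I)

def extract_hrefs(page: str, base_url: str) -> list[str]:
    """Return every href in the page, resolved against the page's own URL."""
    links = []
    for m in HREF.finditer(page):
        # Exactly one of the three alternatives matched; take that one.
        raw = next(g for g in m.groups() if g is not None)
        links.append(urljoin(base_url, raw))
    return links
```

For example, `extract_hrefs('<a href="FooBar-1.0.tar.gz">x</a> <a href=../Other/>y</a>', "http://example.com/simple/FooBar/")` yields the tarball URL under `/simple/FooBar/` and `http://example.com/simple/Other/`; changing the link text would not affect the result.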
What we should be doing is adding non-visible markup (e.g. class="" or
rel="" attributes) to the links, so that index creators can direct
easy_install without affecting visible page characteristics.
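A minimal sketch of how a client could honour such hidden markup while still using only regular expressions. The rel values `homepage` and `download` and the helper names are assumptions chosen for illustration; links without a matching rel are left alone rather than spidered, and the visible link text plays no part.

```python
import re
from urllib.parse import urljoin

# Capture the attribute text of each <a ...> start tag.
ANCHOR = re.compile(r"<a\s([^>]*)>", re.I)
# Attribute pairs: name="value", name='value', or name=value.
ATTR = re.compile(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""")

# HYPOTHETICAL rel values an index might use to ask the client to spider a link.
SPIDER_RELS = {"homepage", "download"}

def parse_attrs(tag_body: str) -> dict[str, str]:
    """Turn 'href="x" rel=y' into {'href': 'x', 'rel': 'y'}."""
    attrs = {}
    for m in ATTR.finditer(tag_body):
        value = next(g for g in m.groups()[1:] if g is not None)
        attrs[m.group(1).lower()] = value
    return attrs

def links_to_spider(page: str, base_url: str) -> list[str]:
    """Return hrefs whose rel attribute names one of SPIDER_RELS."""
    found = []
    for m in ANCHOR.finditer(page):
        attrs = parse_attrs(m.group(1))
        rels = set(attrs.get("rel", "").lower().split())
        if "href" in attrs and rels & SPIDER_RELS:
            found.append(urljoin(base_url, attrs["href"]))
    return found
```

Given a page with a plain distribution link plus `<a href="http://foobar.example.org/" rel="homepage">Home Page</a>`, only the home page URL is returned; the page looks identical to a human reader with or without the rel attribute.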
>b. the web API should be simple and well defined.
>
>I suggest, as others have suggested, that we create an *alternate*
>web API for reading an index focussed on cleanliness and on making
>the API as easy as possible to implement for both index and client
>developers. If we agree with all of the goals stated above, I think
>this should be a static HTTP interface using XHTML or some other XML
>dialect. Perhaps we could even use specific HTML class attrs to
>make it possible to combine the pypi and user interfaces if an index
>implementor desires.
>
>Thoughts?
+1 on static pages. I don't, however, see a reason to require valid
XML. Or rather, I don't expect to implement XML parsing in easy_install;
if the spec is too complex to implement with regular expression matching,
it's probably too complex for people to throw together an index with what's
at hand. In particular, I'd like it to be practical to put together a
simple index just using Apache's built-in directory indexes, as long as
they use the right URL hierarchy. That means that class or rel attributes
should only be required on links to non-index pages that the index wants
spidered.
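A minimal sketch of why regular expressions are enough here: the page below is a plain HTML file listing of the kind a web server's directory index produces (the exact Apache markup varies with configuration, so this layout is an assumption), and the unclosed `<hr>` alone makes it invalid XML. The helper names are hypothetical.

```python
import html
import re
import xml.etree.ElementTree as ET

def render_directory_index(project: str, filenames: list[str]) -> str:
    """Build a simple HTML file listing (ASSUMED format, loosely like autoindex)."""
    rows = "\n".join(
        f'<li><a href="{html.escape(name, quote=True)}">{html.escape(name)}</a></li>'
        for name in filenames
    )
    return (
        f"<html><head><title>Index of /{project}</title></head><body>\n"
        f"<h1>Index of /{project}</h1><hr>\n<ul>\n{rows}\n</ul>\n</body></html>"
    )

# A regex only needs to find anchor hrefs; it does not care about <hr>.
HREF = re.compile(r"""<a\s[^>]*?href\s*=\s*["']?([^"'\s>]+)""", re.I)

def distribution_links(page: str) -> list[str]:
    """Return the (unescaped) href of every anchor in the listing."""
    return [html.unescape(h) for h in HREF.findall(page)]

def is_well_formed_xml(page: str) -> bool:
    """Report whether a strict XML parser would accept the page."""
    try:
        ET.fromstring(page)
    except ET.ParseError:
        return False
    return True
```

Running `distribution_links` on `render_directory_index("FooBar", ["FooBar-1.0.tar.gz"])` recovers the filename, while `is_well_formed_xml` on the same page returns `False`: requiring valid XML would reject exactly the kind of index people can throw together with what's at hand.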