[Python-Dev] setuptools: past, present, future

Phillip J. Eby pje at telecommunity.com
Sat Apr 22 18:39:51 CEST 2006


At 05:41 PM 4/22/2006 +1000, Nick Coghlan wrote:
>Phillip J. Eby wrote:
>>At 12:22 AM 4/22/2006 -0400, Terry Reedy wrote:
>>>Why can't you remove the heuristic and screen-scrape info-search code
>>>from the easy_install client and run one spider that would check
>>>new/revised PyPI entries, search for missing info, insert it into PyPI when
>>>found (and mark the entry eggified), or email the package author or a human
>>>search volunteer if it does not find enough?
>>I actually considered that at one point.  After all, I certainly have the 
>>technology.
>>However, I didn't consider it for more than 10 seconds or so.  Package 
>>authors have no reason to listen to some random guy with a bot -- but 
>>they do have reasons to listen to their users, both actual and potential.
>
>I'm not sure that's what Terry meant - I took it to mean *make the spider 
>part of PyPI itself*.

Which would also be accomplished by using Grig's Cheesecake tool, since it 
uses easy_install to fetch the source.


>Then all the heuristics and screen-scraping would be server-side - all 
>easy_install would have to do is look at the meta-data provided by the 
>PyPI spider.

Which is certainly attractive from the POV of being able to make changes 
quickly.

However, I forgot to mention another issue, because I was speaking from the
point of view of the time when I designed the thing, not the present
day.  Since it was implemented, it has turned out that being able to point
easy_install at web pages listing a specific collection of packages (e.g. ones
built for a specific OS version, or ones tested for a particular
purpose) is *very* useful in practice.  And the people doing
that are going to do whatever it takes to make their listing(s) work
with easy_install, because that's the whole point for them.  So the
heuristics there don't have to keep growing without limit.
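Such a listing page is consumed by scanning it for links that look like
package archives (easy_install's `-f`/`--find-links` option accepts a page
URL like this).  The following is a minimal sketch of that kind of scan,
not setuptools' own package_index code: the names `DistLinkCollector` and
`find_dist_links` and the suffix list are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Suffixes treated as candidate distributions.  This list is an assumption
# for the sketch; easy_install's real rules may differ.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg", ".exe")


class DistLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links that look like archives."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Ignore any #fragment (e.g. an md5 checksum) when checking the suffix.
        path = href.split("#", 1)[0]
        if path.lower().endswith(ARCHIVE_SUFFIXES):
            # Resolve relative links against the listing page's own URL.
            self.links.append(urljoin(self.base_url, href))


def find_dist_links(html, base_url):
    """Return the archive links found in a package-listing page."""
    parser = DistLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Non-archive links such as "about.html" are skipped, and relative links are
resolved against the page URL, so a maintainer only has to publish plain
links to their files for them to be found.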

What it basically amounts to, then, is that easy_install heuristics 
currently only have to chase people who aren't trying to easy_install their 
packages.  For example, I discovered the other day that easy_install can
get confused by bdist_dumb distributions.  So few people distribute
bdist_dumb packages that I had never run into that issue before now, so
I had to update the heuristics to tell from the filename whether
a package is likely to be a bdist_dumb.
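The message doesn't say which rule easy_install actually uses.  As a
hypothetical sketch: distutils' bdist_dumb names its archive
"<name>-<version>.<platform>" plus the archive extension, where the platform
string comes from `distutils.util.get_platform()` (e.g. "linux-i686",
"win32").  A filename check can therefore look for a platform suffix after
stripping the extension.  The function name and the platform prefixes below
are illustrative assumptions, not an exhaustive list.

```python
import re

# Archive extensions that bdist_dumb can produce (stripped before inspection).
_ARCHIVE_EXT = re.compile(r"\.(tar\.gz|tar\.bz2|tgz|zip|tar)$", re.IGNORECASE)

# Hypothetical, non-exhaustive set of platform strings as produced by
# get_platform(), e.g. "linux-i686", "win32", "macosx-10.3-fat".
_PLATFORM_SUFFIX = re.compile(
    r"\.(win32|win-amd64|(linux|macosx|cygwin|freebsd|solaris|sunos)-[\w.\-]+)$",
    re.IGNORECASE,
)


def looks_like_bdist_dumb(filename):
    """Guess whether an archive filename is a bdist_dumb rather than an sdist."""
    base = _ARCHIVE_EXT.sub("", filename)
    if base == filename:
        return False  # not an archive type bdist_dumb produces
    # An sdist is just "<name>-<version>"; a bdist_dumb adds ".<platform>".
    return bool(_PLATFORM_SUFFIX.search(base))
```

So "foo-1.0.linux-i686.tar.gz" is flagged while "foo-1.0.tar.gz" is not.
Any such rule is a guess: a source archive whose version happened to end in
something platform-like would be misclassified.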

However, if PyPI is doing Cheesecake ratings, there will only be a finite 
number of such things to deal with, because when people make changes that 
break their ratings, they'll just fix the problem themselves, as it'll 
generally be faster than lobbying for new heuristics in easy_install.  As 
the community becomes better educated about making their package links easy 
to find, the amount of maintenance work needed for easy_install should drop 
off.  Right now, the main reason to add heuristics is to increase 
compatibility with whatever practices are already out there, in order to 
leverage the greatest number of existing packages to secure the greatest 
number of users.


