[Catalog-sig] Search engine relevance

Richard Jones r1chardj0n3s at gmail.com
Sun Mar 10 09:23:43 CET 2013


On 10 March 2013 19:05, Yuval Greenfield <ubershmekel at gmail.com> wrote:
> On Fri, Mar 8, 2013 at 11:26 PM, Richard Jones <r1chardj0n3s at gmail.com>
> wrote:
>>
>> That *was* the original search engine :-)
>>
>> Then after user complaints we devised a better solution...
>>
>> Always happy to take criticism of it and improve it! :-)
>>
>> Sent from my portable device, please excuse the brevity.
>>
>>
>
> We can go a few directions:
>
> Easy & python.org styled
> * Google's JS search API to get, parse and display results. $5 per 1K
> queries.
> * Bing's JS search API. $5 per 2.5K queries.

Would be worth investigating if we can reasonably format the results.
Figuring out the billing will be something to discuss with the PSF
admin.


> Easy but external
> * textbox links to a google/bing search with site:pypi.python.org

As I said, this is how it was done, but there were complaints.
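
For reference, the "external" option amounts to little more than building
a search URL from whatever the user typed. A minimal sketch, assuming
Google's public search URL and a made-up helper name (external_search_url
is not anything in the PyPI code base):

```python
from urllib.parse import urlencode

# Assumption: restrict results to PyPI with the site: operator.
SITE_FILTER = "site:pypi.python.org"


def external_search_url(term):
    """Build a Google search URL limited to pypi.python.org (hypothetical helper)."""
    query = urlencode({"q": f"{SITE_FILTER} {term}"})
    return f"https://www.google.com/search?{query}"


url = external_search_url("agi")
print(url)
```

The search box would then simply redirect to that URL, which is why the
results page ends up off-site.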


> Hard to get good results, but perhaps easy to try:
> * Change/improve internal search engine, and invent a good ranking
> algorithm.

We could probably just use the full-text search built into PostgreSQL,
rather than the current naive LIKE searching. There is a ranking
algorithm in place and it does strongly prefer matching the name
you've entered; it doubly prefers an exact package name match. This
might solve the AGI problem and could probably produce good results
using the current ranking algorithm. Not sure. Google's search
algorithms are far advanced ;-)
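
To make the ranking idea concrete, here is a rough sketch of that kind of
scoring: substring matching (in the spirit of LIKE '%term%'), a strong
preference for a hit in the name, and double weight for an exact name
match. This is not the actual PyPI code; the weights, function names and
sample packages are all invented for illustration.

```python
# Hypothetical weights: a name hit counts far more than a summary hit.
NAME_WEIGHT = 4
SUMMARY_WEIGHT = 1


def score(term, name, summary):
    """Score one package against a search term (illustrative only)."""
    term = term.lower()
    name = name.lower()
    total = 0
    if term in name:
        total += NAME_WEIGHT
        if term == name:
            # An exact package name match counts double.
            total += NAME_WEIGHT
    if term in summary.lower():
        total += SUMMARY_WEIGHT
    return total


def search(term, packages):
    """Return (name, score) pairs for matching packages, best first."""
    scored = [(name, score(term, name, summary)) for name, summary in packages]
    hits = [(name, s) for name, s in scored if s > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Made-up sample data, not real index entries.
packages = [
    ("imaging-tools", "Image processing utilities"),
    ("asterisk-helper", "Library with AGI support"),
    ("agi", "Gateway interface bindings"),
    ("webclient", "HTTP client"),
]

results = search("agi", packages)
print(results)
```

Note that plain substring matching also pulls in "imaging-tools" for the
term "agi", which is the sort of noise a word-based text search would
avoid.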


> Though I wouldn't say this is high priority at all. I personally never use
> pypi search, just site:pypi.python.org on google.

I also often use Google - but I don't even bother with the site: bit.
My go-to search is usually just "python <whatever>".

I note though that unless I add "site:pypi.python.org" to the search
even Google struggles to suggest something on PyPI (try "python
agi"...)


    Richard

