[Pydotorg-redesign] how to search the site

Simon Willison cs1spw at bath.ac.uk
Sat Sep 13 21:35:47 EDT 2003


Barry Warsaw wrote:
> If I were to cast my vote <wink> I'd go for the thing that takes the
> least amount of effort to set up and maintain, that doesn't suck.  Bonus
> points if we include the mailing list archives as a search corpus.

I just took a look at the mailing list archives and they total just over
700 MB(!) - the largest is Python-Dev at 111 MB. Loading that lot into
a search engine could be a painful task. It looks like Google has
indexed them all (incredibly), so a targeted Google search limited to
the mail.python.org domain would probably suffice for mailing lists.
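
As a rough sketch of what such a targeted search could look like, the
snippet below builds a Google query URL restricted to one domain with
Google's "site:" operator. The helper name and the use of the public
search page are my own assumptions for illustration, not anything that
exists on the site.

```python
from urllib.parse import urlencode

# Assumption: Google's public search page plus its "site:" operator.
GOOGLE_SEARCH_URL = "https://www.google.com/search"


def mailing_list_search_url(terms, domain="mail.python.org"):
    """Return a Google search URL restricted to a single domain.

    Hypothetical helper, named for this sketch only.
    """
    query = f"{terms} site:{domain}"
    # urlencode handles the escaping of spaces and the colon.
    return GOOGLE_SEARCH_URL + "?" + urlencode({"q": query})


print(mailing_list_search_url("generator expressions"))
```

A search form on the site would only need to redirect to the URL this
returns, so nothing has to be indexed or stored locally.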

I still think there is a big advantage to be had in rolling a custom
search engine for the site, though: the ability to highlight certain
site areas for specific keywords, for example. I wonder if it would be
possible to use the Google web services API to power a Python.org search
engine? The API terms and conditions (www.google.com/apis/api_terms.html)
say this:

"""
The Google Web APIs service is made available to you for your personal, 
non-commercial use only (at home or at work) [ ... ] And you may not use 
the search results provided by the Google Web APIs service with an 
existing product or service that competes with products or services 
offered by Google.
"""

I have no idea if a search engine for Python.org would count as
"competing with products or services offered by Google". If it doesn't,
a Google API-powered search engine would give us all of the benefits of
Google while still allowing the Python site to apply a custom template
to the results and add other enhancements (such as recommended site areas
for specific keywords).
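
To make the "recommended site areas" idea a bit more concrete, here is a
minimal sketch. It assumes the results arrive as (title, url) pairs from
whatever backend ends up being used; the keyword table, its paths, and
both function names are hypothetical and only there for illustration.

```python
# Hypothetical keyword -> (area title, path) table; entries are illustrative.
RECOMMENDED_AREAS = {
    "tutorial": ("Documentation", "/doc/"),
    "download": ("Downloads", "/download/"),
    "pep": ("Python Enhancement Proposals", "/peps/"),
}


def recommend_areas(query):
    """Return the site areas whose keyword appears as a word in the query."""
    words = set(query.lower().split())
    return [area for keyword, area in RECOMMENDED_AREAS.items()
            if keyword in words]


def render_results(query, results):
    """Format recommendations, then backend results, as plain text.

    `results` is a list of (title, url) pairs; fetching them from a
    search backend is deliberately left out of this sketch.
    """
    lines = [f"Search results for: {query}"]
    for title, path in recommend_areas(query):
        lines.append(f"Recommended: {title} ({path})")
    for title, url in results:
        lines.append(f"- {title} <{url}>")
    return "\n".join(lines)
```

The point of keeping the two steps separate is that the recommendations
do not depend on where the results come from, so the backend could be
swapped without touching the template.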

Cheers,

Simon



