[Pydotorg-redesign] how to search the site

Tim Parkin tim.parkin at pollenationinternet.com
Sat Sep 13 16:18:30 EDT 2003


>On Saturday, September 13, 2003, at 03:30  PM, Simon Willison wrote:
>> Alternatively, how about building a search engine on top of the
>> excellent lupy (a port of the open source Lucene Java search engine)?
>> From my admittedly limited experience of Lupy it is a truly excellent
>> product - it provides a very powerful API for indexing documents, and
>> a simple interface for running searches on them.
>
>Why would you not want to use google, as previously suggested, other 
>than NIH (Not Invented Here)?  It works, it's easy, it's fast, it's 
>free, we don't have to maintain it ourselves, everybody is familiar 
>with it, etc.  What's not to like?

The massive advantage of having your own search engine is that you can
customise it to:

1) exclude certain parts of the HTML in a site (i.e. furniture and menus)
2) add your own keywords to pages and weight them if necessary
3) add a category based sub-search
4) provide a better summary text for each returned result
5) provide a result title that isn't the page title
6) add your own stop words
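
As a rough illustration of points 1, 2 and 6, here is a minimal sketch in
plain Python using only the standard library. It is not Lupy's API. The
class names treated as furniture, the stop word list, the keyword boost
and the function names are all assumptions made up for the example, and
the depth tracking assumes reasonably well-formed markup.

```python
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Hypothetical settings, chosen only for illustration.
STOP_WORDS = {"the", "a", "and", "of", "to", "python"}
SKIP_CLASSES = {"menu", "furniture"}  # assumed class names for page furniture
KEYWORD_WEIGHT = 5                    # assumed boost for hand-added keywords
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ContentExtractor(HTMLParser):
    """Collect page text, skipping elements marked as furniture."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void tags never get a closing tag, so don't count them
        classes = set((dict(attrs).get("class") or "").split())
        if self.skip_depth or classes & SKIP_CLASSES:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def tokenize(text):
    """Lowercase, split into words, and drop our own stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_index(pages, extra_keywords=None):
    """pages: {url: html}; extra_keywords: {url: [keyword, ...]}.

    Returns {term: {url: weight}}.
    """
    index = defaultdict(dict)
    for url, html in pages.items():
        parser = ContentExtractor()
        parser.feed(html)
        parser.close()
        weights = Counter(tokenize(" ".join(parser.chunks)))
        # Hand-added keywords count for more than ordinary body text.
        for keyword in (extra_keywords or {}).get(url, []):
            for term in tokenize(keyword):
                weights[term] += KEYWORD_WEIGHT
        for term, weight in weights.items():
            index[term][url] = weight
    return index


def search(index, query):
    """Return URLs ordered by the summed weight of the matching terms."""
    scores = Counter()
    for term in tokenize(query):
        for url, weight in index.get(term, {}).items():
            scores[url] += weight
    return [url for url, _ in scores.most_common()]
```

With something like this, a page whose menu mentions "download" would not
match a search for "download", and a page given the extra keyword
"sockets" would rank above one that merely mentions it a couple of times.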

There are more reasons than just these. I've used swish / swish++ and
Lucene and they are both excellent products. I'm with Simon and think
Lupy would be an appropriate Python search engine and would need little
setup.

Google, whilst very useful, can only provide vanilla search results,
and even simple site-based optimisation can dramatically improve on
these.

I'm for Lupy if at all possible. Obviously, if Google are willing to
donate a search app or help optimise our results, then fantastic.

I would suggest:

1) Launch with Google search
2) Ask Google for an optimised solution and, if that isn't available,
   add a Lupy search

Tim




