[Tutor] writing a search engine

Sean 'Shaleh' Perry shalehperry@attbi.com
Mon Jun 30 22:28:02 2003


On Monday 30 June 2003 18:48, Kyle Babich wrote:
> Any suggestions on where to start?  I plan on writing one for a small
> site of mine.  It will basically be "google-style"- simple and
> functional.  The item being searched will be text files, where users can
> search based on title, contents, or both.  I was reading through some
> books but I'm stumped on where to begin.  I've never tried writing
> something even similar to a search engine and I guess I'm little
> confused/overwhelmed/bewildered.  :)  You guys always come to the rescue
> for me, so how about one more time?

Sounds like time for "teach a man to fish" .......

A search engine, eh?  Hmm, what would that take?  Let's brainstorm.

- user enters some text.

- We open the first file and scan it looking for the text.  Is it there?  
Report if yes.  This piece right here is a good place to start writing code.
No need for web or GUI, just write a little console app.  If you are familiar 
with Unix this may resemble grep.

- repeat for every file in the directory / disk / whatever.

hmmm, that would be slow, wouldn't it?  Even so, you could have a solution 
fairly rapidly that worked.
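
Here is a rough sketch of that slow-but-working version, just to show how 
little code the first step needs.  The function names are made up for this 
example, and matching case-insensitively one line at a time is my assumption, 
not a requirement.

```python
import os
import sys


def file_contains(path, text):
    """Return True if text occurs on any single line of the file at path."""
    needle = text.lower()
    # errors="replace" keeps odd bytes from stopping the scan.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # Line-by-line matching: a phrase split across two lines is missed.
            if needle in line.lower():
                return True
    return False


def search_directory(directory, text):
    """Return the sorted names of files in directory that mention text."""
    matches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories; this sketch only looks at one level.
        if os.path.isfile(path) and file_contains(path, text):
            matches.append(name)
    return matches


if __name__ == "__main__":
    # Usage: python search.py DIRECTORY TEXT
    for name in search_directory(sys.argv[1], sys.argv[2]):
        print(name)
```

It opens and rereads every file on every search, which is exactly why it is 
slow.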

This is a classic computer science problem.  The user wants the answer faster 
than we can find it.  How would you make the user happy?  What makes the 
searching slow?  Ponder this and you are on your way .........
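
One way to start pondering is to measure instead of guessing.  Below is a 
small timing helper, only a sketch: timed() and count_matches() are names I 
invented, and count_matches() is a stand-in for whatever search function you 
end up writing.

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, seconds_elapsed)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def count_matches(lines, text):
    """Stand-in for a real search: count the lines that contain text."""
    return sum(1 for line in lines if text in line)


if __name__ == "__main__":
    # Made-up sample data; swap in lines read from your own files.
    lines = ["some sample text number %d" % i for i in range(200000)]
    for query in ("sample", "number 199999", "absent"):
        hits, seconds = timed(count_matches, lines, query)
        print("%-15s %6d hits  %.4f s" % (query, hits, seconds))
```

Try it with more data, or with the file reading included in the timed call, 
and see which part of the work grows.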

A key to programming is learning to break the problem into bite-sized pieces.  
Every problem has smaller problems within that can be solved individually.  
Eventually you get enough of the pieces to put the puzzle together.

When faced with a large task, ask yourself: How would it be used?  Is there a 
similar problem?  Have I solved something similar before?

In this case you have probably searched in strings before, or at least heard 
about it.  Start there.  Once you can find the information, the next part of 
the problem is finding it in lots of places.  Then it is making it web 
accessible.  Then you have to make it fast enough for users to want it.  Each 
piece is its own journey.