Site search

Gandalf gandalf at geochemsource.com
Mon Feb 16 01:34:25 EST 2004


Hi All! I need to create a "site search" feature for a website. I would 
like to create a service that can be
pointed at a directory. It should walk all subfolders, read all 
HTML, ASP, PHP, TXT and PDF files, and
create a table indexed by words. The most important requirements are:

1. It should index PDF files too. (The site contains many datasheets, so 
this is crucial.)
2. It should not index special keywords inside HTML and PDF files (so if 
somebody searches for "green", it should only match "green 
cables" and "green grass", but not <FONT COLOR="GREEN">).
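
To make requirement 2 concrete, here is a rough sketch of the behaviour I have in mind for HTML, assuming the standard library's html.parser module is acceptable. The names TextOnly, html_words and add_to_index are made up for illustration, and PDF text extraction is not covered here because it would need a separate tool.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data only; tag names and attributes are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Only visible text reaches the index, never markup.
        if not self._skip:
            self.chunks.append(data)


def html_words(html):
    """Return the lowercase words found in the visible text of an HTML string."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())


def add_to_index(index, name, html):
    """Record every visible word of one document in a word -> file names table."""
    for word in html_words(html):
        index[word].add(name)


index = defaultdict(set)
add_to_index(index, "a.html", '<font color="GREEN">red cables</font>')
add_to_index(index, "b.html", "<p>green grass</p>")
print(sorted(index["green"]))
```

With these two sample documents, a search for "green" should find only b.html, because the GREEN in a.html is an attribute value and never reaches handle_data.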

Is there a library out there that can do this for me? I can easily 
do all the parts except parsing a file and gathering its keywords.

Thanks in advance.

   Laci 2.0





