"grep" database

Mike Meyer mwm at mired.org
Fri Sep 9 17:53:12 EDT 2005


"Hilbert" <lists at panka.com> writes:
> I've heard of a software on linux that creates a recursive database of
> text files  and then provides an interface for grep-like queries. I'd
> like to use it to find procedures/variables in a large code base.
>
> Any suggestions appreciated.

The great granddaddy of them all is WAIS, but it uses a client-server
model instead of grep-like queries. It has since been standardized as
Z39.50. The one most like what you want is probably glimpse, which
builds its index with "glimpseindex", searches it from the command
line with "glimpse", and uses "agrep" as its search engine. Swish and
HARVEST also come to mind as tools in that model.

You're most likely to find these tools being used as or bundled as
part of web site search engines. They all support "structured" text
files, meaning they parse them to assign tags to parts of the content,
and let you use those tags in queries.

How big is your data set? I never found one that would properly index
2+ GB of text files - the index data structures kept running into some
form of memory limit. I finally gave up and used the file system
layout to handle part of the search, running find with regular
expressions on the file names, plus a custom tool to look for further
structure inside the files.
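
For what it's worth, here is a rough Python sketch of that two-stage
approach: narrow the candidates by file name first, then scan only
those files. It is not the tool I actually used; the function names
(find_files, grep_files, search) and the example patterns are made up
for illustration, and it assumes the files are readable as text.

```python
import os
import re


def find_files(root, name_pattern):
    """Yield paths under root whose file name matches the regular expression."""
    name_re = re.compile(name_pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if name_re.search(filename):
                yield os.path.join(dirpath, filename)


def grep_files(paths, content_pattern):
    """Yield (path, line_number, line) for each line matching the pattern."""
    content_re = re.compile(content_pattern)
    for path in paths:
        try:
            # errors="replace" keeps odd bytes from aborting the scan.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    if content_re.search(line):
                        yield path, line_number, line.rstrip("\n")
        except OSError:
            continue  # unreadable file: skip it rather than stop the search


def search(root, name_pattern, content_pattern):
    """Stage 1 filters by file name, stage 2 scans only the survivors."""
    return list(grep_files(find_files(root, name_pattern), content_pattern))
```

Something like search("/path/to/code", r"\.py$", r"def\s+parse_header\b")
would then list the matching lines. Nothing is indexed, so every query
rereads the candidate files; the file name filter is what keeps that
tolerable on a big tree.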

If you're going to be dealing with large data sets, I'd like to know
if you find something that works well for you.

   <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


