How to quickly search over a large number of files using python?

Roy Smith roy at panix.com
Wed Sep 25 08:41:12 EDT 2013


In article <60f36178-b584-4fcb-8ad9-2dac6052e6d8 at googlegroups.com>,
 dwivedi.devika at gmail.com wrote:

> Hi all,
> 
> I am a newbie to python.
> 
> I have about 500 search queries, and about 52000 files in which I have to 
> find all matches for each of the 500 queries.

Before anybody can even begin to answer this question, we need to know 
what you mean by "search query".  Are you talking pattern matching, 
keyword matching, fuzzy hits OK, etc?  Give us a couple of examples of 
the kind of searches you'd like to execute.

Also, is this a one-off thing, or are you planning to do many searches 
over the same collection of files?  If so, you will want to do some sort 
of pre-processing or indexing to speed up the search execution.  It's 
extremely unlikely you want to reinvent the wheel here.  There are tons 
of search packages out there that do this sort of thing.  Just a few to 
check out include Apache Lucene, Apache Solr, and Xapian.



More information about the Python-list mailing list