How to quickly search over a large number of files using python?
Roy Smith
roy at panix.com
Wed Sep 25 08:41:12 EDT 2013
In article <60f36178-b584-4fcb-8ad9-2dac6052e6d8 at googlegroups.com>,
dwivedi.devika at gmail.com wrote:
> Hi all,
>
> I am a newbie to python.
>
> I have about 500 search queries, and about 52000 files in which I have to
> find all matches for each of the 500 queries.
Before anybody can even begin to answer this question, we need to know
what you mean by "search query". Are you talking pattern matching,
keyword matching, fuzzy hits OK, etc? Give us a couple of examples of
the kind of searches you'd like to execute.
Also, is this a one-off thing, or are you planning to do many searches
over the same collection of files? If so, you will want to do some sort
of pre-processing or indexing to speed up the search execution. It's
extremely unlikely you want to reinvent the wheel here. There are tons
of search packages out there that do this sort of thing. Just a few to
check out include Apache Lucene, Apache Solr, and Xapian.
More information about the Python-list
mailing list