Looking for lots of words in lots of files

Jeff McNeil jeff at jmcneil.net
Wed Jun 18 12:58:05 EDT 2008


On Jun 18, 10:29 am, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
> brad wrote:
> > Just wondering if anyone has ever solved this efficiently... not looking
> > for specific solutions tho... just ideas.
>
> > I have one thousand words and one thousand files. I need to read the
> > files to see if some of the words are in the files. I can stop reading a
> > file once I find 10 of the words in it. It's easy for me to do this with
> > a few dozen words, but a thousand words is too large for an RE and too
> > inefficient to loop, etc. Any suggestions?
>
> Use an indexer, like lucene (available as pylucene) or a database that
> offers word-indices.
>
> Diez

I've been toying around with Nucular (http://nucular.sourceforge.net/)
a bit recently for some side projects. It's pure Python and seems to
work fairly well for my needs. I haven't pumped all that much data
into it, though.
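
For a one-off scan, a plain set lookup may already be fast enough before
reaching for an indexer. Below is a rough sketch, not a tested solution:
it assumes the words are whole tokens matched case-insensitively, and
that the files are text. The names find_words and scan_files are made up
for illustration and are not part of Nucular or Lucene.

```python
import re

# Assumption: a "word" is a run of letters, digits, or underscores.
WORD_RE = re.compile(r"\w+")


def find_words(lines, wanted, limit=10):
    """Return the distinct wanted words seen in `lines`, stopping at `limit`.

    `wanted` must be a set of lowercase words.
    """
    found = set()
    for line in lines:
        for token in WORD_RE.findall(line.lower()):
            # Set membership is a constant-time check on average, so the
            # cost per token does not grow with the size of the word list.
            if token in wanted:
                found.add(token)
                if len(found) >= limit:
                    return found  # stop reading this file early
    return found


def scan_files(paths, words, limit=10):
    """Map each path to the wanted words found in that file."""
    wanted = {w.lower() for w in words}
    results = {}
    for path in paths:
        # Iterating over the handle reads one line at a time.
        with open(path, encoding="utf-8", errors="replace") as handle:
            results[path] = find_words(handle, wanted, limit)
    return results
```

Each file is read at most once and only until the limit is hit. If the
same files get searched over and over with different word lists, an
index is probably still the better fit.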


