Looking for lots of words in lots of files

Cong frigoris.ma at gmail.com
Wed Jun 18 21:01:55 EDT 2008


On Jun 18, 11:01 pm, Kris Kennaway <k... at FreeBSD.org> wrote:
> Calvin Spealman wrote:
> > Upload, wait, and google them.
>
> > Seriously tho, aside from using a real indexer, I would build a set of
> > the words I'm looking for, and then loop over each file, looping over
> > the words and doing quick checks for containment in the set. If so, add
> > to a dict of file names to list of words found until the list hits 10
> > length. I don't think that would be a complicated solution and it
> > shouldn't be terrible at performance.
>
> > If you need to run this more than once, use an indexer.
>
> > If you only need to use it once, use an indexer, so you learn how for
> > next time.
>
> If you can't use an indexer, and performance matters, evaluate using
> grep and a shell script.  Seriously.
>
> grep is a couple of orders of magnitude faster at pattern matching
> strings in files (and especially regexps) than python is.  Even if you
> are invoking grep multiple times it is still likely to be faster than a
> "maximally efficient" single pass over the file in python.  This
> realization was disappointing to me :)
>
> Kris
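
For comparison with the grep route, here is a rough sketch of the
set-based approach Calvin describes above. The function names, the
\w+ tokenization, and the UTF-8 decoding are my own assumptions, not
anything specified in the thread.

```python
import re

# Assumption: a "word" is a run of \w characters.
WORD_RE = re.compile(r"\w+")


def words_in_file(path, wanted, limit):
    """Return up to `limit` distinct wanted words, in order of first appearance."""
    hits = []
    seen = set()
    # Assumption: files are text; undecodable bytes are replaced, not fatal.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            for word in WORD_RE.findall(line):
                if word in wanted and word not in seen:
                    seen.add(word)
                    hits.append(word)
                    if len(hits) >= limit:
                        return hits  # stop reading once the list is full
    return hits


def find_words(paths, wanted, limit=10):
    """Map each file name to the wanted words found in it (at most `limit`)."""
    wanted = set(wanted)  # set membership keeps each check cheap
    found = {}
    for path in paths:
        hits = words_in_file(path, wanted, limit)
        if hits:
            found[path] = hits
    return found
```

Calling find_words(list_of_paths, list_of_words) returns a dict that
only contains files with at least one hit.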

Alternatively, if you don't feel like writing shell scripts, you can
write a Python program that auto-generates the desired shell script,
which in turn uses grep. E.g. use Python to generate the file list
that is passed to grep as arguments. ;-P
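
A minimal sketch of that idea, assuming a grep that accepts -H, -o,
-w, -F and -e (GNU and BSD grep both do, as far as I know). All the
function names here are made up for illustration, and the output
parsing assumes the search words themselves contain no ':'.

```python
import shlex
import subprocess


def build_grep_command(paths, words):
    """Build one grep invocation covering every word and every file."""
    # -H: always print the file name, -o: print only the matched text,
    # -w: match whole words, -F: treat patterns as fixed strings.
    argv = ["grep", "-H", "-o", "-w", "-F"]
    for word in words:
        argv += ["-e", word]
    argv.append("--")  # everything after this is a file name
    argv += list(paths)
    return argv


def as_shell_line(argv):
    """Render the command as a quoted line for a generated shell script."""
    return " ".join(shlex.quote(arg) for arg in argv)


def parse_grep_output(text):
    """Turn 'path:word' lines into {path: set of words}."""
    found = {}
    for line in text.splitlines():
        # Split on the last ':' so a ':' inside the path does not matter.
        path, sep, word = line.rpartition(":")
        if sep:
            found.setdefault(path, set()).add(word)
    return found


def grep_words(paths, words):
    """Run grep directly instead of writing the script out."""
    result = subprocess.run(
        build_grep_command(paths, words), capture_output=True, text=True
    )
    if result.returncode > 1:  # 0 = matches, 1 = no matches, >1 = error
        raise RuntimeError(result.stderr.strip())
    return parse_grep_output(result.stdout)
```

With a very long file list the command line may exceed the system's
argument limit, so the paths might need to be passed in batches.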


