Advice for a little search engine

Steve Purcell stephen_purcell at yahoo.com
Mon Apr 16 04:28:57 EDT 2001


Max Haas wrote:
> Problem: 738 files with Latin text. Every file represents a singular source.
> Find every match of a term. (I need e.g. structura, structuram, structurae
> etc.)


Sounds like we're helping with your homework. I'll resist the temptation
to post too much code for this... :-)


> The program:
> 
> 1. Enter the question and x and y. The question will then be the compiled
> object p.
> 2. Read every file in (something like fp.readlines()).
> 3. Transform the file to a string (string.join(list_of_file)).
> 4. m = p.findall(string). If m is not None then:
> a. Give the file contents (lines 10-15)
> b. look for the occurrence of every word in m and note the position (with
> string.find)
> c.  Give x words before the matched word, the matched word and then y words
> after
> ...
> 
> The main problem for me is: do I understand correctly the function of
> p.findall(string) in combination with string.find?
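To see what goes wrong when you combine p.findall() with string.find(), here is a small sketch. It is written for modern Python 3, so str.find() stands in for the old string.find(). The sample sentence and the pattern r"\bstructur\w*" are made up for illustration.

findall() returns a list of the matched strings. When nothing matches, that list is empty; it is never None. find() only ever reports the first place a string occurs, so a word that appears twice gets the same offset both times:

```python
import re

# Made-up sample standing in for one file's contents.
text = "structura est structuram habet; structura iterum."

# Any word beginning with "structur" (structura, structuram, structurae, ...).
p = re.compile(r"\bstructur\w*")

words = p.findall(text)
print(words)      # ['structura', 'structuram', 'structura']

# str.find() returns only the FIRST occurrence, so both hits of
# 'structura' are reported at offset 0.
positions = [text.find(w) for w in words]
print(positions)  # [0, 14, 0]

# No match gives an empty list, not None.
print(p.findall("nihil hic"))  # []
```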


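Better would be a loop using 'p.search()', which finds one occurrence
at a time. The match object it returns has 'start()' and 'end()' methods
(or 'span()', which returns both). These give the position of the matched
word in the string containing the file contents. Pass 'm.end()' back to
'p.search()' as the starting position for the next search. Don't use the
'pos' and 'endpos' attributes for this: they are just the search bounds
you passed in, not the location of the match.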
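Here is a minimal sketch of that loop. It assumes modern Python 3 and treats any run of non-whitespace as a "word". The function name keyword_in_context and its before/after parameters (your x and y) are hypothetical. Compiling the pattern case-insensitively is also an assumption on my part.

```python
import re


def keyword_in_context(text, pattern, before=3, after=3):
    """Yield (words_before, matched_word, words_after) for each match.

    A "word" here is just a run of non-whitespace, so punctuation stays
    attached to neighbouring words (a simplification).
    """
    # Assumption: case should be ignored (e.g. "Structura" starting a sentence).
    p = re.compile(pattern, re.IGNORECASE)
    pos = 0
    while True:
        m = p.search(text, pos)  # next occurrence at or after pos
        if m is None:
            break
        start, end = m.start(), m.end()  # where this match sits in text
        # Up to `before` words on the left; guard needed since [-0:] is everything.
        left = text[:start].split()[-before:] if before else []
        # Up to `after` words on the right.
        right = text[end:].split()[:after]
        yield left, m.group(), right
        # Resume after this match; step one character past an empty match
        # so the loop cannot stall.
        pos = end if end > start else end + 1


if __name__ == "__main__":
    sample = "In hac structura omnia sunt; structuram vero non video."
    for left, word, right in keyword_in_context(sample, r"\bstructur\w*", 2, 2):
        print(" ".join(left), "[" + word + "]", " ".join(right))
    # In hac [structura] omnia sunt;
    # omnia sunt; [structuram] vero non
```

For the 738 files, read each one with open(path).read() and call the function on the result. read() already returns the whole file as a single string, so the readlines()/join step isn't needed.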

-Steve

-- 
Steve Purcell, Pythangelist
Get testing at http://pyunit.sourceforge.net/
Any opinions expressed herein are my own and not necessarily those of Yahoo




More information about the Python-list mailing list