[Tutor] Searching through files for values

Peter Otten __peter__ at web.de
Fri Aug 14 11:07:52 CEST 2015


Jason Brown wrote:

> (accidentally replied directly to Cameron)
> 
> Thanks, Cameron.  It looks like that value_file.close() tab was
> accidentally tabbed when I pasted the code here.  Thanks for the
> suggestion for using 'with' though!  That will be handy.
> 
> To test, I tried manually specifying the list:
> 
> vals = [ 'value1', 'value2', 'value3' ]
> 
> And I still get the same issue.  Only the first value in the list is
> looked up.

The problem is in the following snippet:

>     with open(file_list) as files:
>          for items in vals:
>              for line in files:
>                  if items in line:
>                      print file_list, line
> 

I'll rename the variables to something more meaningful:

with open(filename) as infile:
    for search_value in vals:
        for line in infile:
            if search_value in line:
                print filename, "has", search_value, "in line", line.strip()

You open infile once and then iterate over its lines many times, once for 
every search_value. But unlike a list of lines, a file object can only be 
iterated over once:

$ cat values.txt
alpha
beta
gamma
$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> lines = open("values.txt")
>>> for line in lines: print line.strip()
... 
alpha
beta
gamma
>>> for line in lines: print line.strip()
... 
>>>

No output in the second loop. The file object remembers its current position 
and resumes iteration from there. The first loop already reached the end of 
the file, so there are no more lines left. Possible fixes:

(1) Open a new file object for every value:

for filename in filenames:
    for search_value in vals:
        with open(filename) as infile:
            for line in infile:
                if search_value in line:
                    print filename, "has", search_value, 
                    print "in line", line.strip()

(2) Use seek() to reset the position of the file pointer:

for filename in filenames:
    with open(filename) as infile:
        for search_value in vals:
            infile.seek(0)
            for line in infile:
                if search_value in line:
                    print filename, "has", search_value, 
                    print "in line", line.strip()

(3) If the file is small or not seekable (think stdin), read its contents 
into a list and iterate over that:

for filename in filenames:
    with open(filename) as infile:
        lines = infile.readlines()
        for search_value in vals:
            for line in lines:
                if search_value in line:
                    print filename, "has", search_value, 
                    print "in line", line.strip()

(4) Adapt your algorithm to test all search values against a line before you 
proceed to the next line. This will change the order in which the matches 
are printed, but will work with both stdin and huge files that don't fit 
into memory. I'll leave the implementation to you as an exercise ;)
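
If you want to compare notes after trying it yourself, here is one possible 
sketch of (4). It is not the only way to do it. The names find_matches() and 
search_files() are made up for this example, and the __future__ import is 
only there so the same code runs on Python 2 and Python 3:

```python
from __future__ import print_function  # same print() syntax on Python 2 and 3


def find_matches(lines, search_values):
    """Yield (search_value, stripped_line) for every match.

    `lines` may be any iterable of strings (an open file, sys.stdin, a list).
    It is consumed exactly once, so nothing has to be rewound or kept in memory.
    """
    for line in lines:
        # Test every search value against the current line
        # before moving on to the next line.
        for search_value in search_values:
            if search_value in line:
                yield search_value, line.strip()


def search_files(filenames, search_values):
    """Print every match found in each of the given files."""
    for filename in filenames:
        with open(filename) as infile:
            for search_value, text in find_matches(infile, search_values):
                print(filename, "has", search_value, "in line", text)
```

Because the loops are swapped, the matches come out in line order rather 
than grouped by search value. To try it on stdin, pass sys.stdin to 
find_matches() instead of an open file.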



