Searching through more than one file.

Dave Angel davea at davea.name
Sun Dec 28 14:26:30 EST 2014


On 12/28/2014 02:12 PM, Dave Angel wrote:
> On 12/28/2014 12:27 PM, Seymore4Head wrote:
>> I need to search through a directory of text files for a string.
>> Here is a short program I made in the past to search through a single
>> text file for a line of text.
>>
>> How can I modify the code to search through a directory of files that
>> have different filenames, but the same extension?
>>
>
> You have two other replies to your specific question, glob and
> os.listdir.  I would also mention the module fileinput:
>
> https://docs.python.org/2/library/fileinput.html
>
> import fileinput
> from glob import glob
>
> fnames = glob('*.txt')
> for line in fileinput.input(fnames):
>      pass # do whatever
>
> If you're not on Windows, I'd mention that the shell will expand the
> wildcards for you, so you could get the filenames from argv even more
> simply.  See the first example on the above web page.
>
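
For the search you actually asked about, here is a minimal sketch built
on the same glob + fileinput combination. The function name
find_in_files and its two parameters are my own invention for
illustration, and I'm assuming a plain substring match is what you want.
It is written for Python 3.

```python
import fileinput
from glob import glob


def find_in_files(pattern, needle):
    """Return (filename, line number, line) for each line containing needle.

    pattern is a glob pattern such as '*.txt'; needle is a plain substring.
    Both names are hypothetical, chosen for this sketch.
    """
    hits = []
    fnames = sorted(glob(pattern))
    if not fnames:
        # With an empty list, fileinput falls back to sys.argv[1:] and
        # then to stdin, so bail out early instead.
        return hits
    try:
        for line in fileinput.input(fnames):
            if needle in line:
                hits.append((fileinput.filename(),
                             fileinput.filelineno(),
                             line.rstrip("\n")))
    finally:
        # Reset the module-level state so the function can be called again.
        fileinput.close()
    return hits
```

Calling find_in_files('*.txt', 'some text') would give you a list of
tuples you can print however you like. fileinput.filename() and
fileinput.filelineno() report which file and which line within that file
the current line came from.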
>
> I'm more concerned that you think the following code you supplied does a
> search for a string.  It does something entirely different, involving
> making a crude dictionary.  But it could be reduced to just a few lines,
> and probably take much less memory, if this is really the code you're
> working on.

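Note:  the changes I suggest should also be much faster if you have 
very many words to process this way.  The "if out not in final" test 
scans the whole list for every word, so the work grows roughly with the 
square of the number of distinct words, while a set does the same 
membership check in about constant time.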
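
To make the comparison concrete, here is a small sketch of the two ways
of collecting unique words. The function names are made up for this
example, and it is written for Python 3. Both return the same sorted
list; only the amount of work differs.

```python
def unique_words_list(words):
    """Deduplicate with a list, as in the original program."""
    final = []
    for word in words:
        # 'not in' on a list compares against every element already kept.
        if word not in final:
            final.append(word)
    final.sort()
    return final


def unique_words_set(words):
    """Deduplicate with a set, then sort once at the end."""
    # Set membership is a hash lookup, so duplicates are dropped cheaply.
    return sorted(set(words))
```

If you want to see the difference yourself, the timeit module on a list
of a few tens of thousands of words should show it clearly.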

>
>> fname = raw_input("Enter file name: ")  #"*.txt"
>> fh = open(fname)
>> lst = list()
>> biglst=[]
>> for line in fh:
>>      line=line.rstrip()
>>      line=line.split()
>>      biglst+=line
>> final=[]
>> for out in biglst:
>>      if out not in final:
>>          final.append(out)
>> final.sort()
>> print (final)
>>
>


> Something like the following:
Untested, I should have said.

>
> import fileinput
> from glob import glob
>
> res = set()
> fnames = glob('*.txt')
> for line in fileinput.input(fnames):
>      res.update(line.rstrip().split())

And I should have omitted the rstrip(), which does nothing that split() 
isn't already going to do.

> print sorted(res)
>
>
>
>
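
Putting the corrections together, a sketch of the whole thing might look
like the following. The function name unique_words and its pattern
parameter are hypothetical, and I've written it for Python 3 (hence no
print statement), so treat it as a starting point and not as tested
production code.

```python
import fileinput
from glob import glob


def unique_words(pattern):
    """Return the sorted unique words found in all files matching pattern."""
    fnames = glob(pattern)
    if not fnames:
        # An empty list would make fileinput read sys.argv[1:] or stdin.
        return []
    res = set()
    try:
        for line in fileinput.input(fnames):
            # split() with no arguments splits on runs of whitespace and
            # ignores leading and trailing whitespace, newline included.
            res.update(line.split())
    finally:
        # Reset the module-level state so a later call starts clean.
        fileinput.close()
    return sorted(res)
```

Something like print(unique_words('*.txt')) would then replace the whole
original program.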


-- 
DaveA


