Searching through more than one file.
Dave Angel
davea at davea.name
Sun Dec 28 14:26:30 EST 2014
On 12/28/2014 02:12 PM, Dave Angel wrote:
> On 12/28/2014 12:27 PM, Seymore4Head wrote:
>> I need to search through a directory of text files for a string.
>> Here is a short program I made in the past to search through a single
>> text file for a line of text.
>>
>> How can I modify the code to search through a directory of files that
>> have different filenames, but the same extension?
>>
>
> You have two other replies to your specific question, glob and
> os.listdir. I would also mention the module fileinput:
>
> https://docs.python.org/2/library/fileinput.html
>
> import fileinput
> from glob import glob
>
> fnames = glob('*.txt')
> for line in fileinput.input(fnames):
>     pass  # do whatever
>
> If you're not on Windows, I'd mention that the shell will expand the
> wildcards for you, so you could get the filenames from argv even
> more simply. See the first example on the above web page.
>
>
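For the search the original question actually asks about, a minimal sketch
along the same lines might look like the following. It assumes Python 3
(the code in this thread is Python 2), and the names search_files, pattern
and needle are made up for illustration. It uses the FileInput class rather
than the module-level fileinput.input() so that no global state is left
active if the caller stops iterating early.

```python
import fileinput
from glob import glob


def search_files(pattern, needle):
    """Yield (filename, line_number, line) for every line containing needle.

    pattern is a glob pattern such as '*.txt'; needle is a plain substring.
    Both names are hypothetical, chosen only for this sketch.
    """
    fnames = sorted(glob(pattern))
    if not fnames:
        # Given an empty list, fileinput falls back to reading stdin,
        # so stop here rather than block waiting for input.
        return
    with fileinput.FileInput(files=fnames) as stream:
        for line in stream:
            if needle in line:
                # filelineno() is the line number within the current file.
                yield stream.filename(), stream.filelineno(), line.rstrip("\n")
```

Something like "for name, num, text in search_files('*.txt', 'spam'):"
followed by a print of the three values would then list every match.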
> I'm more concerned that you think the following code you supplied does a
> search for a string. It does something entirely different, involving
> making a crude dictionary. But it could be reduced to just a few lines,
> and probably take much less memory, if this is really the code you're
> working on.
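Note: the changes I suggest should also be much faster if you're parsing
a large number of words this way, since a membership test on a set doesn't
have to scan the whole collection the way "out not in final" does on a list.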
>
>> fname = raw_input("Enter file name: ") #"*.txt"
>> fh = open(fname)
>> lst = list()
>> biglst=[]
>> for line in fh:
>>     line=line.rstrip()
>>     line=line.split()
>>     biglst+=line
>> final=[]
>> for out in biglst:
>>     if out not in final:
>>         final.append(out)
>> final.sort()
>> print (final)
>>
>
> Something like the following:
Untested, I should have said.
>
> import fileinput
> from glob import glob
>
> res = set()
> fnames = glob('*.txt')
> for line in fileinput.input(fnames):
>     res.update(line.rstrip().split())
And I should have omitted the rstrip(), which does nothing that split()
isn't already going to do: split() with no arguments discards leading
and trailing whitespace anyway.
> print sorted(res)
>
>
>
>
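Putting the corrections together, here is a sketch of the unique-words
version for Python 3 (an assumption; the code above is Python 2). The
function name unique_words and its pattern parameter are hypothetical.
It drops the redundant rstrip() and guards against glob() matching
nothing, in which case fileinput would otherwise read from stdin.

```python
import fileinput
from glob import glob


def unique_words(pattern):
    """Return a sorted list of the distinct whitespace-separated words
    found in all files matching the glob pattern."""
    fnames = glob(pattern)
    if not fnames:
        # An empty file list would make fileinput read stdin instead.
        return []
    words = set()
    with fileinput.FileInput(files=fnames) as stream:
        for line in stream:
            # split() with no arguments already ignores leading and
            # trailing whitespace, so no rstrip() is needed.
            words.update(line.split())
    return sorted(words)
```

Calling unique_words('*.txt') and printing the result should give the same
kind of output as the original program, without building the intermediate
lists.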
--
DaveA