[Tutor] Opening Multiple Files
Paulo Quaglio
paulo_quaglio at yahoo.com
Fri Aug 17 07:10:18 CEST 2007
Hi everyone,
Thanks for all the suggestions. Let me just preface this by saying that I'm new to both Python and programming. I started learning 3 months ago with online tutorials and by reading the questions you guys post. So, thank you all very, very much, and I apologize if I'm doing something really stupid. :-) OK. I've solved the problem of opening several files to process as a batch with glob.glob(). Only now did I realize that the program and files need to be in the same folder. Now I have another problem.
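On the folder point: glob.glob() resolves a relative pattern such as '*.txt' against the current working directory, which is why the files appear to need to sit next to the program. A minimal sketch, assuming a hypothetical helper name find_text_files and a folder path you supply yourself, of matching files somewhere else:

```python
import glob
import os


def find_text_files(folder):
    """Return the sorted paths of all .txt files directly inside folder."""
    # os.path.join builds '<folder>/*.txt' with the right separator for the OS.
    pattern = os.path.join(folder, '*.txt')
    # glob.glob returns matches in arbitrary order, so sort for stable output.
    return sorted(glob.glob(pattern))
```

The returned paths include the folder, so they can be passed straight to open() no matter where the program itself lives.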
1- I want to open several files and count the total number of words. If I do this with only 1 file, it works great. With several files (now with glob), it outputs the total count for each file individually and not for the whole corpus (see the comments in the program below).
2- I also want the program to output a word frequency list (we do this a lot in corpus linguistics). When I do this with only one file, the program works great (with a dictionary). With several files, I end up with several frequency lists, one for each file. This sounds like a loop type of problem, doesn't it? I looked at the indentation too and I can't find what the problem is. Your comments, suggestions, etc. are greatly appreciated. Thanks again for all your help. Paulo
Here is what I have.
# The program is intended to output a word frequency list (including all words in all files) and the total word count
import glob

def sortfile():  # I created a function
    filename = glob.glob('*.txt')  # this works great! Thanks!
    for allfiles in filename:
        infile = open(allfiles, 'r')
        lines = list(infile)
        infile.close()
        words = []  # initializes list of words
        wordcounter = 0
        for line in lines:
            line = line.lower()  # after this, I have some clunky code to get rid of punctuation
            words = words + line.split()
        wordfreq = [words.count(wrd) for wrd in words]  # counts the freq of each word in a list
        dictionary = dict(zip(words, wordfreq))
        frequency_list = [(dictionary[key], key) for key in dictionary]
        frequency_list.sort()
        frequency_list.reverse()
        for item in frequency_list:
            wordcounter = wordcounter + 1
            print item
        print "Total # of words:", wordcounter  # if I put it here, I get the total count after each file
    print "Total # of words:", wordcounter  # this will give the word count of the last file the program reads.

sortfile()
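
One possible reading of the symptoms described above: words and wordcounter are reset inside the loop over files, and the frequency list is built and printed inside that same loop, so every file starts from scratch. Note also that wordcounter goes up once per entry in the frequency list, so it counts distinct words rather than every word. A minimal sketch, assuming a hypothetical function name count_corpus and leaving out the punctuation clean-up, that keeps one dictionary and one counter for the whole corpus and sorts only after the last file has been read:

```python
import glob


def count_corpus(pattern):
    """Return (total_words, frequency_list) for every file matching pattern."""
    counts = {}      # created once, before the loop, so it spans all files
    total_words = 0  # running count of every word across the whole corpus
    for path in sorted(glob.glob(pattern)):
        with open(path, 'r') as infile:
            text = infile.read().lower()
        for word in text.split():
            # One dictionary update per word avoids calling list.count()
            # for every word, which gets slow on a large corpus.
            counts[word] = counts.get(word, 0) + 1
            total_words += 1
    # Sort once, after all files: highest frequency first, ties alphabetical.
    frequency_list = sorted(counts.items(),
                            key=lambda pair: (-pair[1], pair[0]))
    return total_words, frequency_list
```

Calling count_corpus('*.txt') returns the corpus-wide total together with a list of (word, frequency) pairs, which can then be printed in a single loop after the function returns.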