[Tutor] Opening Multiple Files
Alan Gauld
alan.gauld at btinternet.com
Fri Aug 17 10:11:43 CEST 2007
"Paulo Quaglio" <paulo_quaglio at yahoo.com> wrote
> I've solved the problem of opening several files to process "as a
> batch"
> with glob.glob(). Only now did I realize that the program and files
> need to be in the same folder..
They don't, but you do need to pass a valid path to open().
Thus if your code is running from folder X and the files
are in folder Y, you need to tell open() to open Y/filename
rather than just filename. Similarly, you need to tell glob
to glob(Y/pattern).
The other possibility is to change the working directory to
Y using os.chdir(Y).
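As a minimal sketch of the first option: build the glob pattern from the folder name, and the paths glob returns can go straight to open(). The temporary directory and the two sample files here are made up for illustration and stand in for "folder Y"; substitute the real corpus folder.

```python
import glob
import os
import tempfile

# Hypothetical stand-in for "folder Y": a temporary directory with two sample files.
corpus_dir = tempfile.mkdtemp()
for name, text in [("a.txt", "one two three"), ("b.txt", "four five")]:
    with open(os.path.join(corpus_dir, name), "w") as f:
        f.write(text)

# Build the pattern from the folder so the script can be run from anywhere.
pattern = os.path.join(corpus_dir, "*.txt")
filelist = sorted(glob.glob(pattern))

# Each path returned by glob already includes the folder,
# so it can be passed to open() unchanged.
for path in filelist:
    with open(path) as f:
        print(os.path.basename(path), len(f.read().split()))
```

Using os.path.join rather than pasting strings together keeps the separator right on both Windows and Unix.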
> 1- I want to open several files and count the total number of
> words.
> If I do this with only 1 file, it works great. With several files
> ( now with glob),
> it outputs the total count for each file individually and not the
> whole corpus
So you will need to store that result in a variable and add the
totals:
total = 0
for file in filelist:
    result = linesInFile(file)   # whatever function you already use to count one file
    print(file, ": ", result)    # might not need/want this
    total += result
print(total)
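For a version that can be run as it stands, here is a sketch that assumes "word" simply means a whitespace-separated token. The count_words helper and the sample files are invented for the example and are not the poster's actual code.

```python
import os
import tempfile

def count_words(path):
    """Return the number of whitespace-separated words in one file."""
    with open(path) as f:
        return sum(len(line.split()) for line in f)

# Hypothetical sample corpus written to a temporary folder.
corpus_dir = tempfile.mkdtemp()
samples = {"a.txt": "the cat sat\non the mat\n", "b.txt": "the dog barked\n"}
filelist = []
for name, text in sorted(samples.items()):
    path = os.path.join(corpus_dir, name)
    with open(path, "w") as f:
        f.write(text)
    filelist.append(path)

total = 0                        # created once, before the loop
for path in filelist:
    total += count_words(path)   # add each file's count to the running total
print(total)
```

The only thing that matters is where total is created: before the loop, so it is not reset for each file.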
> 2- I also want the program to output a word frequency list
> (we do this a lot in corpus linguistics). When I do this with only
> one file,
> the program works great (with a dictionary). With several files, I
> end up
> with several frequency lists, one for each file.
Make the dictionary outside your loop and pass it into the
analysis function:
# Sketch only, not tested on your files
def analyzeFile(fname, w):
    count = 0
    with open(fname) as f:
        for line in f:
            for word in line.split():
                w[word] = w.get(word, 0) + 1
                count += 1       # one more word seen in this file
    return w, count

words = {}                       # shared by all files
total = 0
for file in Flist:
    words, count = analyzeFile(file, words)
    total += count
print(total)
for word in words:
    print(word, ':', words[word])
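On newer Pythons the same idea can be written with collections.Counter, which is a dictionary that knows how to add counts together. This is an alternative sketch, not what the code above does; count_file and the sample files are made up for the example, and split() makes no attempt to strip punctuation or fold case.

```python
import os
import tempfile
from collections import Counter

def count_file(path):
    """Return a Counter of the whitespace-separated words in one file."""
    freq = Counter()
    with open(path) as f:
        for line in f:
            freq.update(line.split())
    return freq

# Hypothetical sample corpus written to a temporary folder.
corpus_dir = tempfile.mkdtemp()
samples = {"a.txt": "the cat sat\non the mat\n", "b.txt": "the dog barked\n"}
filelist = []
for name, text in sorted(samples.items()):
    path = os.path.join(corpus_dir, name)
    with open(path, "w") as f:
        f.write(text)
    filelist.append(path)

corpus_freq = Counter()                      # one frequency list for the whole corpus
for path in filelist:
    corpus_freq.update(count_file(path))     # update() adds counts, it does not overwrite

total = sum(corpus_freq.values())
print(total)
for word, n in corpus_freq.most_common():    # most frequent words first
    print(word, ':', n)
```

most_common() gives the list sorted by frequency, which is usually what a frequency list is wanted for.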
> This sounds like a loop type of problem, doesn't it?
No, it sounds like a variable position problem, and possibly
a namespace issue too: the total and the dictionary need to be
created once, before the loop, not inside it.
HTH,
--
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld