[Tutor] Opening Multiple Files

Alan Gauld alan.gauld at btinternet.com
Fri Aug 17 10:11:43 CEST 2007


"Paulo Quaglio" <paulo_quaglio at yahoo.com> wrote

> I've solved the problem of opening several files to process "as a batch"
> with glob.glob(). Only now did I realize that the program and files
> need to be in the same folder..

They don't, but you do need to pass a valid path to open().
Thus if your code is running from folder X and the files
are in folder Y, you need to tell open() to open Y/filename
rather than just filename. Similarly, you need to tell glob
to search Y/pattern, i.e. glob.glob(Y/pattern). The names glob
returns then already include Y, so they can go straight to open().

The other possibility is to change the working directory to
Y using os.chdir(Y), after which plain filenames and patterns
are resolved relative to Y.
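
A minimal sketch of the first option, assuming the corpus sits in a folder whose path you already know. The helper name find_files, the "*.txt" pattern and the "corpus" folder name are made up for illustration; only glob.glob() and os.path.join() come from the standard library.

```python
import glob
import os


def find_files(folder, pattern="*.txt"):
    """Return the paths of the files in `folder` that match `pattern`."""
    # os.path.join builds "folder/pattern" with the right separator, so
    # the names glob returns can be passed to open() from any directory.
    return sorted(glob.glob(os.path.join(folder, pattern)))


if __name__ == "__main__":
    # Hypothetical folder name; replace it with the real corpus location.
    for path in find_files("corpus"):
        print(path)
```

With the os.chdir() approach you would instead call os.chdir(folder) once and then glob.glob("*.txt"), at the cost of changing the working directory for the rest of the program.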

> 1- I want to open several files and count the total number of words.
> If I do this with only 1 file, it works great. With several files
> (now with glob), it outputs the total count for each file individually
> and not the whole corpus

So you will need to store each file's result in a variable and add
it to a running total:

total = 0
for file in filelist:
    result = wordsInFile(file)  # stands for your existing per-file count
    print("%s: %d" % (file, result))  # might not need/want this
    total += result
print(total)
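
For completeness, here is one way the per-file counter and the running total might look. This is only a sketch: count_words and count_corpus are hypothetical names, and "word" is assumed to mean anything separated by whitespace, which may be too crude for real corpus work.

```python
def count_words(path):
    """Return the number of whitespace-separated words in one file."""
    count = 0
    with open(path) as f:
        for line in f:
            count += len(line.split())
    return count


def count_corpus(paths):
    """Add the per-file counts into one total for the whole corpus."""
    total = 0  # created once, before the loop over files
    for path in paths:
        total += count_words(path)
    return total
```

Calling count_corpus() on the list returned by glob.glob() gives a single figure for all the files together.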

> 2- I also want the program to output a word frequency list
> (we do this a lot in corpus linguistics). When I do this with only one file,
> the program works great (with a dictionary). With several files, I end up
> with several frequency lists, one for each file.

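Make the dictionary outside your loop and pass it into the
analysis function, so that every file updates the same dictionary:

# SKETCH ONLY! Flist is the list of filenames from glob.glob()
def analyzeFile(fname, w):
    count = 0
    f = open(fname)
    for line in f:
        for word in line.split():
            w[word] = w.get(word, 0) + 1  # update the shared dictionary
            count += 1                    # words seen in this file
    f.close()
    return w, count

words = {}   # one dictionary for the whole corpus
total = 0
for file in Flist:
    words, count = analyzeFile(file, words)
    total += count
print(total)
for word in words:
    print("%s: %d" % (word, words[word]))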
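
The same idea can be written with collections.Counter from the standard library (available in Python 2.7 and later). This is a substitute for the plain dictionary above, not what the original program used. The function name corpus_frequencies is made up for illustration, and words are again assumed to be whitespace-separated.

```python
from collections import Counter


def corpus_frequencies(paths):
    """Build one word-frequency table across every file in `paths`."""
    freq = Counter()  # created once, outside the loop over files
    for path in paths:
        with open(path) as f:
            for line in f:
                # update() adds one to the count of each word in the list
                freq.update(line.split())
    return freq
```

The result behaves like a dictionary: sum(freq.values()) is the total word count, and freq.most_common(10) returns the ten most frequent words with their counts.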

> This sounds like a loop type of problem, doesn't it?

No, it sounds like a variable position problem (where the variable
is created relative to the loop), and possibly a namespace issue too.
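
To illustrate what "variable position" means here, the sketch below uses two short strings as stand-ins for file contents. It assumes the symptom comes from creating the dictionary inside the loop, which is a guess without seeing the actual program.

```python
texts = ["a b a", "b c"]  # stand-ins for the contents of two files

# Dictionary created inside the loop: it is reset for every text,
# so after the loop only the last text's counts are left.
for text in texts:
    per_file = {}
    for word in text.split():
        per_file[word] = per_file.get(word, 0) + 1

# Dictionary created before the loop: counts accumulate across texts.
combined = {}
for text in texts:
    for word in text.split():
        combined[word] = combined.get(word, 0) + 1
```

After this runs, per_file holds counts for the second text only, while combined holds counts for both.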

HTH,

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld 



