[Tutor] a quick Q: how to use for loop to read a series of files with .doc end

Fri Oct 7 12:21:33 CEST 2011

On 10/07/2011 04:08 AM, lina wrote:
> <snip>
> I thought a loop might be what made it print the results twice, so
> I adjusted the indentation; now it shows:
> $ python3 counter-vertically-v2.py
> {'B': [0, 0, 0, 0, 0, 0], 'E': [1, 0, 1, 0, 1, 0]}
> {'B': [0, 0, 0, 0, 0, 0], 'E': [1, 0, 1, 0, 1, 0]}
> [1, 0, 1, 0, 1, 0]
> Traceback (most recent call last):
>    File "counter-vertically-v2.py", line 48, in <module>
>      dofiles(".")
>    File "counter-vertically-v2.py", line 13, in dofiles
>      processfile(filename)
>    File "counter-vertically-v2.py", line 31, in processfile
>      for a,b in zip(results['E'],results['B']):
> KeyError: 'E'
>
> Still two results, but the summary is correct, and there is a KeyError that I
> don't know how to fix.
>
> #!/bin/python3
>
> import os.path
>
>
> TOKENS="BE"
> LINESTOSKIP=0
> INFILEEXT=".xpm"
> OUTFILEEXT=".txt"
>
> def dofiles(topdirectory):
>      for filename in os.listdir(topdirectory):
>          processfile(filename)
>
> def processfile(infilename):
>      results={}
>      base, ext =os.path.splitext(infilename)
>      if ext == INFILEEXT:
>          text = fetchonefiledata(infilename)
>          numcolumns=len(text[0])
>          for ch in TOKENS:
>              results[ch] = [0]*numcolumns
>          for line in text:
>              line = line.strip()
>          for col, ch in enumerate(line):
>              if ch in TOKENS:
>                  results[ch][col]+=1
>      for k,v in results.items():
>          print(results)
That'll print the whole map for each item in it.  Since you apparently 
have two items, "E" and "B", you get the whole thing printed out twice.

I have no idea what you really wanted to print, but it probably was k and v.
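
A minimal sketch of printing each key with its own value, assuming that is what was intended. The helper name format_counts is hypothetical, and the sample data is copied from the output quoted above:

```python
# Sample data taken from the output shown earlier in the thread.
results = {'B': [0, 0, 0, 0, 0, 0], 'E': [1, 0, 1, 0, 1, 0]}

def format_counts(results):
    """Return one 'key: value' line per token instead of the whole dict."""
    return ["{}: {}".format(k, v) for k, v in sorted(results.items())]

# One line per item, rather than the full dictionary once per item.
for line in format_counts(results):
    print(line)
```

With two items in results this prints two different lines, one for "B" and one for "E".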

>      summary=[]
>      for a,b in zip(results['E'],results['B']):
>          summary.append(a+b)
>      print(summary)
>      writeonefiledata(base+OUTFILEEXT,summary)
>
> def fetchonefiledata(inname):
>      infile = open(inname)
>      text = infile.readlines()
>      return text[LINESTOSKIP:]
>
> def writeonefiledata(outname,summary):
>      outfile = open(outname,"w")
>      for elem in summary:
>          outfile.write(str(summary))
>
>
> if __name__=="__main__":
>      dofiles(".")
>
> Thanks all for your time,
>
>
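As for the reason you got the exception, it probably was because results 
had no "E" entry for the NEXT file.  In the code as quoted, results is only 
filled in when the extension matches INFILEEXT, so any other file in the 
directory leaves it empty, and results['E'] then raises KeyError.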
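
A minimal reproduction of that exception, assuming results ended up without an "E" key for one of the files. The variable names missing and safe are only for illustration:

```python
# An empty dict stands in for results when nothing was counted for a file.
results = {}

try:
    results['E']                 # same lookup as in the zip() line
except KeyError as err:
    missing = err.args[0]        # the key that was not found
    print("no entry for", missing)

# dict.get() returns a default instead of raising when the key is absent.
safe = results.get('E', [])
print(safe)
```

Whether a default is the right answer depends on whether a file without that token should be considered valid input at all.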

One of the reasons to break this stuff into separate functions is so you 
can test them separately.  You probably should be calling processfile() 
directly in your top-level code, till it all comes out correctly.  Or at 
least add a print of the filename it's working on.
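
A rough sketch of the "print the filename" suggestion. Passing processfile in as a parameter is my assumption, not part of the original script; it just makes the loop easy to try with a stand-in function:

```python
import os

def dofiles(topdirectory, processfile):
    """Call processfile on each directory entry, announcing it first."""
    for filename in sorted(os.listdir(topdirectory)):
        # Shows which file is being handled when an exception appears.
        print("working on:", filename)
        processfile(filename)

# Hypothetical usage, once processfile() itself is defined:
#     dofiles(".", processfile)
# or, while still debugging, skip the loop and call it on one known file:
#     processfile("sample.xpm")   # "sample.xpm" is a made-up name
```

The last filename printed before the traceback is the one that triggered the error.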

Anyway, it's probably a mistake to ever reference "E" and "B" 
explicitly; instead, loop through the TOKENS.  That way it'll still 
work when you add more or different tokens.  Further, if it's considered 
valid for an input file not to have samples of all the tokens, then you 
have to loop through the ones you actually have.  That might mean 
looping through the keys of results.  Or, for the particular use case in 
that line, there's undoubtedly a method of results that will give you 
all the values in a list.  That list would make an even better argument 
to zip().  Once again, I remind you of the dir() function, to see 
available methods.
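
One possible shape for that, as a sketch only. It assumes the dict method meant here is values(), and the function name summarize is made up for the example:

```python
def summarize(results):
    """Add the counts column by column over whatever tokens are present."""
    # zip(*...) pairs up the first entries of every list, then the second
    # entries, and so on; no token name is mentioned explicitly.
    return [sum(column) for column in zip(*results.values())]

# Sample data taken from the output quoted earlier in the thread.
results = {'B': [0, 0, 0, 0, 0, 0], 'E': [1, 0, 1, 0, 1, 0]}
print(summarize(results))   # [1, 0, 1, 0, 1, 0]
print(summarize({}))        # [] (no KeyError when nothing was counted)

# dir() lists the available methods; skip the underscore names for brevity.
print([name for name in dir(results) if not name.startswith('_')])
```

In Python 3 values() gives a view rather than a real list, but zip() accepts it the same way.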

-- 

DaveA