[Tutor] a quick Q: how to use for loop to read a series of files with .doc end

Wed Oct 5 07:42:15 CEST 2011

On 10/04/2011 11:13 PM, lina wrote:
> On Wed, Oct 5, 2011 at 10:45 AM, Dave Angel <d at davea.name> wrote:
>
>> On 10/04/2011 10:22 PM, lina wrote:
>>
>>> On Wed, Oct 5, 2011 at 1:30 AM, Prasad, Ramit <ramit.prasad at jpmorgan.com> wrote:
>> <SNIP>
>>> SyntaxError: invalid syntax
>>>
>>> for fileName in os.listdir("."):
>>>      if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>>>          filedata = open(fileName)
>>>          text=filedata.readlines()
>>>          cols = len(text[0])
>>>          except IndexError:
>>>              print ("Index Error.")
>>>          result=[]
>>>          for idx in xrange(cols):
>>>              results.append(0)
>>>          for line in text:
>>>              for col_idx, field in enumerate(line):
>>>                  if token in field:
>>>                      results[col_idx]+=1
>>>              for index in col_idx:
>>>                  print results[index]
>>>
>>> it showed up:
>>>
>>>      print results[]
>>>                  ^
>>> SyntaxError: invalid syntax
>>>
>>> Sorry, I still lack a deep understanding of some basic things. Thanks for
>>> your patience.
>>>
>>>
>> The simplest answer here is that you might have accidentally run this under
>> Python 3.x.  That would explain the syntax error on the print function.  Pick a
>> single version and stick to it.  In fact, you might even put a version test
>> at the beginning of the code to give an immediate error.
>>
> choose python3.
>
Then change that last print to use parentheses.  print() is a function 
call in Python 3.x, while it was a statement in earlier Python versions.
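A minimal sketch of the version test mentioned above, assuming you stay on Python 3. One caveat: the check can only run once the whole file has compiled, so under Python 2 it helps only as long as the rest of the code avoids Python 3-only syntax. The `results` dictionary here is made-up sample data.

```python
import sys

# Refuse to run under Python 2, so a version mix-up fails immediately
# with a clear message instead of an obscure SyntaxError later on.
if sys.version_info < (3, 0):
    sys.exit("This script needs Python 3; you are running " + sys.version)

# Hypothetical sample data, shaped like {token: [count per column]}.
results = {"B": [1, 0, 2], "E": [0, 3, 1]}

# In Python 3, print is an ordinary function: its arguments go in parentheses.
for item in results:
    print(item, results[item])
```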

> <SNIP>
>> This example illustrates one reason why it's a mistake to write all the
>> code at top level.  This code should probably be at least 4 functions, with
>> each one handling one abstraction.
>>
> It's frustrating. Seriously. (I think I need to read some good, relevant
> code first.)
>
Is Python your first programming language?  It was approximately my 30th.

I learned "programming" from a Fortran book in 1967.  I had no access to 
a computer, though there was at least one in the state, at the Yale 
campus.  I saw it on a field trip with the (advanced) students who were 
taking programming.  They weren't allowed to take it until they finished 
2nd-year calculus, which I didn't do till I got to college.  However, when I 
went to college the following year, I ran across another student who 
knew how to access the mainframe (via punch cards) and could tell me 
how to do it.  (Security was very light.)  For a few months, I hacked 
daily, and learned a lot.  Then the following year, I actually took an 
electrical engineering class that introduced the concepts of 
programming, and I spent my time doing experiments that barely resembled 
the assignments.  I ended up with an incomplete in the course, which I 
made up by writing a linear circuit analysis program.  Punched card 
input, graphical output to a line printer using rows of asterisks.

Point is, it takes a lot of time, and usually a one-on-one mentor, to get 
the concepts nailed down.  Seldom did anyone tell me "write these lines 
down, and it'll solve the problem."  Instead, they told me where my 
problem was, and where in those manuals (chained to tables in the lab) 
to find more information.

It wasn't till my fourth language that I found out about local 
variables, and how a function should encapsulate one concept.  The first 
three didn't have such things.

>> Further, while you're developing, you should probably put the test data
>> into a literal (probably a multiline literal using triple quotes), so you can
>> experiment easily with changes to the data and see the results.
>>
>
>   #!/bin/python
>
> import os.path
>
> tokens=['B','E']
>
> for fileName in os.listdir("."):
>      if os.path.isfile(fileName) and os.path.splitext(fileName)[1]==".xpm":
>          filedata = open(fileName)
>          text=filedata.readlines()
>          results={}
>          numcolumns=len(text.strip())
>          for ch in tokens:
>              results[ch]=[0]*numcolumns
>          for line in text:
>              for col, ch in enumerate(line):
>                  if ch in tokens:
>                      results[ch][col]+=1
>          for item in results:
>                  print item
>
> $ python3 counter-vertically.py
>    File "counter-vertically.py", line 20
>      print item
>               ^
> SyntaxError: invalid syntax
>
As I said above, Python 3 needs parentheses around print's argument list.

As for splitting into functions, consider:

import os

# These are capitalized because they're intended to be constants.
TOKENS = "BE"
LINESTOSKIP = 43
INFILEEXT = ".xpm"
OUTFILEEXT = ".txt"

def dofiles(topdirectory):
    for filename in os.listdir(topdirectory):
        # listdir() returns bare names, so rebuild the full path
        fullname = os.path.join(topdirectory, filename)
        if os.path.isfile(fullname):
            processfile(fullname)

def processfile(infilename):
    base, ext = os.path.splitext(infilename)
    if ext == INFILEEXT:
        text = fetchonefiledata(infilename)
        if not text:
            return      # nothing left after skipping the header
        numcolumns = len(text[0].strip())
        results = {}
        for ch in TOKENS:
            results[ch] = [0] * numcolumns
        for line in text:
            line = line.strip()
            # ignore anything past the width of the first line
            for col, ch in enumerate(line[:numcolumns]):
                if ch in TOKENS:
                    results[ch][col] += 1
        writeonefiledata(base + OUTFILEEXT, results)

def fetchonefiledata(inname):
    with open(inname) as infile:
        text = infile.readlines()
    return text[LINESTOSKIP:]

def writeonefiledata(outname, results):
    # Placeholder layout: one line per token, counts separated by spaces.
    # Change it to suit, since you didn't tell us how multiple tokens
    # were to be displayed.
    with open(outname, "w") as outfile:
        for ch in TOKENS:
            outfile.write(ch + ": " + " ".join(str(n) for n in results[ch]) + "\n")

if __name__ == "__main__":
    # Or get the top directory from sys.argv, which is set from the command line.
    dofiles(".")

Now this is totally untested.  I just typed it without even trying any 
of it.  But it gives you a simple refactoring that splits the logic so 
each piece can be visualized (and tested) independently.  I'd also split up 
processfile(), once I realized how big it was.
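To make the "tested independently" point concrete, here is a sketch that pulls the counting loop out into a hypothetical `countcolumns()` function and feeds it test data from a triple-quoted literal, as suggested earlier in the thread. The sample rows are invented, and unlike the loop above it also drops blank lines.

```python
def countcolumns(lines, tokens):
    """Return {token: [count per column]} for the given lines (hypothetical helper)."""
    # Strip whitespace and drop blank lines before measuring the width.
    lines = [line.strip() for line in lines if line.strip()]
    if not lines:
        return {ch: [] for ch in tokens}
    numcolumns = len(lines[0])
    results = {ch: [0] * numcolumns for ch in tokens}
    for line in lines:
        # Ignore anything past the width of the first line.
        for col, ch in enumerate(line[:numcolumns]):
            if ch in results:
                results[ch][col] += 1
    return results

# Made-up test data, easy to edit while experimenting.
TESTDATA = """\
BBE.
BE..
EEB.
"""

print(countcolumns(TESTDATA.splitlines(), "BE"))
# → {'B': [2, 1, 1, 0], 'E': [1, 2, 1, 0]}
```

Because the function never touches a file, you can change `TESTDATA` and rerun it in seconds; `processfile()` would then just read the lines and hand them over.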

There are many shortcuts that can be applied. Some of them probably use 
language features you're not comfortable with, perhaps generators.  
And if efficiency is important, there are optimizations to make, like 
using islice directly on the infile object.  That would avoid holding 
the whole file in memory at once.
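A sketch of the islice idea, assuming the same 43-line header as in the code above. The names `skiplines()` and `fetchlines()` are hypothetical; `itertools.islice` is the standard-library function. Since `fetchlines()` is a generator, `processfile()` would have to change as well: it can no longer index `text[0]`, so it would read the first line with `next()` instead.

```python
import itertools

LINESTOSKIP = 43   # header size, as in the code above

def skiplines(lines, skip=LINESTOSKIP):
    """Lazily yield everything from the iterable `lines` after the first `skip` items."""
    # islice(iterable, start, stop): stop=None means "run to the end".
    return itertools.islice(lines, skip, None)

def fetchlines(inname, skip=LINESTOSKIP):
    """Yield the lines of file `inname` after the header, one at a time."""
    with open(inname) as infile:
        # A file object iterates line by line, so the whole file is
        # never held in memory at once.
        for line in skiplines(infile, skip):
            yield line
```

Keeping `skiplines()` separate from the file handling also means it can be checked against a plain list of strings.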

Likewise there are further things that could be done to decouple the 
functions even more.
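One possible further decoupling, sketched under the assumption that you keep the flat directory layout: a hypothetical generator `matchingfiles()` that only finds the input files, so `dofiles()` just loops over it and `processfile()` no longer needs its own extension check.

```python
import os

def matchingfiles(topdirectory, ext):
    """Yield the full path of each regular file in topdirectory ending in ext."""
    # sorted() only makes the order predictable; listdir() order is arbitrary.
    for filename in sorted(os.listdir(topdirectory)):
        fullname = os.path.join(topdirectory, filename)
        if os.path.isfile(fullname) and os.path.splitext(filename)[1] == ext:
            yield fullname

# Usage, reusing processfile() from above:
# for name in matchingfiles(".", ".xpm"):
#     processfile(name)
```

Each piece now has one job (finding files, reading lines, counting, writing), which is what makes them easy to test one at a time.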

But there's nothing in the above code which uses very advanced topics, 
so you should be able to understand it and fix whatever typos I've 
undoubtedly got.

What are you using for debugging aids?  Besides this group, I mean.  
Print statements?  An IDE?  Which one?
  --

DaveA