[Tutor] a quick Q: how to use for loop to read a series of files with .doc end

lina lina.lastname at gmail.com
Wed Oct 5 14:46:28 CEST 2011


On Wed, Oct 5, 2011 at 8:21 PM, Dave Angel <d at davea.name> wrote:

>
>
>>
>>> #these two are capitalized because they're intended to be constant
>>> TOKENS = "BE"
>>> LINESTOSKIP = 43
>>> INFILEEXT = ".xpm"
>>> OUTFILEEXT = ".txt"
>>>
>>> def dofiles(topdirectory):
>>>    for filename in os.listdr(topdirectory):
>>>
>> There is a typo here: it should be listdir, not listdr.

>        processfile(filename)
>>>
>>> def processfile(infilename):
>>>    base, ext =os.path.splitext(fileName)
>>>
>> Here I changed fileName to infilename.

>    if ext == INFILEEXT:
>>>        text = fetchonefiledata(infilename)
>>>        numcolumns = len(text[0])
>>>        results = {}
>>>        for ch in TOKENS:
>>>
>>>            results[ch] = [0] * numcolumns
>>>        for line in text:
>>>            line = line.strip()
>>>
>>>            for col, ch in enumerate(line):
>>>                if ch in tokens:
>>>
>> Here I changed tokens to TOKENS.

>                    results[ch][col] += 1
>>>        writeonefiledata(base+OUTFILEEXT, results)
>>>
>>>
>>> def fetchonefiledata(inname):
>>>    infile = open(inname)
>>>    text = infile.readlines()
>>>    return text[LINESTOSKIP:]
>>>
>>> def writeonefiledata(outname):
>>>    outfile = open(outname, "w")
>>>    ...process the results as appropriate...
>>>    ....(since you didn't tell us how multiple tokens were to be
>>> displayed)
>>>
>>> if __name__ == "__main__":
>>>    dofiles(".")     #or get the top directory from the sys.argv variable,
>>> which is set from command line.
>>>
>>>
>>> You split the version you suggested earlier into four functions.
>>>
>>
>> A small question: why the name ext? Why is the splitext result also called ext here?
>>
>>
>>
>>  Try the following, perhaps in the interpreter:
>
> mytuple = ("one thing", "Another thing")
> base, extension = mytuple
>
> Now look and see what base and extension have for values.
>
> Previously we just needed the second element of the splitext return value.
>  This time we'll need both, so might as well put them in variables that have
>  useful names.

Yes, thanks for the reminder. I understand now.
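
For reference, a minimal sketch of the same unpacking applied to
os.path.splitext. The file name is made up for illustration:

```python
import os.path

# Hypothetical file name, used only for illustration.
infilename = "sample.xpm"

# splitext returns a 2-tuple (root, extension); unpack it into two names.
base, ext = os.path.splitext(infilename)

print(base)  # sample
print(ext)   # .xpm
```

The extension keeps its leading dot, which is why INFILEEXT is written
as ".xpm" rather than "xpm".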

>
>
>
>>
>> import os.path
>>
>>
>> TOKENS="E"
>> LINESTOSKIP=0
>> INFILEEXT=".xpm"
>> OUTFILEEXT=".txt"
>>
>> def dofiles(topdirectory):
>>     for filename in os.listdir(topdirectory):
>>         processfile(filename)
>>
>> def processfile(infilename):
>>     base, ext =os.path.splitext(infilename)
>>     if ext == INFILEEXT:
>>         text = fetchonefiledata(infilename)
>>         numcolumns=len(text[0])
>>         results={}
>>         for ch in TOKENS:
>>
>>             results[ch] = [0]*numcolumns
>>         for line in text:
>>             line = line.strip()
>>
>>             for col, ch in enumerate(line):
>>                 if ch in TOKENS:
>>                     results[ch][col]+=1
>>         writeonefiledata(base+OUTFILEEXT,results)
>>
>> def fetchonefiledata(inname):
>>     infile = open(inname)
>>     text = infile.readlines()
>>     return text[LINESTOSKIP:]
>>
>> def writeonefiledata(outname,results):
>>     outfile = open(outname,"w")
>>     for item in results:
>>         return outfile.write(item)
>>
>>
>> if __name__=="__main__":
>>     dofiles(".")
>>
>> It is just that the results are a bit unexpected.
>>
>>  $ more try.txt
>> E
>>
>> I may have made a mistake in the part of writeonefiledata that you left for me.
>>
>>  I'd be amazed if there weren't at least a couple of typos in my message.
>  But this is where you sprinkle a couple of prints.  What did results look
> like when you print it out?
>
Yes, there were a few typos.
The result is a bit strange: the output file contains only E.

def writeonefiledata(outname,results):
    outfile = open(outname,"w")
    for item in results:
        return outfile.write(item)

Did I make a mistake in this final part?

>
> I hope you'll find that results is a dictionary, you might not want to just
> write() its keys.  You probably want to write() its values instead, perhaps
> with a heading showing what key you're printing.

Later I want to get the combined count of B+E, the two tokens, so the final
total for each column is enough. I will use this data for further
processing.
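
A minimal sketch of how the values could be written out, assuming
results maps each token to a list of per-column counts as in
processfile() above. The output layout (one line per token, then a
combined total) and the helper name columntotals are assumptions, not
something settled in this thread. Note that a return inside the for loop
ends the function after the first item, and that iterating over a dict
yields only its keys, which would explain why try.txt contained just E.

```python
def columntotals(results):
    """Add up the per-column counts of all tokens (hypothetical helper)."""
    return [sum(counts) for counts in zip(*results.values())]


def writeonefiledata(outname, results):
    # The with block closes the file even if a write fails.
    with open(outname, "w") as outfile:
        for token in sorted(results):
            counts = " ".join(str(n) for n in results[token])
            outfile.write("{}: {}\n".format(token, counts))
        total = " ".join(str(n) for n in columntotals(results))
        outfile.write("total: {}\n".format(total))


# Made-up counts for two tokens over three columns.
example = {"B": [1, 0, 2], "E": [0, 3, 1]}
print(columntotals(example))  # [1, 3, 3]
```

With TOKENS = "BE", the "total" line would then hold the B+E value for
each column.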

>
>
>  But it gives you a simple refactoring that splits the logic so each can be
>>
>>> visualized (and tested) independently.  i'd also split up processfile(),
>>> once I realized how big it was.
>>>
>>> There are many shortcuts that can be applied. Some of them probably use
>>> language features you're not comfortable with, like perhaps generators.
>>>  And
>>> if  efficiency is important, there are optimizations to do, like using
>>> islice directly on the infile object.  That one would eliminate having to
>>> have the whole file stored in memory at one time.
>>>
>>> Likewise there are further things that could be done to decouple the
>>> functions even more.
>>>
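
A rough sketch of the islice idea, assuming the goal is only to skip the
first LINESTOSKIP lines without reading the whole file into a list
first. The generator name iterdatalines is made up for this example:

```python
from itertools import islice

LINESTOSKIP = 43


def iterdatalines(inname, skip=LINESTOSKIP):
    """Yield the lines of inname after the first `skip` lines, one at a time."""
    with open(inname) as infile:
        # islice(iterable, start, stop): start at index `skip`, no upper bound.
        for line in islice(infile, skip, None):
            yield line
```

Because the lines now arrive one at a time, processfile() could no
longer use text[0] and would have to take the column count from the
first line it sees.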
>>> But there's nothing in the above code which uses very advanced topics, so
>>> you should be able to understand it and fix whatever typos I've
>>> undoubtedly
>>> got.
>>>
>>> What are you using for debugging aids?  Besides this group, I mean.
>>>  print
>>> statements?  An IDE ?  which one?
>>>
>>>  debugging aids?
>> I just run python3 script.py and read the messages it prints.
>> In between, I will probably try print.
>>
>>  Once the code is refactored into small enough independent functions, you
> can do things like write multiple versions of a given function, for
> debugging purposes.  For example, you could have another function called
>  fetchonefiledata(), and have it return a list of strings.  For example, it
> might be
>
> def fetchonefiledata(dummy):
>    buf = """EEDC
> AAAC
> F145
> CCCA
> """
>    return buf.split()
>
> and then you wouldn't be dependent on an actual file being available.
>
> Naturally, at that point, your top-level code would call processfile()
> instead of dofiles().
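
A sketch of how the stub could be used to check the counting logic on
its own. The function countcolumns is a hypothetical name for the
counting loop pulled out of processfile(); it is not part of the code
in this thread:

```python
TOKENS = "BE"


def fetchonefiledata(dummy):
    # Stand-in for the real reader: returns canned lines, ignores its argument.
    buf = """EEDC
AAAC
F145
CCCA
"""
    return buf.split()


def countcolumns(text, tokens=TOKENS):
    """Count, per column, how often each token character occurs."""
    numcolumns = len(text[0])
    results = {}
    for ch in tokens:
        results[ch] = [0] * numcolumns
    for line in text:
        for col, ch in enumerate(line.strip()):
            if ch in tokens:
                results[ch][col] += 1
    return results


results = countcolumns(fetchonefiledata("ignored"))
print(results["B"])  # [0, 0, 0, 0]
print(results["E"])  # [1, 1, 0, 0]
```

Since the canned data is known, the expected counts can be worked out by
hand and compared with what the function returns.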
>
> And remember the repr() and type() functions when trying to see just what
> type of thing something is.
>
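
A small sketch of what repr() and type() show for values like the ones
in this script. The sample values are made up:

```python
results = {"E": [1, 1, 0, 0]}
line = "EEDC\n"

print(type(results))  # <class 'dict'>
print(type(line))     # <class 'str'>

# repr() shows quotes and escapes such as a trailing newline,
# which a plain print() would hide.
print(repr(line))          # 'EEDC\n'
print(repr(line.strip()))  # 'EEDC'
```

Printing type(results) before writing it to a file is a quick way to
confirm that it is a dict rather than a list of strings.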
I have not yet figured out how to use repr() and type().
Another question: in Linux, pressing TAB can automatically complete what
you are typing. Is there a way in python3 to get similar hints or have
the rest filled in?

Thanks again,

> .
>
>> Thanks for your time,
>>
>>
> You're certainly welcome.
>
>
>
> --
>
> DaveA
>
>


-- 
Best Regards,

lina