eof

Duncan Booth duncan.booth at invalid.invalid
Thu Nov 22 07:26:12 EST 2007


braver <deliverable at gmail.com> wrote:

> In many cases, you want to do this:
> 
> for line in f:
>     <do something with the line, setup counts and things>
>     if line % 1000 == 0 or f.eof(): # eof() doesn't exist in Python yet!
>         <use the setup variables and things to process the chunk>
> 
> My control logic summarizes every 1000 lines of a file.  I have to
> issue the summary after each 1000 lines, or whatever incomplete tail
> chunk remains.  If I do it after the for loop, I have to refactor my
> logic into a procedure to call it twice.  Now I want to avoid the
> overhead of the procedure call, and generally for a script to keep it
> simple. 

This sounds like a case for writing a generator. Try this one:

----- begin chunks.py -------
import itertools
def chunks(f, size):
    iterator = iter(f)
    def onechunk(line):
        yield line
        for line in itertools.islice(iterator, size-1):
            yield line
    for line in iterator:
        yield onechunk(line)

for chunk in chunks(open('chunks.py'), 3):
    for n, line in enumerate(chunk):
        print "%d:%s" % (n,line.rstrip())
    print "---------------"
print "done"
#eof
------ end chunks.py --------

The output when you run this is:

C:\Temp>chunks.py
0:import itertools
1:def chunks(f, size):
2:    iterator = iter(f)
---------------
0:    def onechunk(line):
1:        yield line
2:        for line in itertools.islice(iterator, size-1):
---------------
0:            yield line
1:    for line in iterator:
2:        yield onechunk(line)
---------------
0:
1:for chunk in chunks(open('chunks.py'), 3):
2:    for n, line in enumerate(chunk):
---------------
0:        print "%d:%s" % (n,line.rstrip())
1:    print "---------------"
2:print "done"
---------------
0:#eof
---------------
done
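
Applied to the original question, the per-chunk summary simply goes after the inner loop, so the incomplete tail chunk needs no special case and no eof() test. Here is a minimal sketch in Python 3 syntax. The helper name summarize_in_chunks and the line-count/character-total "summary" are made up for illustration; substitute whatever your real processing is.

```python
import itertools

def chunks(iterable, size):
    """Yield successive sub-iterators of at most `size` items (same idea as chunks.py)."""
    iterator = iter(iterable)

    def onechunk(first):
        yield first
        for item in itertools.islice(iterator, size - 1):
            yield item

    for first in iterator:
        yield onechunk(first)

def summarize_in_chunks(lines, size):
    """Hypothetical helper: return one (line_count, total_chars) pair per chunk."""
    summaries = []
    for chunk in chunks(lines, size):
        count = 0
        total = 0
        for line in chunk:
            # per-line work: set up counts and things
            count += 1
            total += len(line)
        # per-chunk work: runs for full chunks and for the short tail alike
        summaries.append((count, total))
    return summaries

print(summarize_in_chunks(["a\n", "bb\n", "ccc\n", "dddd\n", "e\n"], 2))
# prints [(2, 5), (2, 9), (1, 2)]
```

With a real file you would pass the open file object and a size of 1000 instead of the sample list.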

Or change it to do:

   for chunk in chunks(enumerate(open('chunks.py')), 3):
       for n, line in chunk:

and you get all lines numbered from 0 to 15 instead of resetting the 
count each chunk.
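
One caveat worth spelling out: every chunk pulls from the same underlying iterator, so each chunk has to be consumed before the next one is requested. The sketch below, in Python 3 syntax with made-up sample data standing in for the file, shows the running numbering from the enumerate variant, and then what happens if the chunks are collected first and only read afterwards.

```python
import itertools

def chunks(iterable, size):
    """Same generator as in chunks.py, restated so this example runs on its own."""
    iterator = iter(iterable)

    def onechunk(first):
        yield first
        for item in itertools.islice(iterator, size - 1):
            yield item

    for first in iterator:
        yield onechunk(first)

lines = ["a", "b", "c", "d", "e"]  # stand-in for the lines of a file

# Wrapping the input in enumerate() keeps one running count across chunks.
numbered = [list(chunk) for chunk in chunks(enumerate(lines), 2)]
print(numbered)
# prints [[(0, 'a'), (1, 'b')], [(2, 'c'), (3, 'd')], [(4, 'e')]]

# Consuming each chunk as it arrives gives the expected grouping.
eager = [list(chunk) for chunk in chunks(range(7), 3)]
print(eager)
# prints [[0, 1, 2], [3, 4, 5], [6]]

# Collecting all chunks before reading any of them does not: the outer
# loop has already used up the iterator, so each chunk holds one item.
deferred = [list(chunk) for chunk in list(chunks(range(7), 3))]
print(deferred)
# prints [[0], [1], [2], [3], [4], [5], [6]]
```

The nested for loops in chunks.py always finish a chunk before moving on, so they are not affected by this.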


