iterating over a file with two pointers

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed Sep 18 22:56:04 EDT 2013


On Wed, 18 Sep 2013 04:12:05 -0700, nikhil Pandey wrote:

> hi,
> I want to iterate over the lines of a file and when i find certain
> lines, i need another loop starting from the next of that "CERTAIN" line
> till a few (say 20) lines later. so, basically i need two pointers to
> lines (one for outer loop(for each line in file)) and one for inner
> loop. How can i do that in python? please help. I am stuck up on this.

No, you don't "need" two pointers to lines. That is just one way to solve 
this problem. You can solve it many ways.

One way, for small files (say, under one million lines), is to read the 
whole file into a list, then keep two pointers (indexes) into that list:

lines = somefile.readlines()  # somefile: an already-open file object
p = q = 0

while p < len(lines):
    print(lines[p])
    p += 1


then advance the pointers p and q as needed. This is the most flexible 
way to do it: you can have as many pointers as needed, you can back-
track, jump forward, jump back, and it is all high-speed random-access 
memory accesses. Except for the initial readlines, none of it is slow I/O 
processing.
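
As a rough sketch of how the two pointers could be applied to the original 
question (collect up to 20 lines following each "CERTAIN" line), something 
like the following might work. The function name, the is_match callback and 
the window parameter are my own illustrative choices, not anything from the 
original post, and the sample list stands in for the result of readlines():

```python
def lines_after_matches(lines, is_match, window=20):
    """Return (match_index, following_lines) pairs for each matching line."""
    results = []
    p = 0  # outer pointer: visits every line in the list
    while p < len(lines):
        if is_match(lines[p]):
            following = []
            q = p + 1  # inner pointer: starts on the line after the match
            while q < len(lines) and q <= p + window:
                following.append(lines[q])
                q += 1
            results.append((p, following))
        p += 1  # the outer pointer is not disturbed by the inner loop
    return results


# Stand-in for lines = somefile.readlines()
sample = ["a\n", "CERTAIN\n", "b\n", "c\n", "d\n", "CERTAIN\n", "e\n"]
found = lines_after_matches(sample, lambda s: s.startswith("CERTAIN"), window=2)
print(found)
```

Because p keeps advancing one line at a time, a matching line that falls 
inside an earlier window still starts a window of its own. If that is not 
what you want, set p = q after the inner loop instead.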


Another solution is to use a state-machine:


state = SCANNING  # initial state, set before the loop starts
for line in somefile:
    if state == SCANNING:
        do_something()
    elif state == PROCESSING:
        do_something_else()
    elif state == WOBBLING:
        wobble()
    state = adjust_state(line)
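
To make the outline above concrete, here is a minimal runnable sketch with 
just two states. The names collect_blocks, is_match and window are 
hypothetical, and the code assumes window is at least 1. It needs only a 
single pass and no second pointer, so it works directly on a file object:

```python
SCANNING, PROCESSING = "scanning", "processing"


def collect_blocks(lines, is_match, window=20):
    """Collect up to `window` lines after each matching line (window >= 1)."""
    state = SCANNING
    remaining = 0  # lines still to collect in the current block
    blocks = []
    for line in lines:
        if state == PROCESSING:
            blocks[-1].append(line)
            remaining -= 1
            if remaining == 0:
                state = SCANNING  # block is full, go back to scanning
        elif is_match(line):
            blocks.append([])  # start a new, empty block
            remaining = window
            state = PROCESSING
    return blocks


sample = ["a", "CERTAIN", "b", "c", "d", "CERTAIN", "e"]
blocks = collect_blocks(sample, lambda s: s == "CERTAIN", window=2)
print(blocks)
```

Note one difference from the two-pointer version: a matching line that 
turns up while a block is being collected is treated as ordinary content of 
that block and does not start a new one.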


You can combine the two, of course, and have a state machine with 
multiple pointers to a list of lines.

Using itertools.tee, you can potentially combine these solutions with the 
straightforward for-loop over a list. The danger of itertools.tee is that 
it may use as much memory as reading the entire file into memory at once, 
but the benefit is that it may use much less. But personally, I find 
list-based processing with random-access by index much easier to understand 
than itertools.tee solutions.
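
For comparison, one possible itertools.tee version is sketched below. The 
function and parameter names are again my own. Since the original iterator 
must not be used once it has been passed to tee, the sketch drives the 
outer loop with next() rather than a plain for-loop, and rebinds the outer 
name to one of the tee copies each time:

```python
from itertools import islice, tee


def lines_after_matches_tee(lines, is_match, window=20):
    """Return, for each matching line, a list of up to `window` lines after it."""
    outer = iter(lines)
    results = []
    while True:
        try:
            line = next(outer)
        except StopIteration:
            break
        if is_match(line):
            # Split the stream: both copies start at the line after the match.
            outer, lookahead = tee(outer)
            # Reading ahead on one copy does not advance the other.
            results.append(list(islice(lookahead, window)))
            del lookahead  # drop the read-ahead copy once it is no longer needed
    return results


sample = ["a\n", "CERTAIN\n", "b\n", "c\n", "d\n", "CERTAIN\n", "e\n"]
ahead = lines_after_matches_tee(sample, lambda s: s.startswith("CERTAIN"), window=2)
print(ahead)
```

In this arrangement tee should only need to buffer roughly the lines read 
ahead, rather than the whole file, but it is noticeably harder to follow 
than indexing into a list.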



-- 
Steven


