Simple text parsing gets difficult when line continues to next line

John Machin sjmachin at lexicon.net
Tue Nov 28 14:48:07 EST 2006


Jacob Rael wrote:
> Hello,
>
> I have a simple script to parse a text file (a visual basic program)
> and convert key parts to tcl. Since I am only working on specific
> sections and I need it quick, I decided not to learn/try a full blown
> parsing module. My simple script works well until it runs into
> functions that straddle multiple lines. For example:
>
>   Call mass_write(&H0, &HF, &H4, &H0, &H5, &H0, &H6, &H0, &H7, &H0, &H8, &H0, _
>                 &H9, &H0, &HA, &H0, &HB, &H0, &HC, &H0, &HD, &H0, &HE, &H0, &HF, &H0, -1)
>
>
> I read in each line with:
>
> for line in open(fileName).readlines():
>
> I would like to identify whether a line continues (if line.endswith('_'))
> and concatenate it with the next line:
>
> line = line + nextLine
>
> How can I get the next line when I am in a for loop using readlines?

Don't do that. I'm rather dubious about approaches that try to grab the
next line on the fly, e.g. fp.next(). Here's a function that takes a
list of lines and returns another list with all trailing whitespace
removed and the continued lines glued together. It uses a simple state
machine approach.

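def continue_join(linesin):
    """Return a new list of lines with trailing whitespace stripped
    and '_'-continued lines joined into one."""
    linesout = []
    buff = ""        # text accumulated for a continued line
    NORMAL = 0       # not inside a continued line
    PENDING = 1      # previous line ended with '_'
    state = NORMAL
    for line in linesin:
        line = line.rstrip()
        if state == NORMAL:
            if line.endswith('_'):
                # start of a continuation: keep the text, drop the '_'
                buff = line[:-1]
                state = PENDING
            else:
                linesout.append(line)
        else:
            if line.endswith('_'):
                # continuation goes on: keep accumulating
                buff += line[:-1]
            else:
                # final piece: emit the joined line and reset
                buff += line
                linesout.append(buff)
                buff = ""
                state = NORMAL
    if state == PENDING:
        # input ended in the middle of a continued line
        raise ValueError("last line is continued: %r" % line)
    return linesout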

import sys

# Read the file named on the command line and show each joined line.
with open(sys.argv[1]) as fp:
    rawlines = fp.readlines()
cleanlines = continue_join(rawlines)
for line in cleanlines:
    print(repr(line))
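
If the file is too large to hold in a list, the same two-state idea can be written as a generator that buffers only the pending pieces. This is just a sketch, not part of the function above: the names iter_joined and marker are made up for illustration, and the sample lines are a shortened version of the VB call in the question.

```python
def iter_joined(lines, marker="_"):
    """Yield logical lines, gluing together lines that end with `marker`.

    `iter_joined` and `marker` are hypothetical names for this sketch.
    """
    pending = None  # text collected so far for a continued line, or None
    for line in lines:
        line = line.rstrip()
        if line.endswith(marker):
            # Continued: drop the marker and keep collecting.
            pending = (pending or "") + line[:-len(marker)]
        else:
            if pending is not None:
                # Final piece of a continued line: glue it on.
                line = pending + line
                pending = None
            yield line
    if pending is not None:
        # Input ended in the middle of a continued line.
        raise ValueError("last line is continued: %r" % pending)


# Shortened, made-up sample in the style of the VB source.
sample = [
    "Call mass_write(&H0, &HF, _\n",
    "&H9, &H0, -1)\n",
    "x = 1\n",
]
joined = list(iter_joined(sample))
for line in joined:
    print(repr(line))
```

Because it only ever looks at the current line, it can be fed an open file object directly, e.g. for line in iter_joined(open(fileName)), without calling readlines() first.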
===
Tested with the following files:
C:\junk>type contlinet1.txt
only one line

C:\junk>type contlinet2.txt
line 1
line 2

C:\junk>type contlinet3.txt
line 1
line 2a _
line 2b _
line 2c
line 3

C:\junk>type contlinet4.txt
line 1
_
_
line 2c
line 3

C:\junk>type contlinet5.txt
line 1
_
_
line 2c
line 3 _

C:\junk>
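
The same five cases can be replayed without creating files. The sketch below assumes each file holds exactly the lines shown above, with no extra blank lines, and repeats the function in condensed form (a boolean flag instead of the two state constants) only so that the block runs on its own.

```python
def continue_join(linesin):
    # Condensed restatement of the function in this post.
    linesout = []
    buff = ""
    pending = False  # True while inside a continued line
    for line in linesin:
        line = line.rstrip()
        if line.endswith('_'):
            buff += line[:-1]
            pending = True
        elif pending:
            linesout.append(buff + line)
            buff = ""
            pending = False
        else:
            linesout.append(line)
    if pending:
        raise ValueError("last line is continued: %r" % line)
    return linesout


# Assumed contents of the five test files, one list entry per line.
cases = {
    "contlinet1.txt": ["only one line\n"],
    "contlinet2.txt": ["line 1\n", "line 2\n"],
    "contlinet3.txt": ["line 1\n", "line 2a _\n", "line 2b _\n",
                       "line 2c\n", "line 3\n"],
    "contlinet4.txt": ["line 1\n", "_\n", "_\n", "line 2c\n", "line 3\n"],
    "contlinet5.txt": ["line 1\n", "_\n", "_\n", "line 2c\n", "line 3 _\n"],
}

for name in sorted(cases):
    try:
        print("%s -> %r" % (name, continue_join(cases[name])))
    except ValueError as exc:
        print("%s -> ValueError: %s" % (name, exc))
```

Traced by hand, file 3 comes out as ['line 1', 'line 2a line 2b line 2c', 'line 3'], file 4 as ['line 1', 'line 2c', 'line 3'] (a bare '_' contributes nothing), and file 5 raises ValueError because its last line is still continued.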

HTH,
John



