Simple text parsing gets difficult when line continues to next line

John Machin sjmachin at lexicon.net
Tue Nov 28 16:29:16 EST 2006


Tim Hochberg wrote:
[snip]
> I agree that mixing the line assembly and parsing is probably a mistake
> although using next explicitly is fine as long as you're careful with it.
> For instance, I would be wary to use the mixed for-loop, next strategy
> that some of the previous posts suggested. Here's a different,
> generator-based implementation of the same idea that, for better or for
> worse, is considerably less verbose:
>
[snip]

Here's a somewhat less verbose version of the state machine gadget.

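def continue_join_3(linesin):
    """Join each line ending in '_' with the line(s) that follow it.

    Returns a list of logical lines, with trailing whitespace removed.
    """
    linesout = []
    buff = ""      # text collected from continued lines so far
    pending = 0    # 1 while a continuation is still open
    for line in linesin:
        # remove *all* trailing whitespace (including the newline)
        line = line.rstrip()
        if line.endswith('_'):
            # drop the continuation marker and keep collecting
            buff += line[:-1]
            pending = 1
        else:
            # logical line is complete: emit it and reset the state
            linesout.append(buff + line)
            buff = ""
            pending = 0
    if pending:
        raise ValueError("last line is continued: %r" % line)
    return linesout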

FWIW, it works all the way back to Python 2.1
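
For anyone who wants to try it, here is a quick usage sketch. The sample lines are made up for illustration, and the function is repeated so the snippet runs on its own. It assumes, as above, that a trailing underscore (after any trailing whitespace) is the continuation marker.

```python
def continue_join_3(linesin):
    # same logic as the function above, repeated to keep this self-contained
    linesout = []
    buff = ""
    pending = 0
    for line in linesin:
        line = line.rstrip()
        if line.endswith('_'):
            buff += line[:-1]
            pending = 1
        else:
            linesout.append(buff + line)
            buff = ""
            pending = 0
    if pending:
        raise ValueError("last line is continued: %r" % line)
    return linesout

# made-up sample input; note the trailing blanks after the second '_'
sample = [
    "x = 1 + _\n",
    "2 + _  \n",
    "3\n",
    "y = 4\n",
]
joined = continue_join_3(sample)
print(joined)

# a continuation marker on the very last line is reported as an error
try:
    continue_join_3(["oops _\n"])
except ValueError:
    print("dangling continuation detected")
```

Note that only the underscore itself is dropped: whitespace before the marker and leading whitespace on the following line are kept, so indented continuation lines will carry their indentation into the joined result.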

Cheers,
John



