Parse bug text file

Chris Angelico rosuav at gmail.com
Sun Jul 27 14:17:12 EDT 2014


On Mon, Jul 28, 2014 at 4:08 AM, CM <cmpython at gmail.com> wrote:
> I can go through it with opening the text file and reading in the lines, and if the first character is a "-" then count that as the start of a bug block, but I am not sure how to find the last line of a bug block...it would be the line before the first line of the next bug block, but not sure the best way to go about it.
>
> There must be a rather standard way to do something like this in Python, and I'm requesting pointers toward that standard way (or what this type of task is usually called).  Thanks.

This is a fairly standard sort of job, but there's not really a
ready-to-go bit of code for it. It's just straightforward text
processing.

What I'd do is a stateful parser. Something like this:

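# save_to_database() stands in for whatever you do with a finished block.
block = None
with open("bugs.txt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("- "):
            # A new bug block starts here; emit the one we were building.
            if block: save_to_database(block)
            block = line
        elif block is not None:
            # Each line already ends with "\n", so just concatenate.
            block += line
        # (any lines before the first "- " line are ignored)
if block: save_to_database(block) # don't forget to grab that last one!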

This is extremely simple, and you might want to use a regex to look
for the upper-case word and date as well (as written, the code above
would falsely treat any description line that happens to begin with a
hyphen and a space as the start of a new block).
But the basic idea is: initialize an accumulator to a null state;
whenever you find the beginning of something, emit the previous and
reset the accumulator; otherwise, add to the accumulator. At the end,
emit any current block.
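
Here's a rough sketch of the regex variant, wrapped up as a generator so
the "emit" step is just a yield. I don't know what your header lines
actually look like, so the pattern is a guess: it assumes "- ", an
upper-case word, then an ISO-style date. The names HEADER and iter_blocks
and the sample lines are made up for illustration; adjust to taste.

```python
import re

# Hypothetical header format: "- ", an upper-case word, then an ISO date,
# e.g. "- CRASH 2014-07-27 ...". Adjust the pattern to the real file.
HEADER = re.compile(r"- [A-Z]+ \d{4}-\d{2}-\d{2}\b")

def iter_blocks(lines, is_start=HEADER.match):
    """Yield each block as one string; lines before the first header are dropped."""
    block = None                      # accumulator starts in a null state
    for line in lines:
        if is_start(line):
            if block is not None:
                yield block           # emit the previous block
            block = line              # reset the accumulator
        elif block is not None:
            block += line             # otherwise, add to the accumulator
    if block is not None:
        yield block                   # emit the final block

# Made-up sample input; the third line starts with "- " but is not a header.
sample = [
    "- CRASH 2014-07-27 app dies on start\n",
    "steps to reproduce:\n",
    "- not a header, just a list item\n",
    "- TYPO 2014-07-28 label misspelled\n",
]
blocks = list(iter_blocks(sample))
```

Since an open file iterates line by line, you can pass it straight in:
for block in iter_blocks(f): save_to_database(block).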

ChrisA


