File parser
Rune Strand
rune.strand at gmail.com
Mon Aug 29 22:20:27 EDT 2005
It's not clear to me from your posting in what order the tags may
appear. Assuming you always END a section before beginning a new one,
i.e. it's always:
A
some A-section lines.
END A
B
some B-section lines.
END B
etc.
And never:
A
some A-section lines.
B
some B-section lines.
END B
END A
etc.
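Since the simple approach depends on that ordering, it may be worth checking a file before trusting the parser. Below is a minimal sketch, assuming end tags are always 'END ' followed by the begin tag. first_nested_tag is a hypothetical helper name, and io.StringIO stands in for the real file. It reports the first begin tag that shows up while another section is still open:

```python
import io


def first_nested_tag(lines, begin_tags):
    """Return the 1-based line number of the first begin tag that appears
    while another section is still open, or None if nothing is nested.

    begin_tags is a list such as ['A', 'B']; end tags are assumed to be
    'END ' + tag, as in the layout described above.
    """
    lookup = {tag: 'END ' + tag for tag in begin_tags}
    open_end_tag = None  # end tag of the section currently open, if any
    for number, line in enumerate(lines, 1):
        line = line.strip()
        if open_end_tag is None:
            if line in lookup:          # a begin tag opens a section
                open_end_tag = lookup[line]
        elif line == open_end_tag:      # the open section is closed
            open_end_tag = None
        elif line in lookup:            # begin tag inside an open section
            return number
    return None


flat = "A\nsome A-section lines.\nEND A\nB\nsome B-section lines.\nEND B\n"
nested = "A\nsome A-section lines.\nB\nsome B-section lines.\nEND B\nEND A\n"
print(first_nested_tag(io.StringIO(flat), ['A', 'B']))    # None
print(first_nested_tag(io.StringIO(nested), ['A', 'B']))  # 3
```

For a real file, pass the open file object instead of the StringIO; it is read one line at a time.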
it should be fairly simple. And if the file is several GB, you ought
to use a generator, which hands you one section at a time instead of
building a list of the whole file in memory.
Something like this:
def make_tag_lookup(begin_tags):
    # create a dict with each {begin_tag : end_tag}
    end_tags = [('END ' + begin_tag) for begin_tag in begin_tags]
    return dict(zip(begin_tags, end_tags))

def return_sections(filepath, lookup):
    # Generator yielding each section as a list of stripped lines
    inside_section = False
    with open(filepath, 'r') as f:
        # Iterate over the file object itself: it reads one line at a
        # time, whereas readlines() would load the whole file at once
        for line in f:
            line = line.strip()
            if not inside_section:
                if line in lookup:
                    inside_section = True
                    data_section = []
                    section_end_tag = lookup[line]
                    data_section.append(line)  # store section start tag
            else:
                if line == section_end_tag:
                    data_section.append(line)  # store section end tag
                    inside_section = False
                    yield data_section  # yield entire section
                else:
                    data_section.append(line)  # store each line within section

# Create the generator yielding each section.
# datafile (a path) and list_of_begin_tags (e.g. ['A', 'B'])
# stand for your own values.
sections = return_sections(datafile,
                           make_tag_lookup(list_of_begin_tags))
for section in sections:
    for line in section:
        print(line)
    print('\n')
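The same loop can be written over any iterable of lines rather than a path, which makes two properties easy to check. First, pulling one section reads only the lines up to that section's END tag. Second, the version above silently drops a section whose END tag never arrives before end of file. Below is a minimal sketch, assuming non-nested sections and 'END <tag>' markers. iter_sections and counted are hypothetical names, io.StringIO stands in for the real file, and raising ValueError on an unterminated section is my own choice, not something the format requires:

```python
import io


def iter_sections(lines, begin_tags):
    """Yield each section as a list of stripped lines.

    Assumes non-nested sections whose end marker is 'END ' + tag.
    Raises ValueError if the input ends while a section is still open.
    """
    lookup = {tag: 'END ' + tag for tag in begin_tags}
    section, end_tag = None, None
    for line in lines:
        line = line.strip()
        if section is None:
            if line in lookup:           # a begin tag opens a section
                section, end_tag = [line], lookup[line]
        else:
            section.append(line)         # body line or the end tag
            if line == end_tag:
                yield section
                section = None
    if section is not None:
        raise ValueError('input ended before %r' % end_tag)


def counted(lines, counter):
    """Pass lines through unchanged, counting them in counter[0]."""
    for line in lines:
        counter[0] += 1
        yield line


text = "A\na1\nEND A\nB\nb1\nb2\nEND B\n"
counter = [0]
gen = iter_sections(counted(io.StringIO(text), counter), ['A', 'B'])
print(next(gen))   # ['A', 'a1', 'END A']
print(counter[0])  # 3: only the first section has been read so far

truncated = io.StringIO("A\na1\nEND A\nB\nb1\n")
try:
    for section in iter_sections(truncated, ['A', 'B']):
        print(section)
except ValueError as err:
    print(err)     # input ended before 'END B'
```

With a real file you would write `with open(path) as f:` and then loop over `iter_sections(f, ['A', 'B'])`. Because the file object is iterated lazily, memory use stays around one section at a time.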
More information about the Python-list mailing list