Parsing text

rzed jello at comic.com
Tue Dec 20 12:17:58 EST 2005


"sicvic" <morange.victor at gmail.com> wrote in
news:1135094799.384614.219560 at f14g2000cwb.googlegroups.com: 

> Not homework...not even in school (do any universities even
> teach classes using python?). Just not a programmer. Anyways I
> should probably be more clear about what I'm trying to do.
> 
> Since I can't show the actual output file, let's say I had an
> output file that looked like this:
> 
> aaaaa bbbbb Person: Jimmy
> Current Location: Denver
> Next Location: Chicago
> ----------------------------------------------
> aaaaa bbbbb Person: Sarah
> Current Location: San Diego
> Next Location: Miami
> Next Location: New York
> ----------------------------------------------
> 
> Now I want to put this block (and all recurrences of "Person: Jimmy")
> 
> Person: Jimmy
> Current Location: Denver
> Next Location: Chicago
> 
> in a file called jimmy.txt
> 
> and the same for Sarah in sarah.txt
> 
> The code I currently have looks something like this:
> 
> import re
> import sys
> 
> person_jimmy = open('jimmy.txt', 'w') #creates jimmy.txt
> person_sarah = open('sarah.txt', 'w') #creates sarah.txt
> 
> f = open(sys.argv[1]) #opens output file
> #loop that goes through all lines and parses specified text
> for line in f.readlines():
>     if  re.search(r'Person: Jimmy', line):
>      person_jimmy.write(line)
>     elif re.search(r'Person: Sarah', line):
>      person_sarah.write(line)
> 
> #closes all files
> 
> person_jimmy.close()
> person_sarah.close()
> f.close()
> 
> However, this only produces output files that look like
> this: 
> 
> jimmy.txt:
> 
> aaaaa bbbbb Person: Jimmy
> 
> sarah.txt:
> 
> aaaaa bbbbb Person: Sarah
> 
> My question is what else do I need to add (such as an embedded
> loop where the if statements are?) so the files look like this
> 
> aaaaa bbbbb Person: Jimmy
> Current Location: Denver
> Next Location: Chicago
> 
> and
> 
> aaaaa bbbbb Person: Sarah
> Current Location: San Diego
> Next Location: Miami
> Next Location: New York
> 
> 
> Basically I need to add statements that, after finding that line,
> copy all the lines following it and stop when they see
> '----------------------------------------------'
> 
> Any help is greatly appreciated.
> 

Something like this, maybe?

"""
This iterates through a file, with subloops to handle the 
special cases. I'm assuming that Jimmy and Sarah are not the
only people of interest. I'm also assuming (for no very good
reason) that you do want the separator lines, but do not want 
the "Person:" lines in the output file. It is easy enough to 
adjust those assumptions to taste.

Each "Person:" line will cause a file to be opened (if it is 
not already open), and the subsequent lines will be written to 
it until the separator is found. Be aware that all files remain 
open until the loop at the end closes them all.
"""

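outfs = {}  # maps each person's name to an open output file
f = open('shouldBeDatabase.txt')
for line in f:
    pos = line.find('Person:')
    if pos >= 0:
        # The name is whatever follows 'Person:' on the line.
        ofkey = line[pos + len('Person:'):].strip()
        if ofkey not in outfs:
            outfs[ofkey] = open('%s.txt' % ofkey, 'w')
        outf = outfs[ofkey]
        # Copy the following lines, up to and including the separator.
        while line.find('-----------------------------') < 0:
            line = next(f, None)  # use f.next() on Python older than 2.6
            if line is None:  # end of file reached without a separator
                break
            outf.write(line)
f.close()
for v in outfs.values():
    v.close()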
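
If you would rather have the files look the way you describe (the "Person:" line kept, the separator dropped), the same idea can be sketched with a flag that remembers whose block is being copied. This is only a rough sketch under some assumptions of mine: the function name split_by_person, the out_dir parameter, and the lowercased file names (jimmy.txt, sarah.txt) are made up for illustration, and the lines are collected in memory before being written, which assumes the input is not huge.

```python
import os

# Shortest run of dashes treated as a separator (assumed, as in the loop above).
SEPARATOR = '-----------------------------'

def split_by_person(path, out_dir):
    """Write each person's lines to <out_dir>/<name>.txt and return them."""
    blocks = {}     # name -> list of lines collected for that person
    current = None  # name whose block is being copied, or None between blocks
    with open(path) as f:
        for line in f:
            if 'Person:' in line:
                # Start (or resume) a block; keep the "Person:" line itself.
                current = line.split('Person:', 1)[1].strip()
                blocks.setdefault(current, []).append(line)
            elif SEPARATOR in line:
                # The separator ends the block and is not copied.
                current = None
            elif current is not None:
                blocks[current].append(line)
    # Each output file is opened once, after all the input has been read.
    for name, lines in blocks.items():
        out_path = os.path.join(out_dir, name.lower() + '.txt')
        with open(out_path, 'w') as out:
            out.writelines(lines)
    return blocks
```

Calling split_by_person('shouldBeDatabase.txt', '.') would write the files into the current directory. Because the with blocks close every file, no cleanup loop is needed at the end.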

-- 
rzed


