newbie write to file question

ProvoWallis gshepherd281 at earthlink.net
Sat Dec 3 23:33:16 EST 2005


Hi,

I'm trying to create a script that will search an SGML file for the
numbers and titles of the hierarchical elements (section level
headings) and create a dictionary with the section number as the key
and the title as the value.

I've managed to make some progress, but I'd like to get some general
feedback on what I have so far and ask a question. When I run this
script on a directory that contains multiple files, even the files that
don't contain any matches generate log files, usually with the
contents of the last file that contained matches. I'm not sure what I'm
missing, so I'd appreciate some advice.

Thanks,

Greg


Here's a very simplified version of my SGML:

<sec-main no="1.01"><title>section title 1.01
<sec-sub1 no="1"><title>title 1
<sec-sub1 no="2"><title>title 2
<sec-sub2 no="a"><title>title a
<sec-sub2 no="b"><title>title b
<sec-sub3 no="i"><title>title i
<sec-main no="2.02"><title>section title 2.02
<sec-main no="3.03"><title>section title 3.03
<sec-sub1 no="1"><title>title 1
<sec-sub1 no="2"><title>title 2
<sec-main no="4.04"><title>section title 4.04
<sec-main no="5.05"><title>section title 5.05
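
To make the target concrete, here is a minimal sketch of the dictionary that sample should produce. It assumes the key format is the main section number followed by one bracketed part per sub-level, and that each tag and its title sit on the same line, as in the simplified sample. The names SAMPLE, PATTERNS and build_table are made up for this illustration.

```python
import re

SAMPLE = """\
<sec-main no="1.01"><title>section title 1.01
<sec-sub1 no="1"><title>title 1
<sec-sub1 no="2"><title>title 2
<sec-sub2 no="a"><title>title a
<sec-sub2 no="b"><title>title b
<sec-sub3 no="i"><title>title i
<sec-main no="2.02"><title>section title 2.02
<sec-main no="3.03"><title>section title 3.03
<sec-sub1 no="1"><title>title 1
<sec-sub1 no="2"><title>title 2
<sec-main no="4.04"><title>section title 4.04
<sec-main no="5.05"><title>section title 5.05
"""

# One pattern per level: index 0 is sec-main, 1 to 3 are sec-sub1 to sec-sub3.
PATTERNS = [
    re.compile(r'(?i)<sec-main no="(\d+\.\d\d)"><title>(.*)'),
    re.compile(r'(?i)<sec-sub1 no="(\w*)"><title>(.*)'),
    re.compile(r'(?i)<sec-sub2 no="(\w*)"><title>(.*)'),
    re.compile(r'(?i)<sec-sub3 no="(\w*)"><title>(.*)'),
]


def build_table(text):
    """Map each section key (e.g. '1.01[2][a]') to its title."""
    table = {}
    path = []  # keys of the enclosing sections, outermost first
    for line in text.splitlines():
        for level, pattern in enumerate(PATTERNS):
            match = pattern.search(line)
            if match is None:
                continue
            number, title = match.groups()
            if level > len(path):
                break  # sub-level with no parent seen yet; skipped here
            if level == 0:
                key = number
            else:
                key = path[level - 1] + '[' + number + ']'
            path = path[:level] + [key]  # drop deeper levels, keep parents
            table[key] = title.strip()
            break
    return table


table = build_table(SAMPLE)
```

For the sample, this gives twelve entries, for example '1.01[2][b][i]' mapped to 'title i' and '3.03[2]' mapped to 'title 2'.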

And here's what I've written so far:

import os
import re

setpath = raw_input("Enter the path where the program should run: ")
print

table ={}

for root, dirs, files in os.walk(setpath):
     fname = files
     for fname in files:
          inputFile = file(os.path.join(root,fname), 'r')


          while 1:
               lines = inputFile.readlines(10000)
               if not lines:
                    break
               for line in lines:
                    main = re.search(
                         r'(?i)<sec-main no=\"(\d+\.\d\d)\">\n?<title>(.*?)\n', line)
                    sub_one = re.search(
                         r'(?i)<sec-sub1 no=\"(\w*)\">\n?<title>(.*?)\n', line)
                    sub_two = re.search(
                         r'(?i)<sec-sub2 no=\"(\w*)\">\n?<title>(.*?)\n', line)
                    sub_three = re.search(
                         r'(?i)<sec-sub3 no=\"(\w*)\">\n?<title>(.*?)\n', line)
                    if main is not None:
                         table[main.group(1)] = main.group(2)
                         m = main.group(1)
                    if main is None:
                         pass
                    if sub_one is not None:
                         one = m + '[' + sub_one.group(1) + ']'
                         table[one] = sub_one.group(2)
                    if sub_one is None:
                         pass
                    if sub_two is not None:
                         two = one + '[' + sub_two.group(1) + ']'
                         table[two] = sub_two.group(2)
                    if sub_two is None:
                         pass
                    if sub_three is not None:
                         three = two + '[' + sub_three.group(1) + ']'
                         table[three] = sub_three.group(2)
                    if sub_three is None:
                         pass

                         str_table = str(table)
                         (name,ext) = os.path.splitext(fname)
                         output_name = name + '.log'
                         outputFile = file(os.path.join(root,output_name), 'w')
                         outputFile.write(str_table)
                         outputFile.close()
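
One possible cause, going by the code as posted: table is created once, before the loop over files, and the log is written inside the per-line loop, so every file that has any lines gets a log holding whatever has accumulated so far. The following is only a sketch of a restructuring under that assumption, written for current Python (open rather than file). HEADING, LEVELS, parse_headings and write_logs are hypothetical names, the single combined pattern is a simplification of the four original ones, and skipping existing .log files is an added assumption.

```python
import os
import re

# Hypothetical combined pattern for all four heading levels.
HEADING = re.compile(r'(?i)<sec-(main|sub[123]) no="([\w.]+)">\s*<title>(.*)')
LEVELS = {'main': 0, 'sub1': 1, 'sub2': 2, 'sub3': 3}


def parse_headings(lines):
    """Return {section key: title} for the lines of a single file."""
    table = {}
    path = []  # keys of the enclosing sections, outermost first
    for line in lines:
        match = HEADING.search(line)
        if match is None:
            continue
        kind, number, title = match.groups()
        level = LEVELS[kind.lower()]
        if level > len(path):
            continue  # sub-level with no parent seen yet; skipped here
        key = number if level == 0 else path[level - 1] + '[' + number + ']'
        path = path[:level] + [key]
        table[key] = title.strip()
    return table


def write_logs(top):
    """Write NAME.log beside each file under top that contains headings."""
    written = []
    for root, dirs, files in os.walk(top):
        for fname in files:
            if fname.endswith('.log'):
                continue  # assumption: ignore logs left by earlier runs
            with open(os.path.join(root, fname), errors='replace') as source:
                table = parse_headings(source)  # fresh dict for every file
            if not table:
                continue  # no headings found, so no log is written
            log_path = os.path.join(root, os.path.splitext(fname)[0] + '.log')
            with open(log_path, 'w') as log:
                log.write(str(table))
            written.append(log_path)
    return written
```

Building the dictionary inside the file loop keeps one file's entries out of the next file's log, and writing once per file, after all of its lines have been read, avoids rewriting the log for every line.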



