Newbie Text Processing Question

Gregory Piñero gregpinero at gmail.com
Tue Oct 4 23:13:52 EDT 2005


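Reading the whole file, editing the text in memory, and writing it back out
is the normal way to do this in Python. As far as I know there's no way to
truly edit a file "in place", which I'm assuming is what you're asking about.
(The standard fileinput module has an inplace option, but it also rewrites
the whole file behind the scenes.)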
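
If the real goal is to change only the RESEARCH part and leave the rest of
the file alone, one possible approach is to find where that part starts and
ends, run the substitution on that slice only, and glue the untouched text
back on. The following is only a rough sketch under a few assumptions: the
function and constant names are made up for illustration, a part is assumed
to run from its <ch-part> tag to the next <ch-part> tag, each <sec-main>
title is assumed to sit on a line of its own, and the code is written for a
current Python rather than the version used in the quoted script.

```python
import re

# Start of any chapter part; used to find where the RESEARCH part ends.
CH_PART_START = re.compile(r'<ch-part\b')

# The one part we want to change (same test as the original chpart_pattern).
RESEARCH_PART = re.compile(r'<ch-part no="[A-Z]{1,4}"><title>RESEARCH',
                           re.IGNORECASE)

# One <sec-main> block inside that part. The dot in the section number is
# escaped here so that it only matches a literal ".".
SEC_MAIN = re.compile(
    r'(<sec-main no="[0-9]*\.[0-9]*"><title>[^\n]*\n)'  # 1: title line
    r'(.*?)'                                             # 2: section body
    r'(\s*)(?=<sec-main\b|\Z)',                          # 3: trailing blank lines
    re.DOTALL)


def wrap_research_sections(text):
    """Wrap each <sec-main> body in <biblio> tags, in the RESEARCH part only."""
    found = RESEARCH_PART.search(text)
    if found is None:
        return text  # no RESEARCH part, so nothing to change
    start = found.start()
    # The part runs until the next <ch-part> tag, or to the end of the text.
    following = CH_PART_START.search(text, found.end())
    end = following.start() if following else len(text)
    part = SEC_MAIN.sub(r'\1<biblio>\n\2\n</biblio>\3', text[start:end])
    # Everything outside [start:end] is copied back unchanged.
    return text[:start] + part + text[end:]


def process_file(path):
    """Read the whole file, edit the text in memory, and write it back out."""
    with open(path) as handle:
        text = handle.read()
    with open(path, 'w') as handle:
        handle.write(wrap_research_sections(text))
```

Calling process_file(os.path.join(root, fname)) inside the os.walk loop would
then take the place of the read, substitute, and write steps in the quoted
script. A section with an empty body would still get an empty <biblio> pair,
so that case may need extra handling.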

And now, cue the responses telling you to use a fancy parser (XML?) for your
project ;-)

-Greg


On 4 Oct 2005 20:04:39 -0700, gshepherd281 at earthlink.net wrote:
>
> Hi,
>
> I'm a total newbie to Python so any and all advice is greatly
> appreciated.
>
> I'm trying to use regular expressions to process text in an SGML file
> but only in one section.
>
> So the input would look like this:
>
> <ch-part no="I"><title>RESEARCH GUIDE
> <sec-main no="1.01"><title>content
> <para>content
>
> <sec-main no="2.01"><title>content
> <para>content
>
>
> <ch-part no="II"><title>FORMS
> <sec-main no="3.01"><title>content
>
> <sec-sub1 no="1"><title>content
> <para>content
>
> <sec-sub2 no="1"><title>content
> <para>content
>
>
> and the output like this:
>
> <ch-part no="I"><title>RESEARCH GUIDE
> <sec-main no="1.01"><title>content
> <biblio>
> <para>content
> </biblio>
>
> <sec-main no="2.01"><title>content
> <biblio>
> <para>content
> </biblio>
>
> <ch-part no="II"><title>FORMS
> <sec-main no="3.01"><title>content
>
> <sec-sub1 no="1"><title>content
> <para>content
>
> <sec-sub2 no="1"><title>content
> <para>content
>
>
> But no matter what I try I end up changing the entire file rather than
> just one part.
>
> Here's what I've come up with so far but I can't think of anything
> else.
>
> ***
>
> import os, re
> setpath = raw_input("Enter the path where the program should run: ")
> print
>
> for root, dirs, files in os.walk(setpath):
>     fname = files
>     for fname in files:
>         inputFile = file(os.path.join(root,fname), 'r')
>         line = inputFile.read()
>         inputFile.close()
>
>         chpart_pattern = re.compile(r'<ch-part no=\"[A-Z]{1,4}\"><title>(RESEARCH)', re.IGNORECASE)
>
>         while 1:
>             if chpart_pattern.search(line):
>                 line = re.sub(r"<sec-main no=(\"[0-9]*.[0-9]*\")><title>(.*)", r"<sec-main no=\1><title>\2\n<biblio>", line)
>                 outputFile = file(os.path.join(root,fname), 'w')
>                 outputFile.write(line)
>                 outputFile.close()
>                 break
>
>             if chpart_pattern.search(line) is None:
>                 print 'none'
>                 break
>
> Thanks,
>
> Greg
>



--
Gregory Piñero
Chief Innovation Officer
Blended Technologies
(www.blendedtechnologies.com)