SaxRecords.py (was Re: busting-out XML sections)
Andrew Dalke
dalke at acm.org
Tue Oct 10 02:30:44 EDT 2000
Thomas Gagne wrote:
>I think what I'm beginning to picture inside my head is a combination SAX/DOM
>parser. Imagine how useful this would be for both large files and realtime
>data. SAX would read the (unending) stream of data and my document handler
>would watch for the start and end tags of the useful subsections. When the
>end-tag is reached it would somehow take the inbetween data and hand it off to
>a DOM parser where the individual transactions are taken care of.
Interestingly enough, I've been thinking about what I think is a similar
thing, especially since it should help simplify my Martel work (see
biopython.org/~dalke/Martel/). I wrote up a first draft of the module and
made it available at http://www.biopython.org/~dalke/SaxRecords.py . Here's
what it looks like to use it:

import SaxRecords
from xml.sax import saxexts
from xml.dom import sax_builder
from StringIO import StringIO

parser = saxexts.make_parser()
test_data = """<doc>
<record><f>Andrew</f><l>Dalke</l><city>Santa Fe</city></record>
<record><f>Bill</f><l>Clinton</l><city>Washington</city></record>
<record><f>Craig</f><l>Vance</l><city>New York</city></record>
</doc>"""

record_parser = SaxRecords.Parser(parser, "record",
                                  sax_builder.SaxBuilder)
for builder in record_parser.parseFile(StringIO(test_data)):
    doc = builder.document
    # ... work with the DOM document ...

As you might see, I turned the interface into a forward iterator by spawning
off a thread to handle the SAX callbacks and send the finished records back
to the original thread.
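
To make the threading idea concrete, here is a minimal sketch of the same
technique using only the present-day standard library (xml.sax, threading,
queue, minidom). It is not the code in SaxRecords.py; the names RecordHandler
and iter_records are made up for illustration, and it assumes a bounded queue
is an acceptable way to pass records between the two threads.

```python
import io
import queue
import threading
import xml.sax
from xml.dom import minidom

_DONE = object()  # sentinel: the parser thread has reached the end of input


class RecordHandler(xml.sax.ContentHandler):
    """Hypothetical handler: builds one small DOM document per record
    element and puts each finished document on a queue."""

    def __init__(self, tag, out):
        super().__init__()
        self.tag = tag
        self.out = out
        self.doc = None   # DOM document for the record being built, if any
        self.node = None  # current insertion point inside that document

    def startElement(self, name, attrs):
        if self.doc is None:
            if name != self.tag:
                return  # outside any record: ignore
            self.doc = minidom.Document()
            self.node = self.doc
        elem = self.doc.createElement(name)
        for key, value in attrs.items():
            elem.setAttribute(key, value)
        self.node.appendChild(elem)
        self.node = elem

    def characters(self, content):
        if self.doc is not None:
            self.node.appendChild(self.doc.createTextNode(content))

    def endElement(self, name):
        if self.doc is None:
            return
        self.node = self.node.parentNode
        if self.node is self.doc:  # the record element itself just closed
            self.out.put(self.doc)
            self.doc = self.node = None


def iter_records(stream, tag, maxsize=10):
    """Run the SAX parse in a worker thread; yield record documents here."""
    out = queue.Queue(maxsize)

    def worker():
        try:
            xml.sax.parse(stream, RecordHandler(tag, out))
            out.put(_DONE)
        except Exception as exc:  # hand parse errors to the consumer
            out.put(exc)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = out.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item


if __name__ == "__main__":
    data = b"<doc><record><f>Andrew</f><l>Dalke</l></record></doc>"
    for doc in iter_records(io.BytesIO(data), "record"):
        print(doc.documentElement.toxml())
```

The bounded queue keeps the parser from running far ahead of the consumer on
an unending stream. The worker is a daemon thread, so if the consumer stops
iterating early the blocked parser does not keep the process alive.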
The package includes a slightly modified version of Sean McGrath's RAX
Record object as an alternative to producing DOM documents.
Also, it seems you'll have to tweak it a bit to work with PyXML-0.6.1, but
the basic concept should be viable.
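
For anyone who wants to try the concept without PyXML, the "scan with SAX
events, build a DOM only for each record" idea can also be sketched with the
standard library's xml.dom.pulldom. This is a different mechanism from
SaxRecords (no thread is involved), and the helper name iter_record_nodes is
only illustrative.

```python
import io
from xml.dom import pulldom


def iter_record_nodes(stream, tag):
    """Yield a fully expanded DOM element for each <tag> in the stream."""
    events = pulldom.parse(stream)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Pull in the rest of this element's subtree as DOM nodes;
            # everything outside the records is never built into a tree.
            events.expandNode(node)
            yield node


if __name__ == "__main__":
    data = b"""<doc>
<record><f>Andrew</f><l>Dalke</l><city>Santa Fe</city></record>
<record><f>Bill</f><l>Clinton</l><city>Washington</city></record>
</doc>"""
    for record in iter_record_nodes(io.BytesIO(data), "record"):
        print(record.toxml())
```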
Andrew Dalke
dalke at acm.org