[XML-SIG] Pulldom example

Paul Prescod paul@prescod.net
Wed, 21 Jun 2000 12:54:14 -0500


"""This code shows how to use pulldom with a very simple, "hand-coded"
dispatcher and a few helper functions to do an X->Y translation.

It is deliberately less clever than it could be, because I want to
emphasize that there is no rocket science here. pulldom is easy to use
in a simple way and can readily be ramped up to something more advanced
with sophisticated dispatchers (which any experienced Python programmer
could write in half an hour or so)."""

import pulldom

paper= \
"""<gcapaper>
<front>
<title>From Markup To Object Model</title>
<subt>The XML Abstraction Problem and XML Property Objects</subt>
<author>
<fname>Paul</fname><surname>Prescod</surname>
<jobtitle>Consulting Engineer</jobtitle>
<address><affil>ISOGEN/DataChannel</affil>
<aline>2200 North Lamar</aline>
<city>Dallas</city><state>Texas</state><cntry>USA</cntry><postcode>75202</postcode>
<phone>214 953 0004</phone><fax>214 953 3152</fax>
<email>paul@isogen.com</email>
<web>www.isogen.com</web>
</address>
<bio>
<para><highlight style="bold">Paul Prescod</highlight> 
- Paul Prescod is a leading researcher and implementor of markup
  technologies.  His formal education was in mathematics and computer
science at the University of Waterloo. His research interests include
formalisms for document modeling, queries and schemata.  As a consulting
engineer at ISOGEN, he helps organizations apply ISO and W3C standards
to large-scale documentation problems.</para>
</bio>
</author>
<abstract>
<para>Mechanisms for building abstractions over XML documents tend to be
more complex and less flexible than techniques available in domains such
as relational databases and object models. This paper reviews several
existing strategies and suggests a new one. XML Property Objects allow a
flexible, user-defined mapping from complex XML attributed element tree
structures to directed labeled graph structures.</para>
</abstract>
</front>
<body>
<section>
<title>Overview</title>
<para>Software engineering is dominated by two tasks. The first is the
design of algorithms (and necessary data structures) required to
automate the solutions to particular problems. The second is the design
of abstractions. Abstractions allow us to reuse software code and thus
make software solutions that can grow and be maintained over
time.</para>

<para>In a world where only implementation and algorithms mattered,
everything could be programmed in assembly language and every project
would be approached as if from scratch. There would be no operating
systems, no programming languages, no code libraries and no relational
databases.  A programmer's job would be analogous to that of a
carpenter. The fact that a carpenter has hammered a nail a thousand
times before does not remove the requirement to do it again.  Reuse is
at the level of ideas and skills, not implementation. </para>

<para>To some extent, creators of tiny "embedded systems" live in this
world.  Thankfully, the rest of us can use the ever-expanding RAM in our
computers to build abstractions on top of abstractions on top of
abstractions: programs on top of programming languages on top of
interpreters on top of other programming languages on top of operating
systems. Each level can itself be decomposed into many abstractions. The
popular UML diagramming standard exists precisely to help manage these
levels of abstraction.</para>

<para>...</para>
</section>
</body>
</gcapaper>
"""

events=pulldom.parseString( paper )

def doit():
    for token, node in events:
        if matchStart( "gcapaper", token, node ):
            print "<html>"
            
        elif matchEnd( "gcapaper", token, node ):
            print "</html>"
    
        # the paper title's parent is <front>, not <gcapaper>
        elif matchTextIn( ("title", "front"), token, node ):
            print "<title>%s</title>" % node.data
    
        elif matchEnd( "author", token, node ):
            print "<p>By: %s %s</p>" % ( firstname, lastname )
    
        elif matchTextIn( "fname", token, node ):
            firstname=node.data
    
        elif matchTextIn( "surname", token, node ):
            lastname=node.data
    
        elif matchTextIn( "jobtitle", token, node ):
            print "<p>Job Title: %s</p>" % node.data
    
        elif matchTextIn( "affil", token, node ):
            affil=node.data
    
        elif matchTextIn( "aline", token, node ):
            aline=node.data
    
        elif matchTextIn( "city", token, node ):
            city=node.data
    
        elif matchTextIn( "state", token, node ):
            state=node.data
    
        elif matchTextIn( "cntry", token, node ):
            cntry=node.data
    
        elif matchTextIn( "postcode", token, node ):
            postcode=node.data
    
        elif matchTextIn( "phone", token, node ):
            phone=node.data
    
        elif matchTextIn( "fax", token, node ):
            fax=node.data
    
        elif matchTextIn( "email", token, node ):
            email=node.data
    
        elif matchTextIn( "web", token, node ):
            web=node.data
    
        elif matchEnd( "address", token, node ):
            print "<address>"
            print "<p>%s</p>"%affil
            print "<p>%s</p>"%aline
            print "<p>%s, %s</p>"% (city, state)
            print "<p>%s</p>"%postcode
            print "<p>Phone: %s</p>"%phone
            print "<p>Fax: %s</p>"%fax
            print "<p>Email: %s</p>"%email
            print "<p>Web: %s</p>"%web
            print "</address>"
    
        elif matchStart( "para", token, node ):
            print "<p>"
    
        elif matchEnd( "para", token, node ):
            print "</p>"
    
        elif matchStart( "highlight", token, node ):
            print "<b>"
    
        elif matchEnd( "highlight", token, node ):
            print "</b>"
    
        elif matchStart( "bio", token, node ):
            print "<blockquote>"
    
        elif matchEnd( "bio", token, node ):
            print "</blockquote>"
    
        elif matchStart( "abstract", token, node ):
            print "<blockquote>"
    
        elif matchEnd( "abstract", token, node ):
            print "</blockquote>"

        # I could have counted on the way down,
        #      but I want to show code that walks up the tree
        elif matchTextIn( ("title", "section" ), token, node ): 
            level=0
            titleNode=node.parentNode
            # start at the <section> that owns the title, not at the title itself
            sectionNode=titleNode.parentNode
            while sectionNode.tagName=="section":
                level=level+1
                sectionNode=sectionNode.parentNode
            outtag="h"+str(level)
            print "<%s>%s</%s>" % (outtag, node.data, outtag )
        elif token==pulldom.CHARACTERS:
            if( node.data.strip()):
                print node.data
        else: 
            pass


# a few simple helper functions
def matchStart( tagName, token, node ):
    return token==pulldom.START_ELEMENT and node.tagName==tagName

def matchEnd( tagName, token, node ):
    return token==pulldom.END_ELEMENT and node.tagName==tagName

def matchTextIn( tagName, token, node ):
    if type( tagName )==type( "" ):
        return token==pulldom.CHARACTERS and \
           node.parentNode.tagName==tagName
    elif type( tagName ) == type((1,)):
        return token==pulldom.CHARACTERS and \
           matchTagContext( tagName, node.parentNode )

def matchTagContext( tagNames, node ):
    this,rest=tagNames[0], tagNames[1:]
    imatch = node.tagName==this  
    if not rest:
        return imatch
    else:
        return imatch and matchTagContext( rest, node.parentNode )

doit()
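
The docstring mentions ramping up to a more sophisticated dispatcher, and the comment on section titles mentions counting on the way down instead of walking up the tree. The following is a minimal sketch of both ideas, not part of the original script. It assumes Python 3 and the standard-library `xml.dom.pulldom` rather than the standalone `pulldom` module imported above. The names `TAG_TABLE`, `TEXT_TABLE`, `heading` and `translate` are hypothetical, and only a handful of tags are covered.

```python
from xml.dom import pulldom

# Output emitted when an element starts or ends, keyed by (event, tag name).
TAG_TABLE = {
    (pulldom.START_ELEMENT, "gcapaper"): "<html>",
    (pulldom.END_ELEMENT, "gcapaper"): "</html>",
    (pulldom.START_ELEMENT, "para"): "<p>",
    (pulldom.END_ELEMENT, "para"): "</p>",
}

def heading(text, stack):
    # Heading level = number of <section> elements currently open,
    # i.e. counted on the way down rather than by walking up the tree.
    level = stack.count("section")
    return "<h%d>%s</h%d>" % (level, text, level)

# Text handlers, keyed by the tag path that must end the stack of open elements.
TEXT_TABLE = [
    (("front", "title"), lambda text, stack: "<title>%s</title>" % text),
    (("section", "title"), heading),
]

def translate(xml_text):
    """Return the translated output as a list of strings."""
    out = []
    stack = []  # tag names of the currently open elements, outermost first
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            stack.append(node.tagName)
        if event in (pulldom.START_ELEMENT, pulldom.END_ELEMENT):
            piece = TAG_TABLE.get((event, node.tagName))
            if piece is not None:
                out.append(piece)
        elif event == pulldom.CHARACTERS and node.data.strip():
            text = node.data.strip()
            for path, handler in TEXT_TABLE:
                if tuple(stack[-len(path):]) == path:
                    out.append(handler(text, stack))
                    break
            else:
                out.append(text)  # no handler matched: pass the text through
        if event == pulldom.END_ELEMENT:
            stack.pop()
    return out
```

Because the context lives in an explicit stack, this version does not rely on `parentNode` being set on the nodes the event stream hands back. Adding a new rule means adding a table entry rather than another `elif` branch.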
-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"Music is the stuff between the notes." - Claude Debussy