[XML-SIG] Parsing DTDs

Alexandre Fayolle Alexandre.Fayolle@logilab.fr
Mon, 12 Feb 2001 12:15:05 +0100 (CET)


On Mon, 12 Feb 2001, Radestock, Guenter wrote:

> 2. There is a DTD parser inside xmlproc.  This seems to be pretty closely
> coupled to the validating XML parser.  At first sight it looks like it
> gets very low level DTD events and generates finite state automata
> objects among other things used to validate XML later on.  It looks
> like there is no intermediate representation of the DTD that can (or should)
> be used for other purposes than validating XML.  Is this correct?  Have
> I looked at the wrong piece of code (i.e. is there something in the
> 4suite package I could use?

You can get at the DTD object that xmlproc builds while parsing. The
following sample code comes from the xmltools utility set, which uses the
DTD information to generate contextual menus for an XML editor. There is
extensive API documentation on Lars Marius Garshol's page
(http://www.garshol.priv.no/download/software/xmlproc/).


-------------------------8<-------------------------------------
from xml.parsers.xmlproc.dtdparser import DTDParser
from xml.parsers.xmlproc.xmldtd import CompleteDTD

def parse_dtd_file(dtd_file,dtd_obj=None):
    parser = DTDParser()
    dtd = dtd_obj or CompleteDTD(parser)
    parser.set_dtd_consumer(dtd)
    parser.set_dtd_object(dtd)
    parser.parse_resource(dtd_file)
    parser.deref()
    return dtd

def getElementsName(child,dtd,list=None):
    """
    Recursively extract the allowed element names from the nested tuple
    returned by ElementType.get_content_model. For example, the content
    model of the HTML <table> element is:

        (',', [('caption', '?'),
               ('|', [('col', '*'), ('colgroup', '*')], ''),
               ('thead', '?'), ('tfoot', '?'),
               ('|', [('tbody', '+'), ('tr', '+')], '')], '')

    child -- the content-model tuple to process (None if content is ANY)
    dtd   -- the DTD object from which the elements have been read
    list  -- the list in which the element names are accumulated
    Returns the list.
    """
    templist = list or []
    # processes the case of child == None (occurs when element content
    # is specified to be ANY)
    if child is None:
        # the return list is set to all of the elements declared in the
        # DTD
        templist = dtd.get_elements()
    else :
        # if the penultimate element of the complex tuple is a list,
        # then we have to recursively process each element of the list.
        if type(child[-2])==type([]):
            for c in child[-2]:
                templist =  getElementsName(c,dtd,templist)
        # if the penultimate element of the complex tuple is a tuple,
        # then we have to recursively process this last tuple.
        elif type(child[-2])==type(()):
            templist = getElementsName(child[-2],dtd,templist)
        # else the penultimate element of the complex tuple is a string
        # containing an allowed element name. We just have to append it
        # to the return list.
        else:
            templist.append(child[-2])
    return templist

------------------------------8<----------------------------------------
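
To see what the traversal yields without installing xmlproc, here is a
minimal standalone sketch that runs the same walk over the <table> content
model quoted in the docstring. The name collect_names is hypothetical and
not part of xmltools or xmlproc, and the tuple shape is assumed to be
exactly the one shown above. Unlike getElementsName, it has no DTD object
to fall back on, so for the ANY case (None) it simply returns the names
collected so far.

```python
def collect_names(model, names=None):
    """Collect element names from a content-model tuple.

    Hypothetical helper: mirrors the walk in getElementsName, but takes
    no DTD object, so a model of None (content ANY) adds nothing.
    """
    if names is None:
        names = []
    if model is None:
        return names
    # Groups look like (separator, [children], modifier);
    # leaves look like (element_name, modifier).
    # In both shapes the interesting part is the penultimate item.
    item = model[-2]
    if isinstance(item, list):
        for sub in item:
            collect_names(sub, names)
    elif isinstance(item, tuple):
        collect_names(item, names)
    else:
        names.append(item)
    return names


# Content model of the HTML <table> element, copied from the docstring.
TABLE_MODEL = (',', [('caption', '?'),
                     ('|', [('col', '*'), ('colgroup', '*')], ''),
                     ('thead', '?'), ('tfoot', '?'),
                     ('|', [('tbody', '+'), ('tr', '+')], '')], '')

table_children = collect_names(TABLE_MODEL)
print(table_children)
```

The result is a flat list of names in document order. The occurrence
modifiers ('?', '*', '+') and the sequence/choice separators are thrown
away, which is enough for building a menu of candidate child elements but
not for validation.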


Alexandre Fayolle
-- 
http://www.logilab.com 
Narval is the first software agent available as free software (GPL).
LOGILAB, Paris (France).