[XML-SIG] ElementType.content_model interpretation of '*'

Jeffrey Chang <jefftc@leland.Stanford.EDU>
Mon, 19 Apr 1999 16:50:55 -0700 (PDT)


I am using xmlproc.dtdparser.DTDParser and xmlproc.xmldtd.CompleteDTD
(xmlproc v0.60) to parse and store the contents of a DTD file.  I have a
question about the interpretation of the content model of a DTD
element.

I load a DTD definition into a variable 'd'.  The definition contains an
element:
<!ELEMENT test (a,b*)>

Then, when I look at the content model of test:
>>> d.elems['test'].content_model
{                            # I've reformatted this for readability
'start': 1L, 
     1L: [(6L, 'a')],
     4L: [(4L, 'b')], 
     6L: [(4L, 'b')], 
'final': 4L 
}

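According to this content model, 'test' must contain one 'a' followed by
at least one 'b' before reaching the final state: 'a' leads from state 1L
to state 6L, which is not the final state 4L, and only a 'b' leads from
6L on to 4L.  Since the declaration says b*, I believe the 'b' should be
optional, and would have expected a content model more like: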
'start': 1L, 
     1L: [(4L, 'a')],
     4L: [(4L, 'b')],
'final': 4L 
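
The reading above can be checked mechanically.  The following is only a
minimal sketch, not xmlproc code: accepts() is a hypothetical helper, it
assumes each state maps to a list of (next_state, element_name) pairs as
in the dumps above, and it assumes a state counts as final only when it
is equal to the 'final' entry.  That last assumption is the
interpretation used in this post and may not match what xmlproc does
internally.  Plain integers stand in for the 1L-style long literals.

```python
def accepts(model, children):
    """Return True if the child element names lead from 'start' to 'final'."""
    state = model['start']
    for name in children:
        # Look for a transition out of the current state labelled with this name.
        for next_state, label in model.get(state, []):
            if label == name:
                state = next_state
                break
        else:
            return False  # no transition for this child element
    # Assumption: a state is accepting only if it equals model['final'].
    return state == model['final']


# Content model reported for (a,b*), and the one expected instead.
observed = {'start': 1, 1: [(6, 'a')], 4: [(4, 'b')], 6: [(4, 'b')], 'final': 4}
expected = {'start': 1, 1: [(4, 'a')], 4: [(4, 'b')], 'final': 4}

for children in (['a'], ['a', 'b'], ['a', 'b', 'b']):
    print(children, accepts(observed, children), accepts(expected, children))
```

Under these assumptions only the expected model accepts a lone 'a';
both models accept 'a' followed by one or more 'b'.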


I also tested this with the following element:
<!ELEMENT test (a,b+)>

In this case, I get a content model that looks reasonable:
{
'start': 1L, 
     1L: [(2L, 'a')], 
     2L: [(4L, 'b')], 
     4L: [(4L, 'b')], 
'final': 4L 
}
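
One way to see why this model looks reasonable is to list the child
sequences it accepts.  This is a rough sketch under the same reading of
the data structure as in this post: accepted_sequences() is a
hypothetical helper, not part of xmlproc, and it assumes a state is
final only when it equals the 'final' entry.

```python
def accepted_sequences(model, max_len):
    """List every child sequence of at most max_len elements that the model accepts."""
    results = []
    frontier = [(model['start'], [])]  # (current state, children seen so far)
    for _ in range(max_len + 1):
        next_frontier = []
        for state, path in frontier:
            # Assumption: a state is accepting only if it equals model['final'].
            if state == model['final']:
                results.append(path)
            # Extend the path by one child along every outgoing transition.
            for next_state, label in model.get(state, []):
                next_frontier.append((next_state, path + [label]))
        frontier = next_frontier
    return results


# Content model reported for (a,b+).
plus_model = {'start': 1, 1: [(2, 'a')], 2: [(4, 'b')], 4: [(4, 'b')], 'final': 4}
print(accepted_sequences(plus_model, 3))
# prints [['a', 'b'], ['a', 'b', 'b']]
```

Every accepted sequence starts with one 'a' and has at least one 'b',
which is what (a,b+) asks for.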


Please let me know if my interpretation of the XML spec or of the
content_model data structure is incorrect.

BTW, Lars, thanks very much for xmlproc!  It is a much-needed tool.

Jeff