[Expat-bugs] [ expat-Bugs-683681 ] XML_GetCurrent* functions for doctype declaration/DTD events

SourceForge.net noreply at sourceforge.net
Sun Feb 9 17:12:23 EST 2003


Bugs item #683681, was opened at 2003-02-10 01:12
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=683681&group_id=10127

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Rolf Ade (pointsman)
Assigned to: Nobody/Anonymous (nobody)
Summary: XML_GetCurrent* functions for doctype declaration/DTD events

Initial Comment:
In Expat 1.95.6, I find the return values of the
XML_GetCurrent* functions, when called in a doctype
declaration or DTD event handler
(XML_StartDoctypeDeclHandler,
XML_EndDoctypeDeclHandler, XML_ElementDeclHandler
etc.), surprising and, at the very least, under-documented.

The reference.html file is a bit sparse about the
XML_GetCurrent* functions. For example, the
documentation of XML_GetCurrentLineNumber() says only:
"Return the line number of the position." What exactly
is 'the position' if the function is called in an event
handler?

The comments in the expat.h file are more explicit. In
particular, they mention:

   They may be called from any callback called to report
   some parse event; in this case the location is the
   location of the first of the sequence of characters
   that generated the event.


Now consider for example the following simple xml data:

<!DOCTYPE test SYSTEM "file:///boo.baz"     [
   <!ELEMENT test EMPTY>
   <!ATTLIST test attr CDATA #IMPLIED>
]>
<test attr="value"/>

A simple demo program that calls all the
XML_GetCurrent* functions in the
XML_StartDoctypeDeclHandler,
XML_EndDoctypeDeclHandler, XML_ElementDeclHandler,
XML_AttlistDeclHandler and element start callbacks
gives the following output:

doctypeStart: line 1 column 44 index  44 count  1
elementDecl:  line 2 column 18 index  64 count  0
attlistDecl:  line 3 column 29 index 100 count  0
doctypeEnd:   line 4 column  1 index 111 count  1
elementStart: line 5 column  0 index 113 count 20
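
For reference, a minimal sketch of such a demo. It is written
against Python's xml.parsers.expat binding rather than the C API
used for the output above. That is an assumption made only to keep
the example short. As far as I know, the binding exposes no
equivalent of XML_GetCurrentByteCount, so the "count" column is
omitted. The names DOC and trace_positions are hypothetical. The
positions reported for the doctype/DTD events may differ between
Expat versions.

```python
import xml.parsers.expat

# The sample document from this report, with Unix line endings.
DOC = (
    b'<!DOCTYPE test SYSTEM "file:///boo.baz"     [\n'
    b'   <!ELEMENT test EMPTY>\n'
    b'   <!ATTLIST test attr CDATA #IMPLIED>\n'
    b']>\n'
    b'<test attr="value"/>\n'
)


def trace_positions(data):
    """Parse data and return (label, line, column, byte index) per event."""
    parser = xml.parsers.expat.ParserCreate()
    events = []

    def record(label):
        # Build a handler that ignores its arguments and only records
        # the parser position at the time of the callback.
        def handler(*args):
            events.append((label,
                           parser.CurrentLineNumber,
                           parser.CurrentColumnNumber,
                           parser.CurrentByteIndex))
        return handler

    parser.StartDoctypeDeclHandler = record("doctypeStart")
    parser.ElementDeclHandler = record("elementDecl")
    parser.AttlistDeclHandler = record("attlistDecl")
    parser.EndDoctypeDeclHandler = record("doctypeEnd")
    parser.StartElementHandler = record("elementStart")
    parser.Parse(data, True)
    return events


if __name__ == "__main__":
    for label, line, column, index in trace_positions(DOC):
        print(f"{label + ':':<14}line {line} column {column:2d} index {index:3d}")
```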

If called in an elementStart handler, the
XML_GetCurrent* functions return sensible values. Line
5, column 0 is the opening "<" of that tag, as the
comment in expat.h says, and the complete markup
reported is 20 bytes long. Very fine.

If called in the doctype declaration start handler,
element declaration handler or attlist declaration
handler, the results get stranger. The position
reported by XML_GetCurrentLine/ColumnNumber is
somewhere inside the reported markup, and the results of
XML_GetCurrentByteCount look rather weird. At
least the result of XML_GetCurrentByteIndex always
points to the same character as
XML_GetCurrentLine/ColumnNumber.

The current behavior seems to allow me to do what I
want (preserve the internal subset as found in the
original XML data by copying the parts of the input
stream indicated by XML_GetCurrent* function calls
in the doctype declaration start/end handlers). All
in all, though, this behavior can't really be considered
stable or 'the right one', and it is certainly not
documented well enough that one could bank on it.
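
To illustrate the copying step, here is a small sketch that slices
the internal subset out of the input using the byte indices from
the output above (44 for doctypeStart, 111 for doctypeEnd). The
helper name internal_subset is hypothetical. The code assumes the
start index points at the "[" and the end index at or after the
closing "]", which is exactly the undocumented behavior in
question.

```python
# The sample document from this report, with Unix line endings.
DOC = (
    b'<!DOCTYPE test SYSTEM "file:///boo.baz"     [\n'
    b'   <!ELEMENT test EMPTY>\n'
    b'   <!ATTLIST test attr CDATA #IMPLIED>\n'
    b']>\n'
    b'<test attr="value"/>\n'
)


def internal_subset(data, start_index, end_index):
    """Return the bytes between '[' at start_index and the last ']'
    found at or before end_index."""
    if data[start_index:start_index + 1] != b"[":
        raise ValueError("start_index does not point at '['")
    close = data.rfind(b"]", start_index, end_index + 1)
    if close == -1:
        raise ValueError("no ']' found before end_index")
    return data[start_index + 1:close]


if __name__ == "__main__":
    # Indices taken from the doctypeStart and doctypeEnd lines above.
    print(internal_subset(DOC, 44, 111).decode("ascii"))
```

Searching backwards for the "]" instead of subtracting a fixed
offset keeps the sketch working when whitespace sits between "]"
and ">".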

rolf

