[Expat-bugs]
[ expat-Bugs-683681 ] XML_GetCurrent* functions for doctype
declaration/DTD events
SourceForge.net
noreply at sourceforge.net
Sun Feb 9 17:12:23 EST 2003
Bugs item #683681, was opened at 2003-02-10 01:12
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=683681&group_id=10127
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Rolf Ade (pointsman)
Assigned to: Nobody/Anonymous (nobody)
Summary: XML_GetCurrent* functions for doctype declaration/DTD events
Initial Comment:
In Expat 1.95.6, I find the return values of the
XML_GetCurrent* functions surprising, and at the least
underdocumented, when they are called in a doctype
declaration or DTD event handler
(XML_StartDoctypeDeclHandler,
XML_EndDoctypeDeclHandler, XML_ElementDeclHandler,
etc.).
The reference.html file is rather sparse about the
XML_GetCurrent* functions. For example, the
documentation of XML_GetCurrentLineNumber() says only:
"Return the line number of the position." What exactly
is 'the position' if the function is called in an event
handler?
The comments in the expat.h file are more explicit. In
particular, they state:
"They may be called from any callback called to report
some parse event; in this case the location is the
location of the first of the sequence of characters that
generated the event."
Now consider, for example, the following simple XML data:
<!DOCTYPE test SYSTEM "file:///boo.baz" [
<!ELEMENT test EMPTY>
<!ATTLIST test attr CDATA #IMPLIED>
]>
<test attr="value"/>
A simple demo program that calls all the
XML_GetCurrent* functions from the
XML_StartDoctypeDeclHandler,
XML_EndDoctypeDeclHandler, XML_ElementDeclHandler
and XML_AttlistDeclHandler callbacks gives the
following output:
doctypeStart: line 1 column 44 index 44 count 1
elementDecl: line 2 column 18 index 64 count 0
attlistDecl: line 3 column 29 index 100 count 0
doctypeEnd: line 4 column 1 index 111 count 1
elementStart: line 5 column 0 index 113 count 20
When called in an element start handler, the
XML_GetCurrent* functions return sensible values. Line
5, column 0 is the opening "<" of that tag, as the
comment in expat.h says, and the complete markup
reported is 20 characters long. So far, so good.
When called in the doctype declaration start handler,
element declaration handler or attlist declaration
handler, the results get stranger. The position
reported by XML_GetCurrentLineNumber and
XML_GetCurrentColumnNumber is somewhere inside the
reported markup, and the results of
XML_GetCurrentByteCount (1 or 0 in the output above)
look rather weird. At least the result of
XML_GetCurrentByteIndex always points to the same
character as XML_GetCurrentLineNumber and
XML_GetCurrentColumnNumber.
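One way to cross-check that last observation is to convert a
reported line/column pair into a byte offset and compare it with
the reported byte index. The following is only a minimal sketch in
plain C, not part of Expat: offset_of is a hypothetical helper, and
it assumes that the whole document sits in one memory buffer, that
lines end in "\n", that lines are numbered from 1 and columns from
0 (as the elementStart line of the output suggests), and that each
column is one byte.

```c
#include <stddef.h>

/* Hypothetical helper (not an Expat function): map a 1-based line
   number and a 0-based column to a byte offset in buf.
   Assumes "\n" line endings and one byte per column; it does not
   check that the column stays within the line.
   Returns -1 if the position lies outside the buffer. */
long offset_of(const char *buf, size_t len,
               unsigned long line, unsigned long column)
{
    size_t pos = 0;          /* offset of the start of the current line */
    unsigned long cur = 1;   /* current line number, 1-based */

    if (line < 1)
        return -1;
    while (cur < line) {
        /* advance to the newline that ends the current line */
        while (pos < len && buf[pos] != '\n')
            pos++;
        if (pos == len)
            return -1;       /* the buffer has fewer lines than requested */
        pos++;               /* step over the newline */
        cur++;
    }
    if (column > len - pos)
        return -1;           /* column runs past the end of the buffer */
    return (long)(pos + column);
}
```

Inside a handler, the value of offset_of(buf, len, line, column) for
the line and column just reported could then be compared with the
result of XML_GetCurrentByteIndex for the same event.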
The current behavior seems to allow me to do what I
want: preserve the internal subset as found in the
original XML data by copying the parts of the input
stream indicated by XML_GetCurrent* function calls in
the doctype declaration start and end handlers. All in
all, though, this behavior can't really be considered
stable or 'the right one', and it is certainly not
documented well enough that one could bank on it.
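The copying step could look roughly like the sketch below. It is an
illustration under assumptions, not a recommended method: copy_span
is a hypothetical helper, the whole input is assumed to be held in
one memory buffer, and start and end stand for byte offsets recorded
from XML_GetCurrentByteIndex in the doctype declaration start and
end handlers. Whether those offsets bracket the internal subset
reliably is exactly the undocumented behavior in question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an Expat function): return a newly
   allocated, NUL-terminated copy of the bytes buf[start..end).
   Returns NULL on an invalid range or allocation failure.
   The caller is responsible for freeing the result. */
char *copy_span(const char *buf, size_t len, long start, long end)
{
    size_t n;
    char *out;

    /* reject negative, reversed or out-of-buffer ranges */
    if (start < 0 || end < start || (size_t)end > len)
        return NULL;
    n = (size_t)(end - start);
    out = (char *)malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, buf + start, n);
    out[n] = '\0';
    return out;
}
```

The range is half-open, so to include the character an offset points
at as the last one copied, pass that offset plus one as end.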
rolf