[XML-SIG] sgmllib has problems with dots in tag names

Fred L. Drake fdrake at cnri.reston.va.us
Fri Jul 16 09:17:03 EDT 1999


Andreas Jung writes:
 > The SGML parsers from the standard sgmllib and the XML sgmllib were both
 > unable to parse SGML tags with dots in the tag name like <TI.DOC>. The 
 > parsers' callback functions only get the first part of the tag name (before
 > the dot) as argument (in this case 'TI'). Because the tags are valid SGML
 > tags this is a bit annoying. OK, one could work around this by replacing
 > all dots in tags with underscores, but that's not a clean solution :-)

Andreas,
  Ok, I've poked at the standard sgmllib a bit to see what the problem
is.  The parser is recognizing the start and end tags.  Once a tag is
recognized, it looks for the handler methods start_*() / end_*()
or do_*().  Since a dot cannot appear in a Python method name, these
methods are not defined, and the unknown_*tag() methods are called
instead of the handle_*tag() methods.
  It should be easy to override the unknown_*tag() methods to use a
table-based dispatcher or perform some form of name mangling, then
pass known tags through to the handle_*tag() methods or whatever.
This seems to be the easiest way to deal with the situation in the
short term.
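  A minimal sketch of the name-mangling variant follows.  The class
and helper names (DottedTagMixin, mangle, really_unknown_*tag) are
made up for illustration, and it assumes that sgmllib passes the full
lowercased tag name ('ti.doc') to unknown_starttag(tag, attrs) and
unknown_endtag(tag).  The demo calls those hooks directly so that it
runs without sgmllib; in real use the mixin would be listed before
sgmllib.SGMLParser in the base classes.

```python
class DottedTagMixin:
    """Hypothetical mixin: route 'ti.doc' to start_ti_doc() / end_ti_doc()."""

    def mangle(self, tag):
        # Assumption: '_' is an acceptable stand-in for '.' and '-'.
        return tag.replace(".", "_").replace("-", "_")

    def unknown_starttag(self, tag, attrs):
        # Look for a handler under the mangled name before giving up.
        method = getattr(self, "start_" + self.mangle(tag), None)
        if method is None:
            self.really_unknown_starttag(tag, attrs)
        else:
            method(attrs)

    def unknown_endtag(self, tag):
        method = getattr(self, "end_" + self.mangle(tag), None)
        if method is None:
            self.really_unknown_endtag(tag)
        else:
            method()

    # Hypothetical fallbacks for tags that have no handler at all.
    def really_unknown_starttag(self, tag, attrs):
        pass

    def really_unknown_endtag(self, tag):
        pass


class Demo(DottedTagMixin):
    """Stand-in for a parser subclass; records which handlers fire."""

    def __init__(self):
        self.events = []

    def start_ti_doc(self, attrs):
        self.events.append(("start", attrs))

    def end_ti_doc(self):
        self.events.append(("end",))


demo = Demo()
demo.unknown_starttag("ti.doc", [("id", "1")])
demo.unknown_endtag("ti.doc")
demo.unknown_starttag("other.tag", [])  # no handler, silently ignored
print(demo.events)
```

  This sketch does not cover do_*() handlers, and two tags that
differ only in '.' versus '_' would collide after mangling.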
  If you have any suggestions for a better approach to take, I'd love
to hear them.  It may not be unreasonable to use a mechanism similar to
that used by xmllib (a table of registered handler methods).
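  As a rough illustration of the table idea, here is a sketch loosely
modelled on xmllib's mapping of element names to (start, end) handler
pairs.  It is not an existing sgmllib feature: the class name
TableDispatchMixin and the choice to store method names rather than
bound methods are assumptions made for this example.  As before, the
hooks are called by hand so the code runs without a parser.

```python
class TableDispatchMixin:
    """Hypothetical mixin: dispatch tags through an explicit table."""

    # Maps tag name -> (start handler name, end handler name);
    # None means "no handler for that side of the element".
    elements = {}

    def unknown_starttag(self, tag, attrs):
        start_name, _ = self.elements.get(tag, (None, None))
        if start_name is not None:
            getattr(self, start_name)(attrs)

    def unknown_endtag(self, tag):
        _, end_name = self.elements.get(tag, (None, None))
        if end_name is not None:
            getattr(self, end_name)()


class TableDemo(TableDispatchMixin):
    """Stand-in for a parser subclass; records which handlers fire."""

    # Keys are plain strings, so the dot in the tag name is no problem.
    elements = {"ti.doc": ("open_doc", "close_doc")}

    def __init__(self):
        self.events = []

    def open_doc(self, attrs):
        self.events.append(("open", attrs))

    def close_doc(self):
        self.events.append(("close",))


table_demo = TableDemo()
table_demo.unknown_starttag("ti.doc", [])
table_demo.unknown_endtag("ti.doc")
table_demo.unknown_endtag("not.registered")  # not in the table, ignored
print(table_demo.events)
```

  Because the table is keyed on the tag name itself, no mangling is
needed and handler methods can be given any name.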


  -Fred

--
Fred L. Drake, Jr.	     <fdrake at acm.org>
Corporation for National Research Initiatives



