[XML-SIG] xbel demo fails importing Netscape bookmark data

Fred Yankowski fred@ontosys.com
Tue, 12 Oct 1999 15:46:08 -0500


Greetings,

I'm getting reacquainted with the Python XML utilities after a long
absence.  My WinNT system has Python 1.5.2 and I installed the
PythonXML.exe package just today.

To test things out, I tried to use some of the xml/demo/xbel stuff.
Specifically, I tried to convert my Netscape bookmarks to XML/XBEL
format as follows:

	python ns_parse.py 'f:\program files\netscape\users\me\bookmark.htm'

This resulted in an exception:

  File "ns_parse.py", line 86, in ?
    p.setDocumentHandler(ns_handler)
  File "J:\Program Files\Python\xml\sax\drivers\drv_sgmlop.py", line 29, in setDocumentHandler
    self.parser.register(DHWrapper(dh), 1)
  TypeError: function requires exactly 1 argument; 2 given

So I hacked drv_sgmlop.py to remove that second (1) actual parameter.
Now the program runs to completion, but I get an essentially empty
XBEL document as output:

	<?xml version="1.0"?>
	<!DOCTYPE xbel PUBLIC "+//IDN python.org//DTD XML Bookmark Exchange
	Language 1.0//EN//XML" "xbel.dtd">
	<xbel>  <desc>No description</desc>
	</xbel>

I enabled some debug prints in the startElement method, which showed
that the method is *never* being called for my document.  I hacked
ns_parse.py to call get_parser_name(), which reported "sgmlop" -- no
big surprise given the above exception.  Noodling around in the
xml-sig archive for hints, I was inspired to try the pyexpat parser
instead of sgmlop, which I enabled by calling make_parser this way:

    p=saxexts.SGMLParserFactory.make_parser("xml.sax.drivers.drv_pyexpat")

With that change, I still get an empty XBEL document, but the
startElement method *is* being called a few times.  I hacked
ns_parse.py again to add warning(), error(), and fatalError() methods
and called setErrorHandler() to register them.  Now I see a fatal
error after the first element in the bookmark.htm file.

	xml.sax.saxlib.SAXParseException: junk after document element at :6:0
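
For comparison, here is a minimal sketch of the same experiment using
the xml.sax module from the current Python standard library, not the
1999 PythonXML package described above.  TraceHandler, StrictErrors,
and parse_fragment are hypothetical names, and the two-line input only
imitates the shape of a bookmark file (sibling elements with no
enclosing element); it is not taken from a real bookmark.htm.

```python
import xml.sax


class TraceHandler(xml.sax.ContentHandler):
    """Records the name of every element whose start tag is reported."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


class StrictErrors(xml.sax.ErrorHandler):
    """Error handler with the three SAX callbacks mentioned in the post."""

    def warning(self, exc):
        print("warning:", exc)

    def error(self, exc):
        raise exc

    def fatalError(self, exc):
        raise exc


def parse_fragment(data):
    """Parse bytes; return (element names seen, error details or None)."""
    handler = TraceHandler()
    try:
        xml.sax.parseString(data, handler, StrictErrors())
    except xml.sax.SAXParseException as exc:
        details = (exc.getMessage(), exc.getLineNumber(), exc.getColumnNumber())
        return handler.seen, details
    return handler.seen, None


# Two top-level elements: the first is reported, the second is "junk".
seen, err = parse_fragment(b"<TITLE>Bookmarks</TITLE>\n<H1>Bookmarks</H1>\n")
print(seen, err)
```

In this sketch startElement fires for the first element only, and the
fatal error carries a line and column for the point where the second
top-level element begins.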

I think it's unhappy that the bookmark.htm file is not a well-formed
document: it's a sequence of elements with no enclosing element.  I
added an enclosing HTML element to bookmark.htm, but now the parser
fails a bit later on, saying:

	xml.sax.saxlib.SAXParseException: not well-formed at :6:75

(BTW, WTH does that ":6:75" indicate?  Based on the startElement
trace, it got confused on line 10 of the file.)  I think it's unhappy
with an H3 element that looks like this:

	<H3 FOLDED ADD_DATE="920063741">Whatever</h3>

Is that valueless 'FOLDED' attribute the problem?
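
One way to probe that question is to feed small variations of the H3
line straight to expat.  The sketch below uses xml.parsers.expat from
the current Python standard library, which is an assumption relative
to the 1999 pyexpat driver; first_error is a hypothetical helper.  It
reports the error text together with the line and column expat stops
at, which is presumably also what the two numbers in ":6:75" are, with
an empty system identifier in front, though that reading of the old
saxlib message is an assumption.

```python
import xml.parsers.expat


def first_error(text):
    """Return (message, line, column) for the first expat error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return message, err.lineno, err.offset
    return None


# Attribute without a value, as in the Netscape file: rejected by expat.
print(first_error('<H3 FOLDED ADD_DATE="920063741">Whatever</H3>'))

# Giving FOLDED a value makes the start tag acceptable.
print(first_error('<H3 FOLDED="FOLDED" ADD_DATE="920063741">Whatever</H3>'))

# XML names are case-sensitive, so </h3> does not close <H3>.
print(first_error('<H3 ADD_DATE="920063741">Whatever</h3>'))
```

Under these assumptions the valueless attribute alone is enough to
trigger a "not well-formed" error, and the lowercase end tag would be
a second, separate problem for an XML parser.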

================
So, these tests make the XML package look alpha quality at best.  Am I
trying to use code that is known to be broken?  Is this problem
limited to the xbel stuff?  It sure looks like the SAX driver for
sgmlop is broken.  What part of the XML distribution is thought to be
stable?

-- 
Fred Yankowski    fred@OntoSys.com