convert html

jesso1607 at rogers.com
Thu Jul 8 10:34:22 EDT 2004


Hi:

I want to convert HTML to XML.
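With a current Python 3, the standard library alone can turn reasonably simple HTML into well-formed XML in one pass. The following is a minimal sketch under that assumption, not the PyXML 4DOM approach used in this post. It feeds html.parser.HTMLParser events into an xml.etree.ElementTree tree. The names HtmlToElementTree, html_to_xml and VOID_ELEMENTS, and the synthetic html-document root, are hypothetical. It does not implement the HTML5 rules for implied end tags, so an unclosed <li> ends up nested inside the previous one, although the output is still well-formed XML.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag (assumed list, per HTML5).
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}


class HtmlToElementTree(HTMLParser):
    """Hypothetical helper: build an ElementTree from sloppy HTML in one pass."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive as plain text
        self.root = ET.Element("html-document")  # synthetic wrapper root
        self.stack = [self.root]                  # currently open elements

    def handle_starttag(self, tag, attrs):
        # Bare attributes (value None) become name="name" so the XML is valid.
        attrib = {k: (v if v is not None else k) for k, v in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_ELEMENTS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # ElementTree stores text after a child in that child's .tail.
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html_text):
    """Return the HTML as an XML string (elements left open are closed at the end)."""
    parser = HtmlToElementTree()
    parser.feed(html_text)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")
```

For example, html_to_xml("<p>Hello<br>world</p>") returns '<html-document><p>Hello<br />world</p></html-document>'.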

I am doing this:

import sys

from xml.dom.ext.reader import HtmlLib
from xml.dom import ext, Node
from xml.dom.NodeFilter import NodeFilter

def main(argv):
    # Build a DOM tree from the HTML file named on the command line
    reader = HtmlLib.Reader()
    dom_object = reader.fromUri(argv[1])

    # getTableInfo() is defined elsewhere (not shown in this post)
    info = getTableInfo(dom_object, 9)

    # Free the DOM tree when done
    reader.releaseNode(dom_object)

if __name__ == "__main__":
    main(sys.argv)

This takes almost a minute on a 6000-line HTML file on a Pentium III 700 MHz with 256 MB of RAM, which is too slow.

Can anyone suggest a faster way of doing this in Python?
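If the real goal is only to pull data out of tables (the getTableInfo(dom_object, 9) call suggests so, but its definition is not shown), building a tree at all may be unnecessary. The following is a hedged alternative sketch, again assuming Python 3's html.parser module rather than anything from this post. It is an event-driven parser that records the text of each <td>/<th> cell as it streams past, grouped by table and row. TableCollector and read_tables are hypothetical names. The post does not say what the 9 means, so mapping the result onto getTableInfo is left open.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Hypothetical helper: collect cell text per table and row, without a DOM.

    Each nested table is recorded as its own table; its text is not
    added to the enclosing cell.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []   # every table, in the order its <table> tag appears
        self._stack = []   # one [rows, cell_fragments_or_None] per open table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            rows = []
            self.tables.append(rows)
            self._stack.append([rows, None])
        elif not self._stack:
            return                      # ignore everything outside tables
        elif tag == "tr":
            self._close_cell()
            self._stack[-1][0].append([])
        elif tag in ("td", "th"):
            self._close_cell()          # an unclosed previous cell ends here
            rows = self._stack[-1][0]
            if not rows:                # tolerate a cell with no <tr>
                rows.append([])
            self._stack[-1][1] = []

    def handle_endtag(self, tag):
        if not self._stack:
            return
        if tag in ("td", "th", "tr"):
            self._close_cell()
        elif tag == "table":
            self._close_cell()
            self._stack.pop()

    def handle_data(self, data):
        if self._stack and self._stack[-1][1] is not None:
            self._stack[-1][1].append(data)

    def _close_cell(self):
        frame = self._stack[-1]
        if frame[1] is not None:
            # Join the fragments and collapse runs of whitespace.
            frame[0][-1].append(" ".join("".join(frame[1]).split()))
            frame[1] = None


def read_tables(html_text):
    """Return a list of tables; each table is a list of rows of cell strings."""
    parser = TableCollector()
    parser.feed(html_text)
    parser.close()
    return parser.tables
```

Because no tree is kept, memory use stays small. Whether this is actually faster than the DOM approach on a given 6000-line file is worth measuring rather than assuming.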

More information about the Python-list mailing list