HTML DOM parser?

Thu Jul 18 20:08:37 EDT 2002

> -----Original Message-----
> From: python-list-admin at python.org
> [mailto:python-list-admin at python.org]On Behalf Of Peter Hansen
> Sent: Thursday, July 18, 2002 15:13
> To: python-list at python.org
> Subject: Re: HTML DOM parser?
>
>
> Paul Rubin wrote:
> >
> > Anyone know of a Python-callable HTML DOM parser?  I mean a serious
> > one that tries to understand the crappy malformed HTML out there in the
> > real-world Web, the way a browser does.  If it can interpret
> > Javascript that's even better.  This is for a consulting client, so a
> > commercial library would be acceptable (though not preferred).
>
> How about automating IE using Python?
>
> from win32com.client import DispatchEx
>
> ie = DispatchEx('internetexplorer.application')
> ie.visible = 1
> ie.navigate('http://www.nightsong.com')
> dom = ie.document
>
> etc...
>
> Access to the DOM tree of the document might be too slow for your
> needs, but if it's not, you definitely get a lot of bang for the buck...
>
> -Peter

I put the above code into "ienavigate.py" and tried it and got:

Traceback (most recent call last):
  File "ienavigate.py", line 6, in ?
    dom = ie.document
  File "J:\Python22\lib\site-packages\win32com\client\dynamic.py", line 448, in __getattr__
    raise pythoncom.com_error, details
pywintypes.com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None)

I also got a browser window showing a "403" error saying I don't have
permission to access index.html on www.nightsong.com.

I would be interested in getting this working, so any help is appreciated.
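One possible cause, offered as an untested guess: the script reads `ie.document` on the line right after `navigate()`, before Internet Explorer has finished loading the page, so the property access fails. A common workaround is to poll the browser's `Busy` and `ReadyState` properties until the page is complete. The sketch below assumes those two properties behave as on the InternetExplorer COM object, and assumes `ReadyState == 4` means "complete". The helper names `wait_until_loaded` and `open_document` are hypothetical. It also uses Python 3 syntax (`TimeoutError`, `time.monotonic`), not the Python 2.2 shown in the traceback.

```python
import time

# Assumed value of READYSTATE_COMPLETE for a fully loaded page.
READYSTATE_COMPLETE = 4


def wait_until_loaded(browser, timeout=30.0, poll=0.1):
    """Poll `browser` until it is idle and its page is fully loaded.

    `browser` is assumed to expose Busy and ReadyState like IE's COM
    object. Raises TimeoutError if loading takes longer than `timeout`
    seconds.
    """
    deadline = time.monotonic() + timeout
    while browser.Busy or browser.ReadyState != READYSTATE_COMPLETE:
        if time.monotonic() > deadline:
            raise TimeoutError("page did not finish loading")
        time.sleep(poll)
    return browser


def open_document(url, timeout=30.0):
    """Open `url` in a new IE instance and return (ie, document).

    Windows only; requires pywin32. The import is done here so that the
    module can still be imported on machines without win32com.
    """
    from win32com.client import DispatchEx

    ie = DispatchEx("InternetExplorer.Application")
    ie.Visible = 1
    ie.Navigate(url)
    wait_until_loaded(ie, timeout)  # don't touch Document until loaded
    return ie, ie.Document
```

Usage would then be something like `ie, dom = open_document("http://www.nightsong.com")`. Note that even if the wait fixes the exception, the 403 means the server refused the request, so the DOM you get back would be the error page rather than the site's content.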

TIA,

Dave LeBlanc
Seattle, WA USA