HTML DOM parser?

Gerhard Häring gerhard.haering at gmx.de
Thu Jul 18 15:57:05 EDT 2002


* Paul Rubin <phr-n2002b at NOSPAMnightsong.com> [2002-07-18 12:36 -0700]:
> Anyone know of a Python-callable HTML DOM parser?  I mean a serious
> one that tries to understand the crappy malformed HTML out there in the
> real-world Web, the way a browser does. 

I see two options:
- use mxTidy (http://www.lemburg.com/files/python/mxTidy.html) to clean
  up the markup, then run a normal HTML parser on the output
- extract the parsing code from a real browser, like Mozilla or
  Konqueror. If your code only has to run on win32, it might also be
  possible to get at the DOM by driving Internet Exploder via COM
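
A rough sketch of the second half of the first option, assuming the tidy
step has already turned the page into well-formed XHTML. The cleanup call
itself is left out, and TIDIED and extract_links are just names made up
for this example. Once the markup is well-formed, the standard library's
xml.dom.minidom is enough to get a DOM:

```python
from xml.dom import minidom

# Assumption: this is what the tidy step produced (well-formed XHTML).
# The cleanup itself (mxTidy or any other tidy wrapper) is not shown.
TIDIED = """<html>
<head><title>Example</title></head>
<body>
<p>First <a href="http://www.python.org/">link</a></p>
<p>Second <a href="http://www.lemburg.com/">link</a></p>
</body>
</html>"""


def extract_links(xhtml):
    """Return (href, text) pairs for every <a> element in the document."""
    dom = minidom.parseString(xhtml)
    links = []
    for anchor in dom.getElementsByTagName("a"):
        # Collect only the direct text children of the anchor.
        text = "".join(
            node.data
            for node in anchor.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        links.append((anchor.getAttribute("href"), text))
    return links


if __name__ == "__main__":
    for href, text in extract_links(TIDIED):
        print(href, text)
```

This only works because the input is already well-formed; feeding raw
tag soup to minidom will most likely raise a parse error, which is why
the tidy pass comes first.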

> If it can interpret Javascript that's even better.

You'll need a browser engine for that. Or use one of the standalone
Javascript engines and feed it your DOM.

Gerhard
-- 
mail:   gerhard <at> bigfoot <dot> de       registered Linux user #64239
web:    http://www.cs.fhm.edu/~ifw00065/    OpenPGP public key id AD24C930
public key fingerprint: 3FCC 8700 3012 0A9E B0C9  3667 814B 9CAA AD24 C930
reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b')))




