Question: processing HTML, re-write default processing action of many tags

Michael Foord fuzzyman at gmail.com
Fri Sep 17 11:10:45 EDT 2004


hubert at cs.nyu.edu (Hubert Hung-Hsien Chang) wrote in message news:<98ba0902.0409162115.3e2e9ee9 at posting.google.com>...
> I know you could use the 
> 
> 
> def start_a
> ....
> 
> def end_a
> ....
> 
> to process the <a href=...> anchor </a> tags, but is there a
> default method for processing ALL tags? If I just want to change
> some parts of the hyperlinks and keep the rest of the HTML as is,
> could I just print the other tags out unchanged? There should be
> such a method, but I can't find it...
> 
> Thank you.

If you are modifying the contents of tags, I've written a simple HTML
parser class called Scraper that does this. Unlike the HTMLParser in
the standard library, it doesn't choke so much on badly formed HTML.

It's part of approx.py, my cgiproxy:
http://www.voidspace.org.uk/atlantibots/pythonutils.html#cgiproxy
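
For comparison, here is a rough sketch of the same pass-through idea using only the standard library. This is not the Scraper class from approx.py; it assumes a current Python 3 and its html.parser.HTMLParser, and the names LinkRewriter, rewrite_links and rewrite_href are made up for illustration. Every handler echoes what it receives, so all markup is written back out by default, and only the href of <a> tags goes through the callback. (In the 2004-era library, sgmllib's SGMLParser had unknown_starttag and unknown_endtag as catch-alls for tags without a start_xxx/end_xxx method.)

```python
import html
from html.parser import HTMLParser


class LinkRewriter(HTMLParser):
    """Echo all markup back out, rewriting only href values on <a> tags."""

    def __init__(self, rewrite_href):
        # convert_charrefs=False reports entities such as &amp; as separate
        # events, so they can be written back out as they appeared.
        super().__init__(convert_charrefs=False)
        self.rewrite_href = rewrite_href  # hypothetical callback: str -> str
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            # Default action for every other tag: emit the original text.
            self.out.append(self.get_starttag_text())
            return
        # Rebuild the anchor tag, changing only its href attribute.
        # Attribute values arrive unescaped, so escape them again.
        parts = ["a"]
        for name, value in attrs:
            if value is None:
                parts.append(name)
                continue
            if name == "href":
                value = self.rewrite_href(value)
            parts.append('%s="%s"' % (name, html.escape(value, quote=True)))
        self.out.append("<%s>" % " ".join(parts))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are echoed verbatim.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The parser lowercases tag names, so original case is not preserved.
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def rewrite_links(html_text, rewrite_href):
    """Return html_text with every <a href=...> passed through rewrite_href."""
    parser = LinkRewriter(rewrite_href)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = '<p class="x">See <a href="http://example.com/a">this</a><br/></p>'
    print(rewrite_links(page, lambda url: "/proxy?url=" + url))
```

Anchor tags are rebuilt from the parsed attributes, so their exact spacing and quoting may differ from the input; everything else is written back as the parser saw it. How well this copes with badly formed HTML depends on the parser, which is the problem Scraper was written to sidestep.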

HTH

Regards,

Fuzzy


