Question: processing HTML, re-write default processing action of many tags

Alex Martelli aleaxit at yahoo.com
Fri Sep 17 04:56:34 EDT 2004


Hubert Hung-Hsien Chang <hubert at cs.nyu.edu> wrote:

> I know you could use the 
> 
> 
> def start_a
> ....
> 
> def end_a
> ....
> 
> to process the <a href=...> anchor </a> tags, but is there a 
> default method for processing ALL tags? If I just want to change
> some parts of the hyperlink and keep the other parts of the HTML,
> could I just print them out? There should be such a method.
> Can't find it...

You could subclass HTMLParser.HTMLParser and override handle_starttag
and handle_endtag, which are called for every start and end tag
regardless of its name. You only talk about processing _tags_, but if
you want to keep the rest of the HTML you will probably also need
handle_data for the text nodes, handle_charref and handle_entityref for
references, and possibly handle_comment too.
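
A rough sketch of that approach, under some assumptions: it targets
Python 3, where the module is html.parser rather than HTMLParser, and
LinkRewriter, rewrite_links and the rewrite_href callback are made-up
names for illustration only. The class echoes everything it parses and
changes nothing but the href of <a> tags.

```python
from html import escape
from html.parser import HTMLParser


class LinkRewriter(HTMLParser):
    """Echo the parsed document, changing only href values on <a> tags."""

    def __init__(self, rewrite_href):
        # convert_charrefs=False keeps handle_charref/handle_entityref in
        # play, so references are passed through instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.rewrite_href = rewrite_href  # hypothetical callback: old URL -> new URL
        self.out = []

    def _format_tag(self, tag, attrs, closing=">"):
        parts = [tag]
        for name, value in attrs:
            if tag == "a" and name == "href" and value is not None:
                value = self.rewrite_href(value)
            if value is None:
                parts.append(name)  # attribute without a value
            else:
                # The parser hands over unescaped values, so escape again.
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        return "<" + " ".join(parts) + closing

    def handle_starttag(self, tag, attrs):
        # Called for every start tag, whatever its name.
        self.out.append(self._format_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style empty tags such as <br/>.
        self.out.append(self._format_tag(tag, attrs, closing=" />"))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def rewrite_links(html_text, rewrite_href):
    """Return html_text with every <a href=...> passed through rewrite_href."""
    parser = LinkRewriter(rewrite_href)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For instance, rewrite_links(page, lambda url: url.replace("old.example",
"new.example")) would return the page with only the link targets
changed. The output is re-serialized rather than copied, so tag and
attribute names come out lowercased and attribute values double-quoted;
it is not guaranteed to be byte-for-byte identical to the input.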


Alex


