re or html parser module, for wildcard search within html document?

John J. Lee jjl at pobox.com
Sat Aug 2 21:04:44 EDT 2003


mm2ps at yahoo.co.uk (Douglas) writes:
[...]
> Specifically, I want to replace any tag containing the word "font"
> with a new tag. As I want to use some form of wild card for the
> search, eg. <*font*>, should I use a regular expression module (re) or
> one of the specific html parsers? If this should be done with an html

Sounds like you could use HTMLParser.HTMLParser (rather than sgmllib
or htmllib).  A regexp might be simpler, though, if it does the job
reliably and maintainably enough in practice.
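
A rough sketch of both routes, not a definitive implementation.  It
assumes a current Python, where the class lives in html.parser (it was
HTMLParser.HTMLParser in the Python 2 of this thread).  The names
replace_with_regex, FontTagReplacer and replace_with_parser are made up
for illustration.  Note the difference: the regexp treats <*font*> as a
wildcard over the whole tag text, so it also hits closing tags and
attributes that happen to contain "font", while the parser version only
looks at the tag name.

```python
import re
from html.parser import HTMLParser  # HTMLParser.HTMLParser in Python 2

# Regexp route: any <...> with "font" anywhere inside, case-insensitive.
FONT_TAG = re.compile(r"<[^<>]*font[^<>]*>", re.IGNORECASE)


def replace_with_regex(html, new_tag):
    """Replace every matching tag (opening or closing) with new_tag."""
    return FONT_TAG.sub(new_tag, html)


class FontTagReplacer(HTMLParser):
    """Copy the input through, swapping tags whose name contains 'font'."""

    def __init__(self, new_start, new_end):
        # convert_charrefs=False reports &amp; etc. as separate events,
        # so they can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.new_start = new_start
        self.new_end = new_end
        self.out = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names before calling this method.
        if "font" in tag:
            self.out.append(self.new_start)
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: emit once, with no end tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(self.new_end if "font" in tag else "</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def replace_with_parser(html, new_start, new_end):
    parser = FontTagReplacer(new_start, new_end)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

With the parser version you would call something like
replace_with_parser(page, '<span class="body">', '</span>'), where page
is the document as a string.  The regexp version takes a single
replacement string, so opening and closing tags both become that same
text unless the pattern is split in two.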


> parser module then which one and where is some easy going introductory
> documentation, please?

Moshe Zadka had some useful PowerPoint slides, which are probably still
on the web somewhere.  A bit out of date now, no doubt.


John



