replacing tags

Fredrik Lundh fredrik at pythonware.com
Sun Jun 8 06:14:23 EDT 2003


Andrei <see at my.signature.com> wrote:

> I'm working on an RSS aggregator and I'd like to replace all img-tags in
> a piece of html with links to the image, thereby using the alt-text of
> the img as link text (if present). The rest of the html, including tags,
> should stay as-is. I'm capable of doing this in what feels like the dumb
> way (parsing it with regexes for example, or plain old string splitting
> and rejoining), but I have this impression the HTMLParser or htmllib
> module should be able to help me with this task.
>
> However, I can't figure out how (if?) I can make a parser do this. Does
> the formatter module fit in here somewhere? The docs, the effbot's guide
> and the posts regarding html only seem to highlight getting data out of
> the html (retrieving links seems particularly popular), not replacing
> tags with other ones.

the term "parser" usually refers to a piece of software that reads a
character stream, and turns it into some other data structure.

if you want to modify a character stream, you have to combine the
parser with code that turns that data structure back to a character
stream.

the "Using the sgmllib Module to Filter SGML Documents" example in
chapter 5 of my "Python Standard Library" book does exactly that:

    http://www.oreilly.com/catalog/pythonsl/chapter/ch05.html
    http://www.effbot.org/zone/librarybook-index.htm (pdf)

(you can use a similar approach with HTMLParser, but htmllib is
designed for HTML formatting, not HTML parsing, and is not the
right tool for the task)
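
as a rough illustration of that "parser plus writer" idea, here is a
minimal sketch. it is not the example from the book. it assumes a
current Python 3, where the HTMLParser class lives in html.parser
(sgmllib and htmllib are no longer in the standard library). the names
ImgToLink, img_as_link and replace_images are made up for this sketch.
every event is written back out more or less as it came in, except that
img tags become links. processing instructions and other rare
constructs are not handled.

```python
from html import escape
from html.parser import HTMLParser


class ImgToLink(HTMLParser):
    """Write the parsed HTML back out, turning <img> tags into links."""

    def __init__(self):
        # convert_charrefs=False: entity and character references are
        # reported to the handlers below, so they can be copied through
        super().__init__(convert_charrefs=False)
        self.out = []

    def img_as_link(self, attrs):
        # hypothetical helper: build the replacement link for one img tag
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        text = attrs.get("alt") or src  # fall back to the URL if no alt text
        return '<a href="%s">%s</a>' % (escape(src), escape(text))

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.out.append(self.img_as_link(attrs))
        else:
            # original source text of the tag, attributes included
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/> or <img ... />
        if tag == "img":
            self.out.append(self.img_as_link(attrs))
        else:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "img":
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def replace_images(html):
    """Return html with every img tag replaced by a link to the image."""
    parser = ImgToLink()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

calling replace_images() on a fragment returns the same markup with
the img tags swapped for links. end tags are rebuilt from the
lowercased tag name, so the output is not guaranteed to be
byte-identical to the input everywhere else.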

</F>







