Html: replacing tags

Andrei see at my.signature.com
Sun Jun 8 12:29:31 EDT 2003


Originally posted by Fredrik Lundh 
> Andrei wrote:
>
> > I'm working on an RSS aggregator and I'd like to replace all img-tags in
> > a piece of html with links to the image, thereby using the alt-text of
> > the img as link text (if present). The rest of the html, including tags,
> > should stay as-is. I'm capable of doing this in what feels like the dumb
> > way (parsing it with regexes for example, or plain old string splitting
> > and rejoining), but I have this impression the HTMLParser or htmllib
> > module should be able to help me with this task.
> > However, I can't figure out how (if?) I can make a parser do this. Does
> > the formatter module fit in here somewhere? The docs, the effbot's guide
> > and the posts regarding html only seem to highlight getting data out of
> > the html (retrieving links seems particularly popular), not replacing
> > tags with other ones.
>
> the term "parser" usually refers to a piece of software that reads a
> character stream, and turns it into some other data structure.
>
> if you want to modify a character stream, you have to combine the
> parser with code that turns that data structure back to a character
> stream.
>
> the "Using the sgmllib Module to Filter SGML Documents" example in
> chapter 5 of my "Python Standard Library" book does exactly that:
>
>     http://www.oreilly.com/catalog/pythonsl/chapter/ch05.html
>     http://www.effbot.org/zone/librarybook-index.htm (pdf)
>
> (you can use a similar approach with HTMLParser, but htmllib is
> designed for HTML formatting, not HTML parsing, and is not the
> right tool for the task) 

Thanks, I'll look into sgmllib then. I already have that chapter on my
HD :), but it didn't really occur to me to look at sgml.
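
For reference, here is a minimal sketch of the approach described above: a
parser whose handlers write every event straight back out as text, except
that img tags are turned into links. It is not the example from the book.
It assumes a current Python 3, where sgmllib and htmllib no longer exist,
so it uses html.parser.HTMLParser instead. The names ImgToLinkFilter and
replace_images are made up for this illustration, and falling back to the
image URL when there is no alt text is my own choice.

```python
from html import escape
from html.parser import HTMLParser


class ImgToLinkFilter(HTMLParser):
    """Echo HTML back out, replacing each <img> with <a href=src>alt</a>."""

    def __init__(self):
        # convert_charrefs=False routes entities such as &amp; to the
        # handle_entityref/handle_charref hooks so they can be echoed as-is.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _img_as_link(self, attrs):
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        # Assumption: use the URL as link text when alt is missing or empty.
        text = attrs.get("alt") or src
        # The parser hands over unescaped attribute values; re-escape them.
        return '<a href="%s">%s</a>' % (escape(src, quote=True), escape(text))

    def _emit_start(self, tag, attrs):
        if tag == "img":
            self.out.append(self._img_as_link(attrs))
        else:
            # Original source text of the tag, attributes untouched.
            self.out.append(self.get_starttag_text())

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        # Overridden so that <br/> is not followed by a spurious </br>.
        self._emit_start(tag, attrs)

    def handle_endtag(self, tag):
        # Tag names arrive lowercased, so </P> comes out as </p>.
        if tag != "img":
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def replace_images(html_text):
    """Return html_text with every img tag replaced by a link to the image."""
    parser = ImgToLinkFilter()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered in the parser
    return "".join(parser.out)
```

Calling replace_images() on the body of a feed item returns the same markup
with only the images swapped for links. All other tags, text and entities
pass through the handlers unchanged, apart from end tags being lowercased.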

--
Contact info (decode with rot13): cebwrpg5 at bcrenznvy.pbz
Fcnzserr! Cyrnfr qb abg hfr va choyvp zrffntrf. V ernq gur yvfg, ab arrq gb PP.

