Parsing HTML - modify URLs

Fuzzyman michael at foord.net
Wed Jul 7 16:24:04 EDT 2004


"Robert Brewer" <fumanchu at amor.org> wrote in message news:<mailman.69.1089211879.5135.python-list at python.org>...
> Fuzzyman wrote:
> > I am trying to parse an HTML page and only modify URLs within tags -
> > e.g. inside IMG, A, SCRIPT, FRAME tags etc...
> > 
> > I have built one using HTMLParser.HTMLParser, and it works fine... on
> > good HTML. A quick Google search suggests that HTMLParser choking on
> > dodgy HTML is a common problem.
> 
> Haven't used it, but Beautiful Soup sounds like it fits the bill:
> 
> http://www.crummy.com/software/BeautifulSoup/

It talks about 'walking the parse tree', which is a bit more magic
than I want. I just want to modify URLs in tags, which means passing
most of the HTML through unchanged and rewriting only a few tags.
HTMLParser is quite good at this, but it dies *horribly* on bad HTML.
I may have to try Beautiful Soup though :-)
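
For what it's worth, here is a rough sketch of the "pass everything
through, rewrite only a few tags" approach. It is not my actual code:
it assumes the Python 3 module html.parser (the successor to the
HTMLParser module discussed here), and the names URL_ATTRS,
URLRewriter, rewrite_urls and the example.com base URL are made up for
illustration. Tags without URL attributes are echoed verbatim; tags
with them are rebuilt, so their quoting and whitespace may be
normalised.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urljoin

# (tag, attribute) pairs that carry URLs. Illustrative subset, not exhaustive.
URL_ATTRS = {("a", "href"), ("img", "src"), ("script", "src"), ("frame", "src")}


class URLRewriter(HTMLParser):
    """Echo the input, rewriting only URL attributes in selected tags."""

    def __init__(self, fix_url):
        # convert_charrefs=False keeps &amp; and friends as separate events,
        # so they can be written back out as they were.
        super().__init__(convert_charrefs=False)
        self.fix_url = fix_url  # hypothetical callback: old URL -> new URL
        self.out = []

    def _emit_tag(self, tag, attrs, closer):
        if not any((tag, name) in URL_ATTRS for name, _ in attrs):
            # Nothing to change: keep the original tag text exactly.
            self.out.append(self.get_starttag_text())
            return
        parts = [tag]
        for name, value in attrs:
            if value is None:  # bare attribute such as "defer"
                parts.append(name)
                continue
            if (tag, name) in URL_ATTRS:
                value = self.fix_url(value)
            # The parser unescapes attribute values, so re-escape them.
            parts.append('%s="%s"' % (name, html.escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        # Note: the parser lowercases tag names.
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def rewrite_urls(html_text, fix_url):
    """Return html_text with every URL in URL_ATTRS passed through fix_url."""
    parser = URLRewriter(fix_url)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


# Example: make relative links absolute against an assumed base URL.
page = '<p>See <a href="docs/a.html">this</a> <img src="pic.png" alt="x"></p>'
result = rewrite_urls(page, lambda u: urljoin("http://example.com/base/", u))
print(result)
```

As far as I know the Python 3 parser is a good deal more forgiving of
bad markup than the old one, but treat that as something to check
against your own pages rather than a guarantee.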

Regards,



Fuzzy

http://www.voidspace.org.uk/atlantibots/pythonutils.html

> 
> 
> FuManChu


