Parsing HTML - modify URLs
Fuzzyman
michael at foord.net
Wed Jul 7 16:24:04 EDT 2004
"Robert Brewer" <fumanchu at amor.org> wrote in message news:<mailman.69.1089211879.5135.python-list at python.org>...
> Fuzzyman wrote:
> > I am trying to parse an HTML page and only modify URLs within tags -
> > e.g. inside IMG, A, SCRIPT, FRAME tags etc...
> >
> > I have built one using HTMLParser.HTMLParser and it works fine... on
> > good HTML. Having done a Google search, it looks like HTMLParser
> > choking on dodgy HTML is a common theme.
>
> Haven't used it, but Beautiful Soup sounds like it fits the bill:
>
> http://www.crummy.com/software/BeautifulSoup/
It talks about 'walking the parse tree', which is a bit more magic
than I want. I just want to modify URLs in tags, which means I mainly
want to pass the HTML through unchanged and modify only a few tags.
HTMLParser is quite good at this, but dies *horribly* on bad HTML. I
may have to try Beautiful Soup though :-)
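
For illustration, the pass-through approach described above might look roughly like the sketch below. It is not the code from this thread: it assumes the Python 3 `html.parser` module (the descendant of the `HTMLParser` module discussed here, and generally more forgiving of malformed markup), and the names `URL_ATTRS`, `URLRewriter`, `rewrite_urls` and the `rewrite` callback are hypothetical. Tags that carry no URL are echoed exactly as they appeared in the source; only tags with a matching URL attribute are rebuilt.

```python
from html import escape
from html.parser import HTMLParser

# Tag -> URL-carrying attribute. An illustrative subset, not an exhaustive list.
URL_ATTRS = {"a": "href", "img": "src", "script": "src", "frame": "src"}


class URLRewriter(HTMLParser):
    """Echo the input, rewriting only URL attributes in selected tags."""

    def __init__(self, rewrite):
        # convert_charrefs=False keeps entities out of handle_data so that
        # they can be written back out exactly as they were found.
        super().__init__(convert_charrefs=False)
        self.rewrite = rewrite  # hypothetical callback: old URL -> new URL
        self.out = []

    def _emit_tag(self, tag, attrs, closer):
        url_attr = URL_ATTRS.get(tag)
        if url_attr is None or all(name != url_attr for name, _ in attrs):
            # Nothing to rewrite: echo the tag exactly as it appeared.
            self.out.append(self.get_starttag_text())
            return
        parts = [tag]
        for name, value in attrs:
            if value is None:  # bare attribute with no value
                parts.append(name)
                continue
            if name == url_attr:
                value = self.rewrite(value)
            # Attribute values arrive unescaped, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def rewrite_urls(html_text, rewrite):
    """Return html_text with every URL in URL_ATTRS passed through rewrite."""
    parser = URLRewriter(rewrite)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

A call such as `rewrite_urls(page, lambda url: "http://example.com/" + url.lstrip("/"))` would prefix every matched URL. Note that a rewritten tag is rebuilt from its parsed attributes, so its original quoting and spacing are normalised, and closing tags come back lower-cased; everything else is copied through untouched.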
Regards,
Fuzzy
http://www.voidspace.org.uk/atlantibots/pythonutils.html
>
>
> FuManChu