Parsing HTML - modify URLs

Robert Brewer fumanchu at amor.org
Wed Jul 7 10:47:02 EDT 2004


Fuzzyman wrote:
> I am trying to parse an HTML page and only modify URLs within tags -
> e.g. inside IMG, A, SCRIPT, FRAME tags etc...
> 
> I have built one using HTMLParser.HTMLParser and it works fine...
> on good HTML. Having searched Google, it looks like HTMLParser
> choking on dodgy HTML is a common theme.

Haven't used it, but Beautiful Soup sounds like it fits the bill:

http://www.crummy.com/software/BeautifulSoup/
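
For what it's worth, here is a rough sketch of how the URL rewriting might look with it. It assumes the current bs4 package (installed as beautifulsoup4) rather than whatever release is at the link above, and Python 3. The URL_ATTRS table, the rewrite_urls name, and the choice of urljoin as the "modification" are all made up for illustration; swap in whatever change you actually need.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

# Hypothetical table: tag name -> the attribute that carries its URL.
# Extend it to cover whichever tags matter for your pages.
URL_ATTRS = {"a": "href", "img": "src", "script": "src", "frame": "src"}


def rewrite_urls(html, base_url):
    """Return html with the URL attributes of known tags made absolute."""
    # "html.parser" selects the stdlib-backed tree builder, so nothing
    # beyond bs4 itself is required.
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(list(URL_ATTRS)):
        attr = URL_ATTRS[tag.name]
        if tag.has_attr(attr):
            # Only the attribute value changes; text and other tags
            # are left alone.
            tag[attr] = urljoin(base_url, tag[attr])
    return str(soup)


if __name__ == "__main__":
    # Deliberately sloppy markup: uppercase names, an unquoted
    # attribute value, and an <A> that is never closed.
    dodgy = '<A HREF=page.html>link<IMG SRC="pic.png">'
    print(rewrite_urls(dodgy, "http://example.com/dir/"))
```

Note that the serialized output is normalized (lowercase tag names, unclosed tags closed), so it will not be byte-for-byte the original page with only the URLs changed.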


FuManChu


