HTMLparser
Asle Pedersen
apederse at siving.hia.no
Fri Jan 21 08:02:13 EST 2000
I'm a beginner Python user experimenting with the HTMLParser class from
htmllib. I want to convert all relative URLs to absolute URLs without touching
the rest of the file's contents.
This is what I have so far, but somehow it is throwing away all tags except
the anchor tags (which is not what I want):
import htmllib
import urlparse

class minparser(htmllib.HTMLParser):
    def __init__(self, formatter, verbose=0):
        htmllib.HTMLParser.__init__(self, formatter, verbose)

    def anchor_bgn(self, href, name, type):
        # "baseurl" is a placeholder for the real base URL
        self.anchor = urlparse.urljoin("baseurl", href)
        if self.anchor:
            self.save_bgn()

    def anchor_end(self):
        if self.anchor:
            text = self.save_end()
            # need to do something here
            self.handle_data("%s <%s>" % (text, self.anchor))
            self.anchor = None
-Asle
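
A note on the likely cause, with one possible workaround. htmllib.HTMLParser
is built to render a page through a formatter, so only text reaches the
output and the markup itself is never written back. The sketch below is not
a fix for the htmllib class above; it swaps in html.parser.HTMLParser and
urllib.parse.urljoin from the current standard library and echoes every piece
of the input, rewriting only URL attributes. The names AbsoluteUrlRewriter,
absolutize and URL_ATTRS are made up for this example, and treating only href
and src as URLs is an assumption.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes treated as URLs; this set is an assumption and not exhaustive.
URL_ATTRS = {"href", "src"}


class AbsoluteUrlRewriter(HTMLParser):
    """Echo the input, making relative URL attributes absolute."""

    def __init__(self, base_url):
        # convert_charrefs=False keeps entities such as &amp; out of
        # handle_data so they can be echoed back unchanged.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _start(self, tag, attrs, closing):
        # Tags without a URL attribute are copied exactly as written.
        if not any(n in URL_ATTRS and v for n, v in attrs):
            return self.get_starttag_text()
        # Otherwise rebuild the tag (attribute quoting is normalised).
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)
                continue
            if name in URL_ATTRS:
                value = urljoin(self.base_url, value)
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        return "<" + " ".join(parts) + closing

    def handle_starttag(self, tag, attrs):
        self.out.append(self._start(tag, attrs, ">"))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._start(tag, attrs, " />"))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def absolutize(html_text, base_url):
    """Return html_text with relative href/src values resolved."""
    parser = AbsoluteUrlRewriter(base_url)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling absolutize(page_text, "http://example.com/dir/index.html") returns the
page as a string. Tags that carry a rewritten URL are rebuilt, so their
attribute quoting may differ from the input; everything else is passed through.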