Easy reading HTML?
Fredrik Lundh
effbot at telia.com
Tue Feb 22 18:58:33 EST 2000
Martin Skøtt <mskott at image.dk> wrote:
> I am currently in the thinking process of writing a little python
> program to sort my Netscape bookmarks file. It is so smart that this
> bookmark file is a simple HTML file which I am now looking for an easy
> way to read.
>
> What I need is a function to parse tables which are used to handle
> folders in the menu and <A HREF ...> tags. In the <A HREF..> I need to
> know the address it points to and its title (which is the one I want
> to sort on).
>
> Do you have any smart ideas you want to share? I guess it's htmllib I
> need but I don't know where to start with it.
from the eff-bot guide (see below):
# htmllib-example-1.py

import htmllib
import formatter
import string

class Parser(htmllib.HTMLParser):
    # return a dictionary mapping anchor texts to lists
    # of associated hyperlinks

    def __init__(self, verbose=0):
        self.anchors = {}
        f = formatter.NullFormatter()
        htmllib.HTMLParser.__init__(self, f, verbose)

    def anchor_bgn(self, href, name, type):
        self.save_bgn()
        self.anchor = href

    def anchor_end(self):
        text = string.strip(self.save_end())
        if self.anchor and text:
            self.anchors[text] = self.anchors.get(text, []) + [self.anchor]

file = open("samples/sample.htm")
html = file.read()
file.close()

p = Parser()
p.feed(html)
p.close()

for k, v in p.anchors.items():
    print k, "=>", v

print

## link => ['http://www.python.org']
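
The example above collects anchors but does not sort them, and htmllib is no longer available in Python 3. As a rough sketch only, the same idea could be written with the standard html.parser module and sorted on the link title. The names BookmarkParser and sorted_links and the sample markup are made up for illustration; a real Netscape bookmark file has more structure (folders, attributes) that this ignores.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (title, href) pairs from <A HREF=...> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (title, href) tuples
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # only keep text that appears inside an anchor with an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.links.append((title, self._href))
            self._href = None


def sorted_links(html):
    """Return (title, href) pairs sorted case-insensitively by title."""
    parser = BookmarkParser()
    parser.feed(html)
    parser.close()
    return sorted(parser.links, key=lambda pair: pair[0].lower())


# made-up sample input, loosely shaped like a bookmark file
sample = """
<DL><p>
<DT><A HREF="http://www.python.org">Python</A>
<DT><A HREF="http://www.example.com/">an example</A>
</DL><p>
"""

for title, href in sorted_links(sample):
    print(title, "=>", href)
```

Keeping a list of pairs rather than a dictionary preserves bookmarks that share a title, and makes sorting on the title a single sorted() call.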
</F>
<!-- (the eff-bot guide to) the standard python library:
http://www.pythonware.com/people/fredrik/librarybook.htm
-->