Using Beautiful Soup to entangle bookmarks.html

Paul Boddie paul at boddie.org.uk
Fri Sep 8 10:27:47 EDT 2006


Francach wrote:
>
> Firefox lets you group the bookmarks along with other information into
> directories and sub-directories. Firefox uses header tags for this
> purpose. I'd like to get this grouping information out aswell.

import libxml2dom # http://www.python.org/pypi/libxml2dom
d = libxml2dom.parse("bookmarks.html", html=1)
for node in d.xpath("html/body//dt/*[1]"):
    if node.localName == "h3":
        print "Section:", node.nodeValue
    elif node.localName == "a":
        print "Link:", node.getAttribute("href")

One exercise, using the above code as a starting point, would be to
reproduce the hierarchy exactly, rather than just showing the section
names and the links which follow them. Ultimately, you may be looking
for a way to just convert the HTML into a simple XML document or into
another hierarchical representation which excludes the HTML baggage and
details irrelevant to your problem.

Paul




More information about the Python-list mailing list