Using Beautiful Soup to entangle bookmarks.html
Paul Boddie
paul at boddie.org.uk
Fri Sep 8 10:27:47 EDT 2006
Francach wrote:
>
> Firefox lets you group the bookmarks along with other information into
> directories and sub-directories. Firefox uses header tags for this
> purpose. I'd like to get this grouping information out aswell.
import libxml2dom # http://www.python.org/pypi/libxml2dom
d = libxml2dom.parse("bookmarks.html", html=1)
for node in d.xpath("html/body//dt/*[1]"):
if node.localName == "h3":
print "Section:", node.nodeValue
elif node.localName == "a":
print "Link:", node.getAttribute("href")
One exercise, using the above code as a starting point, would be to
reproduce the hierarchy exactly, rather than just showing the section
names and the links which follow them. Ultimately, you may be looking
for a way to just convert the HTML into a simple XML document or into
another hierarchical representation which excludes the HTML baggage and
details irrelevant to your problem.
Paul
More information about the Python-list
mailing list