Using Beautiful Soup to entangle bookmarks.html

Tim Williams tim at tdw.net
Thu Sep 7 18:31:18 EDT 2006


On 7 Sep 2006 14:30:25 -0700, Adam Jones <ajones1 at gmail.com> wrote:
>
> Francach wrote:
> > Hi,
> >
> > I'm trying to use the Beautiful Soup package to parse through the
> > "bookmarks.html" file which Firefox exports all your bookmarks into.
> > I've been struggling with the documentation trying to figure out how to
> > extract all the urls. Has anybody got a couple of longer examples using
> > Beautiful Soup I could play around with?
> >
> > Thanks,
> > Martin.
>
> If the only thing you want out of the document is the URL's why not
> search for: href="..." ? You could get a regular expression that
> matches that pretty easily. I think this should just about get you
> there, but my regular expressions have gotten very rusty.
>
> /href=\".+\"/
>

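One caveat on that pattern: ".+" is greedy, so on a line with several
quoted attributes it runs on to the last quote, and Firefox writes HREF
in capitals. A rough sketch of a tighter version with the re module (the
sample line and the names are made up for illustration, not taken from a
real export):

```python
import re

# Hypothetical sample line in the style of a Firefox bookmarks export
line = '<DT><A HREF="http://www.python.org/" ADD_DATE="1157668278">Python</A>'

# [^"]+ stops at the first closing quote instead of running to the last one;
# IGNORECASE is needed because the attribute is written as HREF
href_re = re.compile(r'href="([^"]+)"', re.IGNORECASE)

urls = href_re.findall(line)
print(urls)
```
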
I doubt the bookmarks file is huge so something simple like

# Keep only the lines that hold a bookmark link; 'with' closes the file afterwards
with open('bookmarks.html') as f:
    data = [x for x in f if x.strip().startswith('<DT><A ')]

would get you started.

On my exported Firefox bookmarks this gives me every line that holds a
URL. The URLs just need to be pulled out of those lines, and I'd be
tempted to use a couple of split() calls to keep it really simple.
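
Something along these lines is what I have in mind for the splits. It is
only a sketch: url_from_line is a name I made up, the sample lines are
invented, and it assumes each link is written as HREF="..." in capitals
with double quotes.

```python
def url_from_line(line):
    """Return the HREF value from one bookmark line, or None if there is none."""
    # First split: keep everything after the first HREF="
    parts = line.split('HREF="', 1)
    if len(parts) < 2:
        return None
    # Second split: keep everything up to the next double quote
    return parts[1].split('"', 1)[0]

# Hypothetical sample lines standing in for the filtered 'data' list
data = [
    '    <DT><A HREF="http://www.python.org/" ADD_DATE="1157668278">Python</A>\n',
    '    <DT><A HREF="http://www.crummy.com/software/BeautifulSoup/" LAST_VISIT="1157668300">Beautiful Soup</A>\n',
]

urls = [url_from_line(x) for x in data]
print(urls)
```

Lines without an HREF come back as None, so they are easy to filter out
afterwards.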

HTH


-- 

Tim Williams


