Using Beautiful Soup to entangle bookmarks.html

George Sakkis george.sakkis at gmail.com
Fri Sep 8 08:55:05 EDT 2006


Francach wrote:
> George Sakkis wrote:
> > Francach wrote:
> > > Hi,
> > >
> > > I'm trying to use the Beautiful Soup package to parse through the
> > > "bookmarks.html" file which Firefox exports all your bookmarks into.
> > > I've been struggling with the documentation trying to figure out how to
> > > extract all the urls. Has anybody got a couple of longer examples using
> > > Beautiful Soup I could play around with?
> > >
> > > Thanks,
> > > Martin.
> >
> > from BeautifulSoup import BeautifulSoup
> > urls = [tag['href'] for tag in
> >         BeautifulSoup(open('bookmarks.html')).findAll('a')]
>
> Hi,
>
> thanks for the helpful reply.
> I wanted to do two things: learn to use Beautiful Soup, and pull all
> the information out of the bookmarks file so I can import it into
> another application. So I need to be able to travel down the tree in
> the bookmarks file. The file seems to use header tags, which can then
> contain <a> tags where the href attributes are. What I don't
> understand is how to create objects which can then be used to return
> the information in the next level of the tree.
>
> Thanks again,
> Martin.

I'm not sure I understand what you want to do. Originally you asked how
to extract all the URLs, and BeautifulSoup can do that in one line. Why
do you care about intermediate objects, or whether the anchor tags are
nested under header tags? Read and embrace BeautifulSoup's philosophy:
"You didn't write that awful page. You're just trying to get some data
out of it. Right now, you don't really care what HTML is supposed to
look like."
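
If the folder each bookmark lives in does turn out to matter for the
import, a small event-driven parser can carry that information along.
What follows is only a minimal sketch, and it is not Beautiful Soup: it
uses the standard library's html.parser (Python 3). It assumes the usual
exported layout, in which a folder is an <H3> heading followed by a <DL>
list holding its entries. BookmarkParser and parse_bookmarks are names
made up for this example.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (folder_path, title, url) triples from a bookmarks file.

    Assumption: each folder is an <h3> followed by a <dl> with its entries.
    """

    def __init__(self):
        super().__init__()
        self.bookmarks = []    # collected (folder_path, title, url) triples
        self._folders = []     # one entry per open <dl>; None if unnamed
        self._pending = None   # text of the last <h3>, waiting for its <dl>
        self._text = None      # text buffer while inside <h3> or <a>
        self._href = None      # href of the <a> currently being read

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "h3":
            self._text = []
        elif tag == "dl":
            # Entering a list: it belongs to the most recent heading, if any.
            self._folders.append(self._pending)
            self._pending = None
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._text is not None:
            self._pending = "".join(self._text).strip()
            self._text = None
        elif tag == "a" and self._text is not None:
            title = "".join(self._text).strip()
            path = tuple(name for name in self._folders if name)
            self.bookmarks.append((path, title, self._href))
            self._text = None
        elif tag == "dl" and self._folders:
            self._folders.pop()


def parse_bookmarks(html):
    """Return a list of (folder_path, title, url) triples for an HTML string."""
    parser = BookmarkParser()
    parser.feed(html)
    parser.close()
    return parser.bookmarks
```

Feeding it the text of bookmarks.html gives one triple per link, where
folder_path is a tuple of enclosing folder names (empty for top-level
bookmarks). The flat list of URLs is then just the third item of each
triple.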

George



