Using Beautiful Soup to entangle bookmarks.html

Francach uid09012_ti at martin-collins.de
Fri Sep 8 09:48:31 EDT 2006


Hi George,

Firefox lets you group the bookmarks along with other information into
directories and sub-directories. Firefox uses header tags for this
purpose. I'd like to get this grouping information out as well.
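
To make that concrete, here is a rough sketch of the kind of traversal I
have in mind. It is only an illustration and not a Beautiful Soup
solution: it uses the standard library's html.parser module (Python 3
naming), and it assumes the exported file follows the usual layout where
each folder is an <H3> heading followed by a <DL> list of its contents.
The names BookmarkTree, parse_bookmarks and SAMPLE are made up for the
example.

```python
from html.parser import HTMLParser

# Hypothetical sample in the layout a bookmarks export is assumed to use.
SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<H1>Bookmarks</H1>
<DL><p>
    <DT><A HREF="http://www.python.org/">Python</A>
    <DT><H3>News</H3>
    <DL><p>
        <DT><A HREF="http://example.com/a">Site A</A>
        <DT><H3>Tech</H3>
        <DL><p>
            <DT><A HREF="http://example.com/b">Site B</A>
        </DL><p>
    </DL><p>
    <DT><A HREF="http://example.com/c">Site C</A>
</DL><p>
"""


class BookmarkTree(HTMLParser):
    """Collect (folder_path, title, url) tuples while walking the file."""

    def __init__(self):
        super().__init__()
        self.path = []            # one entry per open <dl>; None for the top-level list
        self.last_heading = None  # text of the most recent <h3>, not yet entered
        self.bookmarks = []
        self._buffer = None       # collects text while inside <h3> or <a>
        self._href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("h3", "a"):
            self._buffer = []
            if tag == "a":
                self._href = dict(attrs).get("href")
        elif tag == "dl":
            # Entering a list: it belongs to the heading seen just before it.
            self.path.append(self.last_heading)
            self.last_heading = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._buffer is not None:
            self.last_heading = "".join(self._buffer).strip()
            self._buffer = None
        elif tag == "a" and self._buffer is not None:
            title = "".join(self._buffer).strip()
            folders = tuple(name for name in self.path if name is not None)
            if self._href is not None:  # skip anchors without an href
                self.bookmarks.append((folders, title, self._href))
            self._buffer = None
        elif tag == "dl" and self.path:
            self.path.pop()  # leaving the current folder


def parse_bookmarks(html):
    """Return a list of (folder_path, title, url) for every bookmark."""
    parser = BookmarkTree()
    parser.feed(html)
    parser.close()
    return parser.bookmarks


for folders, title, url in parse_bookmarks(SAMPLE):
    print("/".join(folders) or "(top level)", "|", title, "|", url)
```

Each bookmark comes out with the chain of folder names it sits under,
which is the grouping information I want to carry over to the other
application.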

Regards,
Martin.


George Sakkis wrote:
> Francach wrote:
> > George Sakkis wrote:
> > > Francach wrote:
> > > > Hi,
> > > >
> > > > I'm trying to use the Beautiful Soup package to parse through the
> > > > "bookmarks.html" file which Firefox exports all your bookmarks into.
> > > > I've been struggling with the documentation trying to figure out how to
> > > > extract all the urls. Has anybody got a couple of longer examples using
> > > > Beautiful Soup I could play around with?
> > > >
> > > > Thanks,
> > > > Martin.
> > >
> > > from BeautifulSoup import BeautifulSoup
> > > urls = [tag['href'] for tag in
> > >         BeautifulSoup(open('bookmarks.html')).findAll('a')]
> > Hi,
> >
> > thanks for the helpful reply.
> > I wanted to do two things - learn to use Beautiful Soup and bring out
> > all the information
> > in the bookmarks file to import into another application. So I need to
> > be able to travel down the tree in the bookmarks file. The file seems
> > to use header tags, which can then contain <a> tags where the href
> > attributes are. What I don't understand is how to create objects which
> > can then be used to return the information in the next level of the
> > tree.
> >
> > Thanks again,
> > Martin.
>
> I'm not sure I understand what you want to do. Originally you asked to
> extract all urls and BeautifulSoup can do this for you in one line. Why
> do you care about intermediate objects or if the anchor tags are nested
> under header tags or not? Read and embrace BeautifulSoup's philosophy:
> "You didn't write that awful page. You're just trying to get some data
> out of it. Right now, you don't really care what HTML is supposed to
> look like."
> 
> George



