beautiful soup library question

Erik Max Francis max at alcyone.com
Fri Mar 10 18:39:30 EST 2006


meyerkp at gmail.com wrote:

> I'm trying to extract some information from an html file using
> beautiful soup.  The strings I want to get are after br tags, e.g.:
> 
> <font size='6'>
>     <br>this info
>     <br>more info
>     <br>and more info
> </font>
> 
> I can navigate to the first br tag using find_next_sibling, but how do
> I get the string after the br's?
> br.contents is empty.

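I'm not familiar with Beautiful Soup specifically, but this isn't how 
the <br> tag works.  Unlike <li> or <p>, whose closing tags may be 
omitted in HTML but which still contain content, <br> is an empty 
element: it contains nothing and simply marks a line break.  In XHTML 
it would be written <br />, indicating that it's a standalone tag.  
That is why br.contents is empty: the text you want follows each <br> 
as a sibling inside the font tag rather than sitting inside the <br>.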

Instead you want to traverse the contents of the font tag, taking into 
account line breaks that you encounter.
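
Since I can't vouch for the Beautiful Soup API, here is a rough sketch 
of that idea using only the standard library's html.parser module.  The 
class name BrTextCollector and its attributes are made up for 
illustration, and it assumes the text of interest always sits directly 
inside a font tag:

```python
from html.parser import HTMLParser


class BrTextCollector(HTMLParser):
    """Collect the text that follows each <br> inside a <font> tag."""

    def __init__(self):
        super().__init__()
        self.in_font = False   # True while we are inside <font>...</font>
        self.pieces = []       # one string per <br> encountered

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            self.in_font = True
        elif tag == "br" and self.in_font:
            # A <br> holds no content itself; start a new piece for
            # whatever text comes after it.
            self.pieces.append("")

    def handle_endtag(self, tag):
        if tag == "font":
            self.in_font = False

    def handle_data(self, data):
        # Text before the first <br> is ignored; everything else is
        # appended to the piece opened by the most recent <br>.
        if self.in_font and self.pieces:
            self.pieces[-1] += data


html = """<font size='6'>
    <br>this info
    <br>more info
    <br>and more info
</font>"""

collector = BrTextCollector()
collector.feed(html)
collector.close()

# Drop surrounding whitespace and any empty pieces.
lines = [piece.strip() for piece in collector.pieces if piece.strip()]
print(lines)
```

The same approach should carry over to a tree-based parser: walk the 
children of the font element and treat each <br> as a separator between 
text nodes.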

-- 
Erik Max Francis && max at alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
   Fear is an emotion indispensable for survival.
   -- Hannah Arendt


