beautiful soup library question
Erik Max Francis
max at alcyone.com
Fri Mar 10 18:39:30 EST 2006
meyerkp at gmail.com wrote:
> I'm trying to extract some information from an html file using
> beautiful soup. The strings I want to get are after br tags, eg:
>
> <font size='6'>
> <br>this info
> <br>more info
> <br>and more info
> </font>
>
> I can navigate to the first br tag using find_next_sibling, but how do
> I get the string after the br's?
> br.contents is empty.
I'm not familiar with Beautiful Soup specifically, but this isn't how
the <br> tag works. Unlike <li> or <p>, which do contain content even
though their end tags may be omitted in HTML, <br> is an empty element:
it never contains anything and simply marks a line break. In XHTML it
would be written <br />, making it explicit that it's a standalone tag.
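The text you're after belongs to the font tag, sitting next to the br
tags rather than inside them. So instead you want to traverse the
contents of the font tag and pick up the text that follows each line
break you encounter.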
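
As a rough illustration of that traversal, here is a minimal sketch that
uses Python's standard html.parser module rather than Beautiful Soup
(which I can't vouch for). The class name BrTextCollector and the sample
markup variable are made up for this example. It walks through the
markup and records the first non-blank piece of text seen after each
<br> inside a <font> element:

```python
from html.parser import HTMLParser


class BrTextCollector(HTMLParser):
    """Collect the text that follows each <br> inside a <font> element.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.in_font = False    # are we currently inside <font>...</font>?
        self.after_br = False   # have we just passed a <br>?
        self.strings = []       # text found after each <br>

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            self.in_font = True
        elif tag == "br" and self.in_font:
            # <br> holds nothing itself; the text we want comes next.
            self.after_br = True

    def handle_endtag(self, tag):
        if tag == "font":
            self.in_font = False
            self.after_br = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_font and self.after_br and text:
            self.strings.append(text)
            self.after_br = False


# Sample markup taken from the question.
markup = """<font size='6'>
<br>this info
<br>more info
<br>and more info
</font>"""

collector = BrTextCollector()
collector.feed(markup)
collector.close()
print(collector.strings)  # ['this info', 'more info', 'and more info']
```

The same idea should carry over to a tree-based library: look at whatever
node comes after each br within the font tag, rather than at the br's own
(always empty) contents.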
--
Erik Max Francis && max at alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
Fear is an emotion indispensable for survival.
-- Hannah Arendt