Beautiful Soup: why does "string" not give me the string?

Jeremiah Dodds jeremiah.dodds at gmail.com
Wed Apr 1 04:31:00 EDT 2009


On Wed, Apr 1, 2009 at 8:25 AM, Gabriel Rossetti <
gabriel.rossetti at arimaz.com> wrote:

> Hello everyone,
>
> I am using beautiful soup to parse some HTML and I came across something
> strange.
> Here is an illustration:
>
> >>> soup = BeautifulSoup(u'<div class="text">hello ça boume<br /></div')
> >>> soup
> <div class="text">hello ça boume<br /></div>
> >>> soup.find("div", "text")
> <div class="text">hello ça boume<br /></div>
> >>> soup.find("div", "text").string
> >>> soup.find("div", "text").next
> u'hello \xe7a boume'
>
> why does soup.find("div", "text").string not give me the string? Is it
> because there is a <br/>?


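IIRC, yes it is. .string is only set when a tag has exactly one child and
that child is a string; here the div has two children (the text and the
<br />), so .string comes back as None. There's not much you can do about it
other than use .next.string or .contents[0], or strip out the <br /> tags
first. See
http://www.crummy.com/software/BeautifulSoup/documentation.html ,
particularly the "Removing Elements" and "string" sections.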
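
To make the rule concrete, here is a rough sketch of it as a toy model. It
is not Beautiful Soup's code: ToyTag is a hypothetical stand-in that only
mimics the .contents and .string names, assuming the single-string-child
rule described in the documentation, and it uses only the standard library.

```python
# Toy model only: ToyTag is a hypothetical stand-in, not Beautiful Soup's Tag.
class ToyTag:
    def __init__(self, name, contents=None):
        self.name = name
        # Children are either plain strings (text nodes) or other ToyTag objects.
        self.contents = list(contents or [])

    @property
    def string(self):
        # Assumed rule: .string is available only when the tag has exactly
        # one child and that child is a string.
        if len(self.contents) == 1 and isinstance(self.contents[0], str):
            return self.contents[0]
        return None


# Same shape as the div in the question: a text node followed by a <br />.
div = ToyTag("div", ["hello ça boume", ToyTag("br")])

print(div.string)       # None, because the div has two children
print(div.contents[0])  # the text node, reached by position instead

# Dropping the <br /> leaves a single string child, so .string works again.
div.contents = [
    child for child in div.contents
    if not (isinstance(child, ToyTag) and child.name == "br")
]
print(div.string)
```

With the real library the idea is the same: either reach the text node by
position, or remove the <br /> so the text is the div's only child.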