beautifulSoup 4.1
Denis McMahon
denismfmcmahon at gmail.com
Fri Mar 20 03:41:24 EDT 2015
On Fri, 20 Mar 2015 00:18:33 -0700, Sayth Renshaw wrote:
> Just finding it odd that the next sibling is a "\n" and not the next
> <td> otherwise that would be the perfect solution.
Whitespace between elements creates a text node in the parsed document.
This is correct, because a browser will also render whitespace between
elements as whitespace.
<a href="blah1">text1</a><a href="blah2">text2</a>
will be displayed differently to
<a href="blah1">text1</a> <a href="blah2">text2</a>
in a browser, because the space between the two <a> elements in the
second case is a text node in the DOM.
A newline has the same effect (for display purposes a browser treats it
as just more whitespace), but in the DOM the text node will contain the
newline rather than a space.
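You can see these text nodes without a browser. The sketch below uses the standard library's xml.dom.minidom, which is an XML parser rather than an HTML one, so it only works here because the snippets are well formed. The <p> wrapper and the child_nodes() helper are illustrative assumptions, not part of bs4:

```python
from xml.dom import minidom

def child_nodes(markup):
    """Parse a snippet inside a <p> root and describe each child node."""
    root = minidom.parseString("<p>" + markup + "</p>").documentElement
    return [
        ("element", node.tagName) if node.nodeType == node.ELEMENT_NODE
        else ("text", node.data)
        for node in root.childNodes
    ]

no_space = '<a href="blah1">text1</a><a href="blah2">text2</a>'
space = '<a href="blah1">text1</a> <a href="blah2">text2</a>'
newline = '<a href="blah1">text1</a>\n<a href="blah2">text2</a>'

for markup in (no_space, space, newline):
    # Only the second and third snippets have a text node between the <a>s.
    print(child_nodes(markup))
```

The first snippet gives two element nodes. The second and third each give three nodes, with the middle one a text node holding " " or "\n" respectively.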
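bs4 tries to parse the HTML the same way a browser does, so you get all
the text nodes, including the whitespace between elements and any
newlines. That is why the next sibling of a <td> followed by a line
break is a "\n" text node rather than the next <td>.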
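If what you actually want is the next <td>, you don't have to step over the whitespace nodes by hand. This is a minimal sketch, assuming bs4 is installed and using its built-in "html.parser" backend. The table markup is made up for illustration, and other parsers such as lxml or html5lib may build a slightly different tree:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical table: each <td> is on its own line, so the newlines
# between the cells become text nodes in the parsed tree.
html = """<table><tr>
<td>first</td>
<td>second</td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")
first = soup.find("td")

# The immediate sibling is the whitespace text node, not the next cell.
print(first.next_sibling == "\n")  # True

# find_next_sibling() skips siblings that don't match, including text nodes.
print(first.find_next_sibling("td").get_text())  # second

# Alternatively, keep only Tag objects from the sibling iterator.
cells = [s for s in first.next_siblings if isinstance(s, Tag)]
print([c.get_text() for c in cells])  # ['second']
```

find_next_sibling("td") is usually the simplest choice. Filtering next_siblings is handy when you want every following element regardless of tag name.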
--
Denis McMahon, denismfmcmahon at gmail.com