scraping nested tables with BeautifulSoup
Kent Johnson
kent at kentsjohnson.com
Tue Apr 4 11:57:53 EDT 2006
Gonzillaaa at gmail.com wrote:
> Hey Kent,
>
> thanks for your reply. how did you exactly save the file in firefox? if
> I save the file locally I get the same error.
I think I right-clicked on the page and chose "Save page as..."
Here is a program that shows where BS is choking. It finds the last leaf
node in the parse tree by repeatedly descending into the last child of
each node:
from urllib import urlopen
from BeautifulSoup import BeautifulSoup

data = urlopen('http://www.findaproperty.com/regi0018.html').read()
soup = BeautifulSoup(data)

tag = soup
while hasattr(tag, 'contents') and tag.contents:
    tag = tag.contents[-1]

print type(tag)
print tag
It prints:
<class 'BeautifulSoup.NavigableString'>
<!/BUTTONS>
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=2 WIDTH=100% BGCOLOR=F0F0F0>
<TD ALIGN=left VALIGN=top>
<snip lots more>
So for some reason BS thinks that everything from <!BUTTONS> to the end
is a single string.
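
The last-child descent can also be wrapped in a small reusable function. What follows is only a sketch in Python 3 syntax: the Node class and the last_leaf name are hypothetical stand-ins, not part of BeautifulSoup, so that the example runs without the library or the network. It assumes nothing more than nodes that carry a "contents" list.

```python
class Node:
    """Hypothetical stand-in for a parse-tree node with a `contents` list."""

    def __init__(self, name, contents=None):
        self.name = name
        self.contents = contents or []


def last_leaf(node):
    """Follow the last child at each level until a node has no children."""
    # Objects without a `contents` attribute (plain strings here) or with
    # an empty `contents` list end the descent.
    while getattr(node, 'contents', None):
        node = node.contents[-1]
    return node


# A tiny hand-built tree; the strings play the role of text nodes.
tree = Node('html', [
    Node('head'),
    Node('body', [Node('p', ['first']), Node('table', ['last text'])]),
])

leaf = last_leaf(tree)
print(type(leaf).__name__)
print(leaf)
```

If the leaf it returns is a very long string that still contains markup, as in the output above, the parser stopped building tags at that point.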
Kent