scraping nested tables with BeautifulSoup
Kent Johnson
kent at kentsjohnson.com
Tue Apr 4 12:09:50 EDT 2006
Gonzillaaa at gmail.com wrote:
> Hey Kent,
>
> Thanks for your reply. How exactly did you save the file in Firefox? If
> I save the file locally, I get the same error.
When Firefox saves the page it rewrites the markup. Among other things, it
turns all the funky <!FOO> and <!/FOO> tags into comments, which is why the
saved copy parses cleanly. Here is a way to do the same thing with
BeautifulSoup (BS):
import re
from urllib import urlopen
from BeautifulSoup import BeautifulSoup, BeautifulStoneSoup

# This tells BS to turn <!FOO> into <!-- FOO --> which allows it
# to do a better job parsing this data
fixExclRe = re.compile(r'<!(?!--)([^>]+)>')
BeautifulStoneSoup.PARSER_MASSAGE.append((fixExclRe, r'<!-- \1 -->'))

data = urlopen('http://www.findaproperty.com/regi0018.html').read()
soup = BeautifulSoup(data)

# Several tables share these attributes; the price guide is the second one
priceGuide = soup('table', dict(bgcolor="e0f0f8", border="0",
                  cellpadding="2", cellspacing="2", width="150"))[1]
print priceGuide
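
If you want to see what that regex does without fetching the page, you can
run the substitution on its own. This is only a sketch: the sample string is
made up rather than taken from the real page, the helper name
comment_out_declarations is just for illustration, and it uses nothing but
the standard re module.

```python
import re

# Same pattern as above: match <!...> unless it already starts a comment (<!--).
fix_excl_re = re.compile(r'<!(?!--)([^>]+)>')


def comment_out_declarations(markup):
    """Rewrite <!FOO> as <!-- FOO --> and leave real comments alone."""
    return fix_excl_re.sub(r'<!-- \1 -->', markup)


# Hypothetical sample markup, not copied from the real page.
sample = '<!FOO><table><tr><td>1</td></tr></table><!/FOO><!-- kept -->'
cleaned = comment_out_declarations(sample)
print(cleaned)
```

The negative lookahead (?!--) is what keeps existing comments untouched.
Note that the pattern is not selective beyond that: a <!DOCTYPE ...>
declaration would be turned into a comment as well, which is harmless for
scraping but worth knowing.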
Kent
More information about the Python-list mailing list